Advanced NCBI

Post on 12-Jul-2015

Transcript of Advanced NCBI

Advanced NCBI. The Entrez API

https://github.com/lindenb/courses

Pierre Lindenbaum @yokofakun

pierre.lindenbaum@univ-nantes.fr http://plindenbaum.blogspot.com

Institut du Thorax. Nantes. France

September 27, 2016

Pierre Lindenbaum@yokofakun pierre.lindenbaum@univ-nantes.fr http://plindenbaum.blogspot.comAdvanced NCBI.The Entrez APIhttps://github.com/lindenb/courses

NCBI? What about EBI, Ensembl, ...?

What will be covered today?

File formats...

EInfo, GQuery, ESearch, ESummary, EFetch...

Processing XML answers with XSLT: HTML, SVG, R...

Generating a Java parser for dbSNP.

NCBI EBot

Using standalone BLAST

CURL

curl "http://en.wikipedia.org/wiki/Main_Page"
wget -O - "http://en.wikipedia.org/wiki/Main_Page"

XML

XSLT

XSLT

XSLTPROC

xsltproc stylesheet.xsl file.xml > result.xml

JSON

Formats

Formats: GenBank

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&rettype=gb

LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
DEFINITION  Blue Whale heavy satellite DNA.
ACCESSION   X53813 X17460
VERSION     X53813.1  GI:25
KEYWORDS    satellite DNA.
SOURCE      Balaenoptera musculus (Blue whale)
  ORGANISM  Balaenoptera musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
            Mysticeti; Balaenopteridae; Balaenoptera.
REFERENCE   1  (bases 1 to 422)
  AUTHORS   Arnason,U. and Widegren,B.
  TITLE     Composition and chromosomal localization of cetacean highly
            repetitive DNA with special reference to the blue whale,
            Balaenoptera musculus
  JOURNAL   Chromosoma 98 (5), 323-329 (1989)
   PUBMED   2612291
COMMENT     See also <X52700-2> for 1,760 bp common cetacean component clones
            and <X52703-6>,<X53811-4> for the 422 bp heavy satellite clones.
FEATURES             Location/Qualifiers
     source          1..422
                     /organism="Balaenoptera musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9771"
                     /clone="7"
     misc_feature    1..422
                     /note="heavy satellite DNA"
ORIGIN
        1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
       61 ggggtccagc catggagaat agtttagaca ctaggatgag ataaggaaca cacccattct
      121 aaagaaatca cattaggatt ctctttttaa gctgttcctt aaaacactag agtcttagaa
      181 atctattgga ggcagaagca gtcaagggta gcctagggtt agggttaggc ttagggttag
      241 ggttagggta cggcttaggg tactgtttcg gggaggggtt caggtacggc gtagggtatg
      301 ggttagggtt agggttaggg ttagtgttag ggttagggct cggtttaggg tacgggttag
      361 gattagggta cgtgttaggg ttagggtagg gcttagggtt agggtacgtg ttagggttag
      421 gg
//
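
The flat file above is line-oriented: a keyword in the first columns, then the value, with the sequence between ORIGIN and the closing //. A minimal Python sketch of reading two parts of it, assuming the exact layout of this X53813 record; parse_locus, origin_sequence and SAMPLE are illustrative names, not an NCBI API. Real LOCUS lines vary (the topology token can be missing, for instance), so a dedicated parser such as Biopython's SeqIO "genbank" reader is the safer choice for production data.

```python
import re


def parse_locus(line: str) -> dict:
    """Split a GenBank LOCUS line into named fields (sketch).

    Assumes eight whitespace-separated tokens, as in the record above:
    LOCUS <name> <length> bp <moltype> <topology> <division> <date>.
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS":
        raise ValueError("unexpected LOCUS line: %r" % line)
    keys = ("name", "length", "unit", "moltype", "topology", "division", "date")
    fields = dict(zip(keys, tokens[1:]))
    fields["length"] = int(fields["length"])
    return fields


def origin_sequence(record: str) -> str:
    """Return the bases found between the ORIGIN line and the '//' terminator."""
    chunks = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # drop the leading position number and the blanks between 10-base blocks
            chunks.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(chunks)


# first lines of the X53813 record shown above (ORIGIN cut to two lines)
SAMPLE = """\
LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
DEFINITION  Blue Whale heavy satellite DNA.
ORIGIN
        1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
       61 ggggtccagc catggagaat agtttagaca ctaggatgag ataaggaaca cacccattct
//
"""

if __name__ == "__main__":
    locus = parse_locus(SAMPLE.splitlines()[0])
    print(locus["name"], locus["length"], locus["division"])  # X53813 422 MAM
    print(len(origin_sequence(SAMPLE)))  # 120
```

On the full record, the length of origin_sequence() should match the 422 bp announced on the LOCUS line, which is a cheap consistency check.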

Formats: ASN.1

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25

Seq-entry ::= seq {
  id {
    embl {
      accession "X53813",
      version 1 },
    gi 25 },
  descr {
    title "Blue Whale heavy satellite DNA",
    source {
      org {
        taxname "Balaenoptera musculus",
        common "Blue whale",
        db {
          {
            db "taxon",
            tag
              id 9771 } },
        orgname {
          name
            binomial {
              genus "Balaenoptera",
              species "musculus" },
          lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
 Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
 Mysticeti; Balaenopteridae; Balaenoptera",
          gcode 1,
          mgcode 2,
          div "MAM" } },
      subtype {
        {
(...)

Formats: ASN.1 (schema)

http://www.ncbi.nlm.nih.gov/data_specs/asn/insdseq.asn

INSDSeq ::= SEQUENCE {
    locus VisibleString,
    length INTEGER,
    strandedness VisibleString OPTIONAL,
    moltype VisibleString,
    topology VisibleString OPTIONAL,
    division VisibleString,
    update-date VisibleString,
    create-date VisibleString OPTIONAL,
    update-release VisibleString OPTIONAL,
    create-release VisibleString OPTIONAL,
    definition VisibleString,
    primary-accession VisibleString OPTIONAL,
    entry-version VisibleString OPTIONAL,
    accession-version VisibleString OPTIONAL,
    other-seqids SEQUENCE OF INSDSeqid OPTIONAL,
    secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL,
    project VisibleString OPTIONAL,
    keywords SEQUENCE OF INSDKeyword OPTIONAL,
    segment VisibleString OPTIONAL,
    source VisibleString OPTIONAL,
    organism VisibleString OPTIONAL,
    taxonomy VisibleString OPTIONAL,
    references SEQUENCE OF INSDReference OPTIONAL,
    comment VisibleString OPTIONAL,
    comment-set SEQUENCE OF INSDComment OPTIONAL,
    struc-comments SEQUENCE OF INSDStrucComment OPTIONAL,
    primary VisibleString OPTIONAL,
    source-db VisibleString OPTIONAL,
    database-reference VisibleString OPTIONAL,
    feature-table SEQUENCE OF INSDFeature OPTIONAL,
    feature-set SEQUENCE OF INSDFeatureSet OPTIONAL,
    sequence VisibleString OPTIONAL,  -- Optional for contig, wgs, etc.
    contig VisibleString OPTIONAL,
    alt-seq SEQUENCE OF INSDAltSeqData OPTIONAL
}

Formats: ASN.1 (tools)

DATATOOL: generate C++ data storage classes based on ASN.1 serialization streams; convert data between ASN.1, XML and JSON formats.

Formats: XML

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&retmode=xml

<?xml version="1.0"?>
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
<GBSet>
  <GBSeq>
    <GBSeq_locus>X53813</GBSeq_locus>
    <GBSeq_length>422</GBSeq_length>
    <GBSeq_strandedness>double</GBSeq_strandedness>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_topology>linear</GBSeq_topology>
    <GBSeq_division>MAM</GBSeq_division>
    <GBSeq_update-date>22-JUN-1992</GBSeq_update-date>
    <GBSeq_create-date>13-JUL-1990</GBSeq_create-date>
    <GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
    <GBSeq_primary-accession>X53813</GBSeq_primary-accession>
    <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
    <GBSeq_other-seqids>
      <GBSeqid>emb|X53813.1|</GBSeqid>
      <GBSeqid>gi|25</GBSeqid>
    </GBSeq_other-seqids>
    <GBSeq_secondary-accessions>
      <GBSecondary-accn>X17460</GBSecondary-accn>
    </GBSeq_secondary-accessions>
    <GBSeq_keywords>
      <GBKeyword>satellite DNA</GBKeyword>
    </GBSeq_keywords>
    <GBSeq_source>Balaenoptera musculus (Blue whale)</GBSeq_source>
    <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
    <GBSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera</GBSeq_taxonomy>
    <GBSeq_references>
(...)
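
Because every GBSeq field is its own element, the XML flavour is easier to consume programmatically than the flat file. A minimal sketch with Python's standard xml.etree.ElementTree, assuming the GBSet/GBSeq/GBSeq_* element names shown above; summarize_gbset and GBSET_XML are illustrative names, and the sample is a trimmed copy of the answer (DOCTYPE and most fields removed).

```python
import xml.etree.ElementTree as ET

# trimmed copy of the GBSeq answer above (DOCTYPE and most elements omitted)
GBSET_XML = """<?xml version="1.0"?>
<GBSet>
  <GBSeq>
    <GBSeq_locus>X53813</GBSeq_locus>
    <GBSeq_length>422</GBSeq_length>
    <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
    <GBSeq_other-seqids>
      <GBSeqid>emb|X53813.1|</GBSeqid>
      <GBSeqid>gi|25</GBSeqid>
    </GBSeq_other-seqids>
    <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
  </GBSeq>
</GBSet>"""


def summarize_gbset(xml_text: str) -> list:
    """Return one small dict per <GBSeq> element of a <GBSet>."""
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.iter("GBSeq"):  # matches <GBSeq> only, not <GBSeq_locus>
        records.append({
            "locus": seq.findtext("GBSeq_locus"),
            "length": int(seq.findtext("GBSeq_length", default="0")),
            "accession": seq.findtext("GBSeq_accession-version"),
            "organism": seq.findtext("GBSeq_organism"),
            "seqids": [e.text for e in seq.findall("GBSeq_other-seqids/GBSeqid")],
        })
    return records


if __name__ == "__main__":
    for rec in summarize_gbset(GBSET_XML):
        print(rec["accession"], rec["length"], rec["organism"])
```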

Formats: XML (DTD)

http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.mod.dtd

<!ELEMENT GBSeq (
    GBSeq_locus,
    GBSeq_length,
    GBSeq_strandedness?,
    GBSeq_moltype,
    GBSeq_topology?,
    GBSeq_division,
    GBSeq_update-date,
    GBSeq_create-date?,
    GBSeq_update-release?,
    GBSeq_create-release?,
    GBSeq_definition,
    GBSeq_primary-accession?,
    GBSeq_entry-version?,
    GBSeq_accession-version?,
    GBSeq_other-seqids?,
    GBSeq_secondary-accessions?,
    GBSeq_project?,
    GBSeq_keywords?,
    GBSeq_segment?,
    GBSeq_source?,
    GBSeq_organism?,
    GBSeq_taxonomy?,
    GBSeq_references?,
    GBSeq_comment?,
    GBSeq_comment-set?,
    GBSeq_struc-comments?,
    (...)

E-Utilities

GI

GI

http://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/ : "NCBI is phasing out sequence GIs - use Accession.Version instead!"

E-Utilities

A set of seven server-side programs that provide a stable interface to the search, retrieval, and linking functions of the Entrez system, using a fixed URL syntax. The output provided by the E-Utilities is in XML format, sometimes JSON, (...)
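
Every URL in this course follows the same fixed pattern: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ + utility name + .fcgi, followed by URL-encoded key=value parameters. A minimal Python sketch of that pattern, assuming the base URL used on these slides; eutils_url is a hypothetical helper, not part of any NCBI library. Passing quote_via=quote makes spaces come out as %20, as in the curl examples later on.

```python
from urllib.parse import quote, urlencode

# base URL taken from the einfo/esearch/esummary/efetch examples in this course
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(program: str, **params) -> str:
    """Build an E-utilities URL, e.g. eutils_url("esearch", db=..., term=...).

    `program` is the utility name without the .fcgi extension
    (einfo, esearch, esummary, efetch, ...). Parameters keep their
    keyword order and are percent-encoded with urllib's quote().
    """
    url = EUTILS_BASE + program + ".fcgi"
    query = urlencode(params, quote_via=quote)
    return url + "?" + query if query else url


if __name__ == "__main__":
    print(eutils_url("einfo"))
    print(eutils_url("efetch", db="nucleotide", id=25, rettype="gb"))
    print(eutils_url("esearch", db="nucleotide", term='"Mammuthus primigenius"[ORGN]'))
```

The third call reproduces the esearch URL used on the ESearch slides, including the %22 and %5B...%5D escapes for the quotes and the [ORGN] field tag.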

Entrez Direct

http://www.ncbi.nlm.nih.gov/books/NBK179288/ "Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.

Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries.

Record retrieval and formatting normally complete the process."

EInfo

EInfo

Provides a list of the names of all valid Entrez databases.

Provides statistics for a single database, including lists of indexing fields and available link names.

EInfo

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi

EInfo: XML Output

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi

<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
    <DbName>nucleotide</DbName>
    <DbName>nucgss</DbName>
    <DbName>nucest</DbName>
    <DbName>structure</DbName>
    <DbName>genome</DbName>
    <DbName>assembly</DbName>
    <DbName>gcassembly</DbName>
    <DbName>genomeprj</DbName>
    <DbName>bioproject</DbName>
    <DbName>biosample</DbName>
    <DbName>biosystems</DbName>
    <DbName>blastdbinfo</DbName>
    <DbName>books</DbName>
    <DbName>cdd</DbName>
    <DbName>clinvar</DbName>
(...)
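
Picking the database names out of that answer takes a few lines of standard-library Python. A minimal sketch, assuming the eInfoResult/DbList/DbName nesting shown above; database_names and EINFO_XML are illustrative names, and the sample keeps only the first three databases of the slide.

```python
import xml.etree.ElementTree as ET

# first three entries of the einfo.fcgi answer above
EINFO_XML = """<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
  </DbList>
</eInfoResult>"""


def database_names(xml_text: str) -> list:
    """Return the text of every DbList/DbName element, in document order."""
    root = ET.fromstring(xml_text)  # root element is <eInfoResult>
    return [e.text for e in root.findall("DbList/DbName")]


if __name__ == "__main__":
    print(database_names(EINFO_XML))
```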

EInfo: JSON Output

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?retmode=json

{
  "header": {
    "type": "einfo",
    "version": "0.3"
  },
  "einforesult": {
    "dblist": [
      "pubmed",
      "protein",
      "nuccore",
      (...)
      "unigene",
      "gencoll",
      "gtr"
    ]
  }
}

EInfo

Return statistics for a given Entrez database: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=DbName

EInfo: Statistics for PubMed

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed

<?xml version="1.0"?>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    <Description>PubMed bibliographic record</Description>
    <DbBuild>Build130805-2117m.4</DbBuild>
    <Count>22974581</Count>
    <LastUpdate>2013/08/06 08:33</LastUpdate>
    <FieldList>
      (...)
      <Field>
        <Name>UID</Name>
        <FullName>UID</FullName>
        <Description>Unique number assigned to publication</Description>
        <TermCount>0</TermCount>
        <IsDate>N</IsDate>
        <IsNumerical>Y</IsNumerical>
        <SingleToken>Y</SingleToken>
        <Hierarchy>N</Hierarchy>
        <IsHidden>Y</IsHidden>
      </Field>
      <Field>
(...)

EInfo: Statistics for PubMed

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed&retmode=json

{
  "header": {
    "type": "einfo",
    "version": "0.3"
  },
  "einforesult": {
    "dbinfo": {
      "dbname": "pubmed",
      "menuname": "PubMed",
      "description": "PubMed bibliographic record",
      "dbbuild": "Build160921-2207m.6",
      "count": "26470199",
      "lastupdate": "2016/09/22 16:32",
      "fieldlist": [
        {
          "name": "ALL",
          "fullname": "All Fields",
          "description": "All terms from all searchable fields",
          "termcount": "179424126",
          "isdate": "N",
          "isnumerical": "N",
          "singletoken": "N",
          "hierarchy": "N",
          "ishidden": "N"
        },
        {
          "name": "UID",
          "fullname": "UID",
          "description": "Unique number assigned to publication",
(...)
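
In the JSON answer the record count and the field list sit under einforesult/dbinfo, and numbers such as count are serialized as strings. A minimal Python sketch that extracts the count and the non-hidden search fields, assuming the object layout shown on this slide (dbinfo as a single object); db_stats and EINFO_JSON are illustrative names and the sample is a trimmed copy of the answer.

```python
import json

# trimmed copy of the einfo.fcgi?db=pubmed&retmode=json answer above
EINFO_JSON = """{
  "header": {"type": "einfo", "version": "0.3"},
  "einforesult": {
    "dbinfo": {
      "dbname": "pubmed",
      "count": "26470199",
      "fieldlist": [
        {"name": "ALL", "fullname": "All Fields", "ishidden": "N"},
        {"name": "UID", "fullname": "UID", "ishidden": "Y"}
      ]
    }
  }
}"""


def db_stats(json_text: str):
    """Return (record count, {field name: full name}) for the visible fields."""
    dbinfo = json.loads(json_text)["einforesult"]["dbinfo"]
    fields = {
        f["name"]: f["fullname"]
        for f in dbinfo["fieldlist"]
        if f.get("ishidden") != "Y"  # hidden fields such as UID are skipped
    }
    # the count is a JSON string in this output, hence int()
    return int(dbinfo["count"]), fields


if __name__ == "__main__":
    count, fields = db_stats(EINFO_JSON)
    print(count, sorted(fields))
```

The visible field names are the tags that can be appended to an ESearch term in square brackets, like the [ORGN] tag used later in this course.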

EInfo: with Entrez Direct

$ einfo -dbs
$ einfo -db pubmed

GQuery

GQuery

Provides the number of records retrieved in all Entrez databases by a single text query.

GQuery: Example

$ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml"

<Result>
<Term>tyrannosaurus rex</Term>
<eGQueryResult>
<ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>mesh</DbName><MenuName/><Count>15</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>books</DbName><MenuName/><Count>179</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>pubmedhealth</DbName><MenuName/><Count>21</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>omim</DbName><MenuName/><Count>10</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
<ResultItem><DbName>ncbisearch</DbName><MenuName/><Count>1</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>nuccore</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
(...)
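
The useful part of a GQuery answer is the Count of every ResultItem whose Status is "Ok". A minimal Python sketch of that filter, assuming the Result/eGQueryResult/ResultItem structure shown above; gquery_hits and GQUERY_XML are illustrative names, and the sample keeps three of the slide's items.

```python
import xml.etree.ElementTree as ET

# three ResultItem elements copied from the gquery answer above
GQUERY_XML = """<Result>
<Term>tyrannosaurus rex</Term>
<eGQueryResult>
<ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>nuccore</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
</eGQueryResult>
</Result>"""


def gquery_hits(xml_text: str) -> dict:
    """Map DbName -> Count for every ResultItem whose Status is 'Ok'."""
    root = ET.fromstring(xml_text)  # root element is <Result>
    hits = {}
    for item in root.findall("eGQueryResult/ResultItem"):
        if item.findtext("Status") == "Ok":
            hits[item.findtext("DbName")] = int(item.findtext("Count"))
    return hits


if __name__ == "__main__":
    for db, n in sorted(gquery_hits(GQUERY_XML).items(), key=lambda kv: -kv[1]):
        print(db, n)
```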

GQuery: Transforming to HTML using XSLT

The XSLT stylesheet: https://raw.githubusercontent.com/lindenb/courses/master/about.ncbi/gquery2html.xsl

<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="html"/>

<xsl:template match="/"><html><body>
<xsl:apply-templates select="Result"/>
</body></html></xsl:template>

<xsl:template match="Result">
<table><caption><xsl:value-of select="Term"/></caption>
<tr><th>Database</th><th>Count</th><th>Status</th></tr>
<xsl:apply-templates select="eGQueryResult/ResultItem"/>
</table>
</xsl:template>

<xsl:template match="ResultItem">
<tr>
<td><a>
<xsl:attribute name="href">http://www.ncbi.nlm.nih.gov/<xsl:value-of select="DbName"/>?cmd=search&amp;term=<xsl:value-of select="translate(/Result/Term,' ','+')"/></xsl:attribute>
<xsl:value-of select="DbName"/></a></td>
<td><xsl:value-of select="Count"/></td>
<td><xsl:value-of select="Status"/></td>
</tr>
</xsl:template>

</xsl:stylesheet>
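
For comparison, the same ResultItem-to-row mapping can be written in plain Python. A minimal sketch that mirrors the stylesheet above, assuming its link pattern (http://www.ncbi.nlm.nih.gov/<DbName>?cmd=search&term=<Term>, spaces replaced by '+' like translate()); gquery_row and gquery_table are hypothetical helper names. Like the XSLT, it does not URL-encode any other character of the term.

```python
from html import escape


def gquery_row(term: str, db: str, count: int, status: str) -> str:
    """One <tr> mirroring the ResultItem template of the stylesheet."""
    # same link as the XSLT: only spaces become '+', as translate(Term,' ','+') does
    href = "http://www.ncbi.nlm.nih.gov/%s?cmd=search&term=%s" % (db, term.replace(" ", "+"))
    # escape() turns '&' into '&amp;' so the attribute stays valid HTML
    return '<tr><td><a href="%s">%s</a></td><td>%d</td><td>%s</td></tr>' % (
        escape(href), escape(db), count, escape(status))


def gquery_table(term: str, items) -> str:
    """items: iterable of (db, count, status) tuples, e.g. from a parsed answer."""
    rows = "".join(gquery_row(term, db, n, st) for db, n, st in items)
    return ("<table><caption>%s</caption>"
            "<tr><th>Database</th><th>Count</th><th>Status</th></tr>%s</table>"
            % (escape(term), rows))


if __name__ == "__main__":
    print(gquery_table("tyrannosaurus rex", [("pubmed", 41, "Ok"), ("pmc", 160, "Ok")]))
```

The XSLT version has the advantage of running directly on the curl output with xsltproc, with no program to compile or install beyond libxslt.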

GQuery: Transforming to HTML

$ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml" |\
  xsltproc gquery2html.xsl -

<html><body>
<table>
<caption>tyrannosaurus rex</caption>
<tr>
<th>Database</th><th>Count</th><th>Status</th>
</tr>
<tr>
<td><a href="https://www.ncbi.nlm.nih.gov/pubmed?cmd=search&amp;term=tyrannosaurus+rex">pubmed</a>
</td><td>41</td><td>Ok</td>
</tr>
<tr>
<td><a href="https://www.ncbi.nlm.nih.gov/pmc?cmd=search&amp;term=tyrannosaurus+rex">pmc</a>
</td><td>160</td><td>Ok</td>
</tr>
<tr>
<td><a href="https://www.ncbi.nlm.nih.gov/mesh?cmd=search&amp;term=tyrannosaurus+rex">mesh</a>
</td><td>15</td>
(...)

ESearch

ESearch

Provides a list of UIDs matching a text query

Posts the results of a search on the History server

Downloads all UIDs from a dataset stored on the History server

Combines or limits UID datasets stored on the History server

Sorts sets of UIDs

ESearch: Syntax

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi

ESearch: Searching for 'Mammuthus primigenius'

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D" |\
  xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
    <Id>383843869</Id>
    <Id>383843867</Id>
    <Id>383843865</Id>
    <Id>383843863</Id>
    <Id>383843861</Id>
    <Id>383843859</Id>
    <Id>383843857</Id>
    <Id>383843855</Id>
    <Id>383843853</Id>
    <Id>383843851</Id>
    <Id>383843849</Id>
    <Id>383843847</Id>
    <Id>383843845</Id>
    <Id>157367690</Id>
    <Id>157367676</Id>
    <Id>157367662</Id>
    <Id>157367648</Id>
    <Id>157367634</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
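
Note the two Count elements: the top-level one is the total number of hits (684), while IdList only holds the first RetMax of them. A minimal Python sketch that keeps the two apart, assuming the eSearchResult layout shown above; parse_esearch and ESEARCH_XML are illustrative names, and the sample is the retmax=2 answer of the next slide, trimmed.

```python
import xml.etree.ElementTree as ET

# trimmed copy of the retmax=2 answer (TranslationSet/Stack removed)
ESEARCH_XML = """<eSearchResult>
<Count>684</Count><RetMax>2</RetMax><RetStart>0</RetStart>
<IdList><Id>507866428</Id><Id>124056416</Id></IdList>
<QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>"""


def parse_esearch(xml_text: str) -> dict:
    """Extract the paging counters, the UIDs and the translated query."""
    root = ET.fromstring(xml_text)
    return {
        # findtext("Count") only looks at direct children, so the
        # TranslationStack/TermSet/Count element is not picked up here
        "count": int(root.findtext("Count")),
        "retmax": int(root.findtext("RetMax")),
        "retstart": int(root.findtext("RetStart")),
        "ids": [e.text for e in root.findall("IdList/Id")],
        "query": root.findtext("QueryTranslation"),
    }


if __name__ == "__main__":
    res = parse_esearch(ESEARCH_XML)
    print(res["count"], "hits, first UIDs:", res["ids"])
```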

ESearch: Searching for 'Mammuthus primigenius' (JSON)

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmode=json"

{
  "header": {
    "type": "esearch",
    "version": "0.3"
  },
  "esearchresult": {
    "count": "811",
    "retmax": "20",
    "retstart": "0",
    "idlist": [
      "1059791223", "198241525", "198241523", "198241521", "198241519",
      "198241517", "198241515", "198241513", "198241511", "198241509",
      "198241507", "198241505", "198241503", "198241501", "198241499",
      "198241497", "198241495", "198241493", "198241491", "198241489"
    ],
    "translationset": [
      {
        "from": "\"Mammuthus primigenius\"[ORGN]",
        "to": "\"Mammuthus primigenius\"[Organism]"
      }
    ],
    "translationstack": [
      {
        "term": "\"Mammuthus primigenius\"[Organism]",
        "field": "Organism",
        "count": "811",
        "explode": "Y"
      },
      "GROUP"
    ],
    "querytranslation": "\"Mammuthus primigenius\"[Organism]"
  }
}
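
With retmode=json the same information lives under esearchresult, again with the counters serialized as strings. A minimal Python sketch, assuming the key names shown above; parse_esearch_json and ESEARCH_JSON are illustrative names, and the sample is the answer above with the idlist cut to three UIDs.

```python
import json

# the answer above, idlist cut to three UIDs and translation blocks removed
ESEARCH_JSON = r"""{
  "header": {"type": "esearch", "version": "0.3"},
  "esearchresult": {
    "count": "811", "retmax": "20", "retstart": "0",
    "idlist": ["1059791223", "198241525", "198241523"],
    "querytranslation": "\"Mammuthus primigenius\"[Organism]"
  }
}"""


def parse_esearch_json(json_text: str):
    """Return (total hit count, list of UIDs, translated query)."""
    res = json.loads(json_text)["esearchresult"]
    # "count" is a string in this output; the UIDs stay strings as well
    return int(res["count"]), res["idlist"], res["querytranslation"]


if __name__ == "__main__":
    count, ids, query = parse_esearch_json(ESEARCH_JSON)
    print(count, query, ids)
```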

ESearch: the retmax parameter

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=2" |\
  xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>

ESearch: the retstart parameter

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=3&retstart=100" |\
  xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>3</RetMax>
  <RetStart>100</RetStart>
  <IdList>
    <Id>300810656</Id>
    <Id>300810655</Id>
    <Id>300810654</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
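
Combining retmax and retstart gives a simple way to walk through all the hits page by page: step retstart from 0 to Count in increments of retmax. A minimal Python sketch that only builds the URLs (no network call), assuming the esearch base URL of these slides; paged_esearch_urls is a hypothetical helper. Keep in mind that Count can change between two calls (the slides themselves show 684 and, later, 811 hits for the same query); the History server listed on the ESearch slide is the alternative for large result sets.

```python
from urllib.parse import quote, urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def paged_esearch_urls(db: str, term: str, count: int, retmax: int = 100):
    """Yield one ESearch URL per page of `retmax` UIDs, stepping retstart."""
    if retmax <= 0:
        raise ValueError("retmax must be positive")
    for retstart in range(0, count, retmax):
        params = {"db": db, "term": term, "retmax": retmax, "retstart": retstart}
        # quote_via=quote gives %20 for spaces, as in the curl examples
        yield ESEARCH + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # 684 hits in pages of 100 -> retstart = 0, 100, ..., 600 (7 requests)
    for url in paged_esearch_urls("nucleotide", '"Mammuthus primigenius"[ORGN]', 684):
        print(url)
```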

ESearch: rettype=count

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&rettype=count" |\
  xmllint --format -

<eSearchResult>
  <Count>684</Count>
</eSearchResult>

ESearch: sort=Date Released

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&sort=Date+Released" |\
  xmllint --format -

<eSearchResult>
  <Count>811</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>1033204644</Id>
    <Id>1033204658</Id>
    <Id>1033204672</Id>
    <Id>1033204686</Id>
    <Id>1033204729</Id>
    <Id>1033204771</Id>
    <Id>1033204785</Id>
    <Id>1033204799</Id>
    <Id>1033204813</Id>
    <Id>1033204827</Id>
    <Id>1033204871</Id>
    <Id>1033205124</Id>
    <Id>1033205194</Id>
    <Id>1033205208</Id>
    <Id>1033205222</Id>
    <Id>1033205236</Id>
    <Id>1033205264</Id>
    <Id>1033205390</Id>
(...)

ESummary

ESummary: Syntax

Returns document summaries (DocSums) for a list of input UIDs

Returns DocSums for a set of UIDs stored on the Entrez History server

ESummary: Syntax

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=(DB)&id=(UIDs)

ESummary: Retrieve nucleotide gi=507866428

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428"

<eSummaryResult>
  <DocSum>
    <Id>507866428</Id>
    <Item Name="Caption" Type="String">KC524742</Item>
    <Item Name="Title" Type="String">Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds</Item>
    <Item Name="Extra" Type="String">gi|507866428|gb|KC524742.1|[507866428]</Item>
    <Item Name="Gi" Type="Integer">507866428</Item>
    <Item Name="CreateDate" Type="String">2013/06/15</Item>
    <Item Name="UpdateDate" Type="String">2013/06/21</Item>
    <Item Name="Flags" Type="Integer">0</Item>
    <Item Name="TaxId" Type="Integer">37349</Item>
    <Item Name="Length" Type="Integer">9042</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
  </DocSum>
</eSummaryResult>
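
In the XML DocSum every value is an Item carrying its Name and a Type, so a generic reader can rebuild a dictionary and convert the Integer items. A minimal Python sketch, assuming flat Items as in this nucleotide DocSum (nested list Items, which other databases may return, are not handled); docsums and ESUMMARY_XML are illustrative names and the sample is a trimmed copy of the answer.

```python
import xml.etree.ElementTree as ET

# trimmed copy of the esummary answer above
ESUMMARY_XML = """<eSummaryResult><DocSum><Id>507866428</Id>
<Item Name="Caption" Type="String">KC524742</Item>
<Item Name="Gi" Type="Integer">507866428</Item>
<Item Name="TaxId" Type="Integer">37349</Item>
<Item Name="Length" Type="Integer">9042</Item>
<Item Name="ReplacedBy" Type="String"></Item>
</DocSum></eSummaryResult>"""


def docsums(xml_text: str) -> dict:
    """Map each DocSum Id to {Item Name: value}, converting Integer items."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc in root.findall("DocSum"):
        items = {}
        for item in doc.findall("Item"):  # direct children only
            text = item.text or ""  # empty elements give None
            if item.get("Type") == "Integer" and text:
                items[item.get("Name")] = int(text)
            else:
                items[item.get("Name")] = text
        result[doc.findtext("Id")] = items
    return result


if __name__ == "__main__":
    for uid, items in docsums(ESUMMARY_XML).items():
        print(uid, items["Caption"], items["TaxId"], items["Length"])
```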

ESummary: Retrieve nucleotide gi=507866428 in JSON

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428&retmode=json"

{
  "header": {
    "type": "esummary",
    "version": "0.3"
  },
  "result": {
    "uids": [
      "507866428"
    ],
    "507866428": {
      "uid": "507866428",
      "caption": "KC524742",
      "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds",
      "extra": "gi|507866428|gb|KC524742.1|",
      "gi": 507866428,
      "createdate": "2013/06/15",
      "updatedate": "2013/06/21",
      "flags": "",
      "taxid": 37349,
      "slen": 9042,
      "biomol": "genomic",
      "moltype": "dna",
      "topology": "linear",
      "sourcedb": "insd",
      "segsetsize": "",
      "projectid": "0",
(...)
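
The JSON summary is keyed by UID: result/uids gives the order, and each UID is also a key holding its own summary object. Unlike the XML version, integers such as taxid and slen already arrive as JSON numbers. A minimal Python sketch, assuming the layout shown above; esummary_records and ESUMMARY_JSON are illustrative names and the sample is a trimmed copy of the answer.

```python
import json

# trimmed copy of the esummary ...&retmode=json answer above
ESUMMARY_JSON = """{
  "header": {"type": "esummary", "version": "0.3"},
  "result": {
    "uids": ["507866428"],
    "507866428": {"uid": "507866428", "caption": "KC524742",
                  "taxid": 37349, "slen": 9042}
  }
}"""


def esummary_records(json_text: str) -> list:
    """Return the summary objects in the order given by result/uids."""
    result = json.loads(json_text)["result"]
    return [result[uid] for uid in result["uids"]]


if __name__ == "__main__":
    for rec in esummary_records(ESUMMARY_JSON):
        print(rec["uid"], rec["caption"], rec["taxid"], rec["slen"])
```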

ESummary: retrieve SNP rs25

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&id=25"

<eSummaryResult>
<DocSum>
    <Id>25</Id>
    <Item Name="SNP_ID" Type="Integer">25</Item>
    <Item Name="Organism" Type="String"></Item>
    <Item Name="ALLELE_ORIGIN" Type="String"></Item>
    <Item Name="GLOBAL_MAF" Type="String">0.4913</Item>
    <Item Name="GLOBAL_POPULATION" Type="String"></Item>
    <Item Name="GLOBAL_SAMPLESIZE" Type="Integer">0</Item>
    <Item Name="SUSPECTED" Type="String"></Item>
    <Item Name="CLINICAL_SIGNIFICANCE" Type="String"></Item>
    <Item Name="GENE" Type="String">THSD7A</Item>
    <Item Name="LOCUS_ID" Type="Integer">221981</Item>
    <Item Name="ACC" Type="String">NM_015204.2,NT_007819.17</Item>
    <Item Name="CHR" Type="String">7</Item>
    <Item Name="WEIGHT" Type="Integer">1</Item>
    <Item Name="HANDLE" Type="String">1000GENOMES,BGI,BL,BUSHMAN,COMPLETE_GENOMICS,CSHL-HAPMAP,GMI,ILLUMINA-UK,KWOK,PERLEGEN,SSMP,TISHKOFF</Item>
    <Item Name="FXN_CLASS" Type="String">intron-variant</Item>
    <Item Name="VALIDATED" Type="String">by-1000G,by-cluster,by-frequency,by-hapmap</Item>
    <Item Name="GTYPE" Type="String">true</Item>
    <Item Name="NONREF" Type="String">false</Item>
    <Item Name="DOCSUM" Type="String">HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.292683A&gt;G,NM_015204.2:c.1454-1398A&gt;G,NT_007819.17:g.11574142T&gt;C|SEQ=TCTGTGAGCTTCTGCATGCAATCCT[A/G]TGCAATTGGAATTTGATAGTCCTTT|GENE=THSD7A:221981</Item>
    <Item Name="HET" Type="Integer">50</Item>
    <Item Name="SRATE" Type="Integer">0</Item>
    <Item Name="TAX_ID" Type="Integer">9606</Item>
    <Item Name="CHRRPT" Type="String">25|2|0|1|1|1|7|NT_007819.17|11574141|11584142|THSD7A|0.499848|0.00872267||51|1|1|36|138|0|||T:2178:0.4913</Item>
    <Item Name="ORIG_BUILD" Type="Integer">36</Item>
    <Item Name="UPD_BUILD" Type="Integer">138</Item>
    <Item Name="CREATEDATE" Type="String">2000-09-19 17:02</Item>
    <Item Name="UPDATEDATE" Type="String">2013-06-21 14:17</Item>
    <Item Name="POP_CLASS" Type="String"></Item>
    <Item Name="METHOD_CLASS" Type="String">computed,hybridize,sequence,unknown</Item>
    <Item Name="SNP3D" Type="String"></Item>
    <Item Name="LINKOUT" Type="String">ILLUMINA-UK|http://www.illumina.com/HumanGenomeNA18507_000019106_NCBI36.1_chr7_11550667</Item>
    <Item Name="SS" Type="Integer">654151077</Item>
    <Item Name="LOCSNPID" Type="String">7_11584142</Item>
    <Item Name="ALLELE" Type="String">R</Item>
    <Item Name="SNP_CLASS" Type="String">snp</Item>
    <Item Name="CHRPOS" Type="String">7:11584142</Item>
    <Item Name="CONTIGPOS" Type="String">NT_007819.17:11574142</Item>
    <Item Name="TEXT" Type="String"></Item>
    <Item Name="LOOKUP" Type="String">325952</Item>
</DocSum>
</eSummaryResult>
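In the SNP summary, the `DOCSUM` item packs several `KEY=value` fields separated by `|` (`HGVS`, `SEQ`, `GENE`), and the two alleles appear in brackets inside the flanking sequence. A small sketch, assuming the string has already been extracted from the XML (the `&gt;` entities are left as they are); `docsum_field` is a hypothetical name:

```sh
# docsum_field KEY: read a DOCSUM string ("K1=v1|K2=v2|...") on stdin, print the value of KEY.
docsum_field() {
  tr '|' '\n' | sed -n "s/^$1=//p"
}

docsum='HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.292683A&gt;G,NM_015204.2:c.1454-1398A&gt;G,NT_007819.17:g.11574142T&gt;C|SEQ=TCTGTGAGCTTCTGCATGCAATCCT[A/G]TGCAATTGGAATTTGATAGTCCTTT|GENE=THSD7A:221981'

printf '%s\n' "$docsum" | docsum_field GENE     # prints THSD7A:221981

# The alleles are the bracketed part of the flanking sequence:
printf '%s\n' "$docsum" | docsum_field SEQ | sed 's/.*\[\(.*\)\].*/\1/'    # prints A/G
```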

ESummary: retrieve PubMed PMID 7939126

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=7939126"

<eSummaryResult>
<DocSum>
    <Id>7939126</Id>
    <Item Name="PubDate" Type="Date">1994 Apr</Item>
    <Item Name="EPubDate" Type="Date"></Item>
    <Item Name="Source" Type="String">Sleep</Item>
    <Item Name="AuthorList" Type="List">
        <Item Name="Author" Type="String">Broughton R</Item>
        <Item Name="Author" Type="String">Billings R</Item>
        <Item Name="Author" Type="String">Cartwright R</Item>
        <Item Name="Author" Type="String">Doucette D</Item>
        <Item Name="Author" Type="String">Edmeads J</Item>
        <Item Name="Author" Type="String">Edwardh M</Item>
        <Item Name="Author" Type="String">Ervin F</Item>
        <Item Name="Author" Type="String">Orchard B</Item>
        <Item Name="Author" Type="String">Hill R</Item>
        <Item Name="Author" Type="String">Turrell G</Item>
    </Item>
    <Item Name="LastAuthor" Type="String">Turrell G</Item>
    <Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
    <Item Name="Volume" Type="String">17</Item>
    <Item Name="Issue" Type="String">3</Item>
    <Item Name="Pages" Type="String">253-64</Item>
    <Item Name="LangList" Type="List">
        <Item Name="Lang" Type="String">English</Item>
    </Item>
    <Item Name="NlmUniqueID" Type="String">7809084</Item>
    <Item Name="ISSN" Type="String">0161-8105</Item>
    <Item Name="ESSN" Type="String">1550-9109</Item>
    <Item Name="PubTypeList" Type="List">
        <Item Name="PubType" Type="String">Journal Article</Item>
    </Item>
    <Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item>
    <Item Name="PubStatus" Type="String">ppublish</Item>
    <Item Name="ArticleIds" Type="List">
        <Item Name="pubmed" Type="String">7939126</Item>
        <Item Name="eid" Type="String">7939126</Item>
        <Item Name="rid" Type="String">7939126</Item>
    </Item>
    <Item Name="History" Type="List">
        <Item Name="pubmed" Type="Date">1994/04/01 00:00</Item>
        <Item Name="medline" Type="Date">1994/04/01 00:01</Item>
        <Item Name="entrez" Type="Date">1994/04/01 00:00</Item>
    </Item>
    <Item Name="References" Type="List"></Item>
    <Item Name="HasAbstract" Type="Integer">1</Item>
    <Item Name="PmcRefCount" Type="Integer">4</Item>
    <Item Name="FullJournalName" Type="String">Sleep</Item>
    <Item Name="ELocationID" Type="String"></Item>
    <Item Name="SO" Type="String">1994 Apr;17(3):253-64</Item>
</DocSum>
</eSummaryResult>

EFetch

EFetch: syntax

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=(db)&id=(ID)
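The URLs on the following slides all follow this pattern, with the optional `rettype` and `retmode` parameters selecting the output format (`fasta`, `gb`, `xml`, ...). A sketch of a URL builder, assuming a POSIX shell; `efetch_url` and `EUTILS` are hypothetical names, and the URL is only printed so that it can be handed to `curl` or `wget`:

```sh
# efetch_url DB ID [RETTYPE] [RETMODE]: print an EFetch URL (hypothetical helper).
# Optional parameters are appended only when they are non-empty.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

efetch_url() {
  url="$EUTILS/efetch.fcgi?db=$1&id=$2"
  [ -n "${3:-}" ] && url="$url&rettype=$3"
  [ -n "${4:-}" ] && url="$url&retmode=$4"
  printf '%s\n' "$url"
}

efetch_url nucleotide 507866428 fasta
# prints https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta
```

An empty third argument skips `rettype`, e.g. `efetch_url nucleotide 507866428 '' xml`; the result can be fetched with `curl -s "$(efetch_url nucleotide KC524742 gb)"`.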

EFetch: retrieve nucleotide gi=507866428 as ASN.1

Default format (no rettype or retmode given): https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428

Seq-entry ::= set {
  class nuc-prot ,
  descr {
    source {
      genome genomic ,
      org {
        taxname "Mammuthus primigenius" ,
        common "woolly mammoth" ,
        db {
          {
            db "taxon" ,
            tag
              id 37349 } } ,
        orgname {
          name
            binomial {
              genus "Mammuthus" ,
              species "primigenius" } ,
          mod {
            {
(...)

EFetch: retrieve nucleotide gi=507866428 as FASTA

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta

>gi|507866428|gb|KC524742.1| Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds

GCACTTGCTTTTTTTGTCTTCTTCAGACCACGACATGGGACTCAGCGACGGGGAATGGGAGTTGGTGTTGAAAACCTGGGGGAAAGTGGAGGCTGACATCCCGGGCCATGGGCTGGAAGTCTTCGTCAGGTAAAGGAAGAAATCCTGTGGCCCCCATCACCCACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
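ESummary reports a length of 9042 bp for this record, but a large part of it is a run of `N` (undetermined bases). A quick way to check both numbers from the FASTA output, assuming POSIX awk; `fasta_stats` is a hypothetical helper that sums all records read on stdin:

```sh
# fasta_stats: print "<length> <number of N>" for the FASTA record(s) read on stdin.
# Header lines (starting with ">") are skipped; all records are summed together.
fasta_stats() {
  awk '
    /^>/ { next }
    {
      gsub(/[ \t\r]/, "")          # ignore stray blanks and carriage returns
      len += length($0)
      n += gsub(/[Nn]/, "")        # gsub returns the number of replacements made
    }
    END { print len + 0, n + 0 }'
}

printf '>demo\nACGTNN\nNNA\n' | fasta_stats     # prints "9 4"
```

Piping the EFetch FASTA output above (`rettype=fasta`) into `fasta_stats` should report a length equal to the 9042 bp given by ESummary.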

EFetch: retrieve nucleotide gi=507866428 as TinySeq XML

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta&retmode=xml

<?xml version="1.0"?>
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" (...)>
<TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>507866428</TSeq_gi>
  <TSeq_accver>KC524742.1</TSeq_accver>
  <TSeq_taxid>37349</TSeq_taxid>
  <TSeq_orgname>Mammuthus primigenius</TSeq_orgname>
  <TSeq_defline>Mammuthus primigenius isolate CME2(...)
  <TSeq_length>9042</TSeq_length>
  <TSeq_sequence>GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA(...)
</TSeq>
</TSeqSet>

EFetch: retrieve nucleotide gi=507866428 as GenBank XML (GBSeq)

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=gb&retmode=xml

<GBSeq>
  <GBSeq_locus>KC524742</GBSeq_locus>
  <GBSeq_length>9042</GBSeq_length>
  <GBSeq_strandedness>double</GBSeq_strandedness>
  <GBSeq_moltype>DNA</GBSeq_moltype>
  <GBSeq_topology>linear</GBSeq_topology>
  <GBSeq_division>MAM</GBSeq_division>
  <GBSeq_update-date>21-JUN-2013</GBSeq_update-date>
  <GBSeq_create-date>15-JUN-2013</GBSeq_create-date>
  <GBSeq_definition>Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds</GBSeq_definition>
  <GBSeq_primary-accession>KC524742</GBSeq_primary-accession>
  <GBSeq_accession-version>KC524742.1</GBSeq_accession-version>
  <GBSeq_other-seqids>
    <GBSeqid>gb|KC524742.1|</GBSeqid>
    <GBSeqid>gi|507866428</GBSeqid>
  </GBSeq_other-seqids>
  <GBSeq_source>Mammuthus primigenius (woolly mammoth)</GBSeq_source>
  <GBSeq_organism>Mammuthus primigenius</GBSeq_organism>
(...)

EFetch: retrieve nucleotide gi=507866428 as a GenBank flat file

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=gb

LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
KEYWORDS    .
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
                     /mol_type="genomic DNA"
                     /isolate="CME2005/915"
                     /db_xref="taxon:37349"
                     /tissue_type="bone"
     gene            <35..>9042
                     /gene="Mb"
     mRNA            join(<35..129,5627..5849,8979..>9042)
                     /gene="Mb"
                     /product="myoglobin"
     CDS             join(35..129,5627..5849,8979..>9042)

EFetch: EFetch also works with accession numbers

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=KC524742&rettype=gb

LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
(...)

The rest of the record is identical to the one retrieved with id=507866428.

EFetch: using the WebEnv parameter

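WebEnv: the Web environment string returned by a previous ESearch, EPost or ELink call. When it is provided to ESearch, the results of the search are posted to this pre-existing WebEnv. Used together with query_key, it lets EFetch retrieve a stored set of UIDs without listing them in the URL.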

EFetch: using the WebEnv parameter

Searching for extinct species in the NCBI taxonomy ('extinct[PROP]'):

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=taxonomy&term=extinct%5BPROP%5D"

<eSearchResult><Count>145</Count><RetMax>20</RetMax><RetStart>0</RetStart><QueryKey>1</QueryKey><WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv><IdList>
<Id>1225531</Id><Id>1225530</Id><Id>1211276</Id><Id>1211275</Id><Id>1027716</Id><Id>948961</Id><Id>943952</Id><Id>867394</Id><Id>867393</Id><Id>748142</Id><Id>748141</Id><Id>741158</Id><Id>703576</Id><Id>703571</Id><Id>703559</Id><Id>693865</Id><Id>686441</Id><Id>665113</Id><Id>659069</Id><Id>656807</Id>
</IdList><TranslationSet/><TranslationStack>
<TermSet><Term>extinct[PROP]</Term><Field>PROP</Field><Count>145</Count><Explode>N</Explode>
</TermSet><OP>GROUP</OP>
</TranslationStack><QueryTranslation>extinct[PROP]</QueryTranslation>
</eSearchResult>
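With `usehistory=y`, the two values EFetch needs afterwards are `QueryKey` and `WebEnv`. A sketch that extracts them from the answer and formats them as URL parameters, assuming POSIX awk and `paste`; `history_params` is a hypothetical helper that matches tag names as text instead of parsing the XML:

```sh
# history_params: read an ESearch (or EPost) XML answer on stdin and print
# "query_key=...&WebEnv=..." ready to be appended to an EFetch URL.
# Hypothetical helper: splits the stream on "<" with awk, no real XML parsing.
history_params() {
  awk 'BEGIN { RS = "<" }
       /^QueryKey>/ { sub(/^QueryKey>/, ""); print "query_key=" $0 }
       /^WebEnv>/   { sub(/^WebEnv>/, "");   print "WebEnv=" $0 }' |
  paste -s -d '&' -
}

# Usage with a trimmed copy of the answer above (IdList removed):
answer='<eSearchResult><Count>145</Count><RetMax>20</RetMax><RetStart>0</RetStart><QueryKey>1</QueryKey><WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv></eSearchResult>'
params=$(printf '%s\n' "$answer" | history_params)
echo "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&$params&retmode=xml"
```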

EFetch: using the WebEnv parameter

Fetching the extinct species from the NCBI taxonomy ('extinct[PROP]') using the WebEnv and query_key parameters:

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&query_key=1&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&retmode=xml"

<TaxaSet><Taxon><TaxId>1225531</TaxId><ScientificName>Equus ovodovi</ScientificName><OtherNames>
<Synonym>Equus (Sussemionus) ovodovi</Synonym><Name>
<ClassCDE>authority</ClassCDE><DispName>Equus ovodovi Eisenmann &amp; Sergej, 2011</DispName>
</Name></OtherNames><ParentTaxId>1225530</ParentTaxId><Rank>species</Rank><Division>Mammals</Division><GeneticCode>
<GCId>1</GCId><GCName>Standard</GCName>
</GeneticCode><MitoGeneticCode>(...)
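The ESearch answer reported `Count` 145 but `RetMax` 20, so its `IdList` only listed the first 20 UIDs, while the History server keeps the whole set. EFetch can page through that set with `retstart` and `retmax`. A sketch that prints one URL per batch, assuming a POSIX shell; `batch_urls` is a hypothetical helper and the batch size of 100 is arbitrary:

```sh
# batch_urls DB COUNT BATCH PARAMS: print one EFetch URL per batch of BATCH records,
# paging through the COUNT records stored on the History server (retstart/retmax).
# PARAMS is a "query_key=...&WebEnv=..." string. Hypothetical helper.
batch_urls() {
  db=$1 count=$2 batch=$3 params=$4
  start=0
  while [ "$start" -lt "$count" ]; do
    printf '%s\n' "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=$db&$params&retstart=$start&retmax=$batch&retmode=xml"
    start=$((start + batch))
  done
}

# Two URLs for 145 records: retstart=0 and retstart=100
batch_urls taxonomy 145 100 'query_key=1&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538'
```

Each URL can then be downloaded in turn, e.g. `batch_urls ... | while read -r u; do curl -s "$u"; sleep 1; done`, pausing between requests to stay within NCBI's usage limits.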

EPost

EPost

Uploads a list of UIDs to the Entrez History server

Appends a list of UIDs to an existing set of UID lists attached to a Web Environment

EPost: posting UIDs to EPost

Get the list of taxonomy UIDs of extinct species:

wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=extinct[PROP]&retmax=1000' |\
    xmllint --format - |\
    grep '<Id>' |\
    cut -d '<' -f 2 |\
    cut -d '>' -f 2 |\
    tr "\n" ","

Output:

1860150,1860149,1849957,1825730,1825729,1636722,1607772,1607771,1607767,1607757,1607756,1597978,1582057,1566623,1563127,1563126,1563125,1563124,1563123,1563122,1563121,1563120,1560315,1560314,1543223,1542494,1542469,1530197,1524889,1523245,1513476,1513474,1503129,1453604,1425170,1415635,1295174,1225531,1225530,1211276,1211275,1027716,948961,943952,867394,867393,748142,748141,741158,703576,703571,703559,693865,686441,665113,659069,656807,647691,647690,643746,643745,643744,643742,577682,572106,572105,572104,572099,572098,570943,570942,570941,551196,544298,523825,523824,523822,523821,523820,518692,518691,518689,475185,436495,436494,436493,436488,402889,399386,399178,386524,379504,363580,363579,363578,363571,339614,339612,339609,330944,330640,330639,330638,330637,330636,328612,314500,307641,304335,272462,268291,251263,251094,251093,239970,239969,237965,230980,230979,227166,227165,223567,222863,222862,216182,216181,201717,201716,192211,188536,187135,187134,187133,187132,187131,187118,184920,180214,180178,180177,180176,180175,180174,173935,166505,148923,147494,147466,147464,136416,136415,126594,126429,115942,107030,103864,94623,92649,92648,89252,89250,63631,63221,54568,54500,54497,54366,54365,48784,46906,39097,39053,39051,37349,37348,37185,27445,27444,20678,13266,13140,9619,9275,9274,9273,8818,8817,8815,8813,8812,8811,8810,8367,3409

EPost: posting UIDs to EPost

wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=taxonomy&WebEnv=NCID_1_15435144_130.14.22.215_9001_1474637318_669113391_0MetA0_S_MegaStore_F_1&id=1860150,1860149,1849957,1825730,1825729,1636722,1607772...'

Output:

<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
<ePostResult><QueryKey>1</QueryKey><WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv></ePostResult>
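The GET URL above grows with every UID. The E-utilities also accept HTTP POST, which NCBI recommends for long UID lists. A sketch, assuming `curl` and a POSIX shell; `ids_to_csv` and `epost_ids` are hypothetical helpers, and `paste -s` is used instead of `tr "\n" ","` so that no trailing comma is left:

```sh
# ids_to_csv: join newline-separated UIDs read on stdin into "id1,id2,..."
# (blank lines are dropped; paste adds no trailing comma).
ids_to_csv() {
  grep -v '^[[:space:]]*$' | paste -s -d ',' -
}

# epost_ids DB: POST the UIDs read on stdin to EPost and print the XML answer.
# Hypothetical helper; curl -d sends the parameters in the request body (HTTP POST).
epost_ids() {
  curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi" \
    -d "db=$1" \
    -d "id=$(ids_to_csv)"
}

printf '1860150\n1860149\n1849957\n' | ids_to_csv     # prints 1860150,1860149,1849957
```

The newline-separated UIDs produced by the `grep`/`cut` steps of the ESearch pipeline can be piped straight into `epost_ids taxonomy`; the answer is an `ePostResult` like the one above.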

EPost: searching in the WebEnv

Searching for Homo sapiens within the set stored in the WebEnv (query_key=1):

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Homo%20Sapiens&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"

<eSearchResult><Count>0</Count><RetMax>0</RetMax><RetStart>0</RetStart><QueryKey>8</QueryKey><WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv><IdList/><TranslationSet/><TranslationStack>
<OP>GROUP</OP><TermSet>
<Term>homo sapiens[All Names]</Term><Field>All Names</Field><Count>0</Count><Explode>N</Explode>
</TermSet><OP>AND</OP>
</TranslationStack><QueryTranslation>(#2) AND homo sapiens[All Names]</QueryTranslation>
</eSearchResult>

EPost: searching in the WebEnv

Searching for Tyrannosaurus within the set stored in the WebEnv (query_key=1):

$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Tyrannosaurus&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"

<eSearchResult><Count>1</Count><RetMax>1</RetMax><RetStart>0</RetStart><QueryKey>9</QueryKey><WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv><IdList>
<Id>436494</Id></IdList><TranslationSet/><TranslationStack>
<OP>GROUP</OP><TermSet>
<Term>Tyrannosaurus[All Names]</Term><Field>All Names</Field><Count>1</Count><Explode>N</Explode>
</TermSet><OP>AND</OP>
</TranslationStack><QueryTranslation>(#2) AND Tyrannosaurus[All Names]</QueryTranslation>
</eSearchResult>
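Search terms must be URL-encoded: the space in `Homo Sapiens` becomes `%20` and the brackets of `extinct[PROP]` become `%5B`/`%5D`. A sketch of an encoder and of a URL builder for searching inside a stored set, assuming POSIX awk and ASCII terms; `urlencode` and `esearch_in_history` are hypothetical helpers. With curl, `curl -G --data-urlencode "term=Homo sapiens" URL` performs the same encoding for you.

```sh
# urlencode STRING: percent-encode every byte except the RFC 3986 unreserved characters.
# Sketch assuming ASCII input; the string is passed through the environment so that
# awk does not interpret backslashes, and awk builds a character -> code table first.
urlencode() {
  URLENCODE_IN=$1 LC_ALL=C awk 'BEGIN {
    s = ENVIRON["URLENCODE_IN"]
    for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
    out = ""
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (c ~ /[A-Za-z0-9._~-]/) out = out c
      else out = out sprintf("%%%02X", ord[c])
    }
    print out
  }'
}

# esearch_in_history DB TERM WEBENV QUERY_KEY: print an ESearch URL combining TERM
# with the set stored as QUERY_KEY in WEBENV (hypothetical helper).
esearch_in_history() {
  printf '%s\n' "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=$1&term=$(urlencode "$2")&usehistory=y&WebEnv=$3&query_key=$4"
}

urlencode 'extinct[PROP]'     # prints extinct%5BPROP%5D
urlencode 'Homo sapiens'      # prints Homo%20sapiens
```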

EDirect: combining tools

Piping EDirect

esearch -db taxonomy -query "Tyrannosaurus" | \
    efetch -format xml

Piping EDirect

esearch -db pubmed -query "Tyrannosaurus" | \
    efilter -mindate 2005 | \
    efetch -format docsum | \
    xtract -pattern DocumentSummary \
        -element MedlineCitation/PMID \
        -element Id SortFirstAuthor

ELink

ELink

Returns UIDs linked to an input set of UIDs in either the same or a different Entrez database

Returns UIDs linked to other UIDs in the same Entrez database that match an Entrez query

Checks for the existence of Entrez links for a set of UIDs within the same database

Lists the available links for a UID

Lists LinkOut URLs and attributes for a set of UIDs

Lists hyperlinks to primary LinkOut providers for a set of UIDs

Creates hyperlinks to the primary LinkOut provider for a single UID

ELink

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi

ELink: searching the PubMed records associated with sequence gi:507866428

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=pubmed&id=507866428&cmd=neighbor_score

<eLinkResult><LinkSet>
<DbFrom>nuccore</DbFrom><IdList>
<Id>507866428</Id></IdList><LinkSetDb>
<DbTo>pubmed</DbTo><LinkName>nuccore_pubmed</LinkName><Link>
<Id>23766330</Id><Score>0</Score>
</Link></LinkSetDb>
</LinkSet></eLinkResult>
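In the ELink answer, the input UID is echoed in `<IdList>` and the linked UIDs appear inside `<Link>` elements. A sketch that keeps only the latter, assuming POSIX awk; `linked_ids` is a hypothetical, text-based helper:

```sh
# linked_ids: print the UIDs found inside <Link> elements of an ELink answer read
# on stdin, one per line; the <IdList> echo of the input UID is skipped.
linked_ids() {
  awk 'BEGIN { RS = "<" }
       /^Link>/   { inlink = 1 }
       /^\/Link>/ { inlink = 0 }
       inlink && /^Id>/ { sub(/^Id>/, ""); print }'
}

xml='<eLinkResult><LinkSet><DbFrom>nuccore</DbFrom><IdList><Id>507866428</Id></IdList><LinkSetDb><DbTo>pubmed</DbTo><LinkName>nuccore_pubmed</LinkName><Link><Id>23766330</Id><Score>0</Score></Link></LinkSetDb></LinkSet></eLinkResult>'
printf '%s\n' "$xml" | linked_ids      # prints 23766330
```

When several UIDs are linked, `linked_ids | paste -s -d ',' -` turns them into the comma-separated `id=` value expected by the EFetch call that follows.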

$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23766330&rettype=medline&retmode=text"

PMID- 23766330
TI  - Evolution of mammalian diving capacity traced by myoglobin net surface
      charge.
PG  - 1234192
LID - 10.1126/science.1234192 [doi]
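In the MEDLINE format, each field starts with a tag padded to four characters followed by `- `, and long values continue on lines indented by six spaces (as in the `TI` field above). A sketch that reassembles one field, assuming POSIX awk; `medline_field` is a hypothetical helper and only handles the layout seen in this output:

```sh
# medline_field TAG: print the value of a MEDLINE tag read on stdin, joining its
# continuation lines (indented by six spaces). Repeated tags are printed one per line.
medline_field() {
  awk -v tag="$1" '
    /^[A-Z][A-Z0-9 ][A-Z0-9 ][A-Z0-9 ]- / {
      if (on) print value
      cur = substr($0, 1, 4); sub(/ +$/, "", cur)
      on = (cur == tag)
      value = substr($0, 7)
      next
    }
    on && /^      / { v = $0; sub(/^ +/, "", v); value = value " " v }
    END { if (on) print value }'
}

printf '%s\n' 'PMID- 23766330' \
  'TI  - Evolution of mammalian diving capacity traced by myoglobin net surface' \
  '      charge.' \
  'PG  - 1234192' \
  'LID - 10.1126/science.1234192 [doi]' | medline_field TI
# prints: Evolution of mammalian diving capacity traced by myoglobin net surface charge.
```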

Transformations

EFetch: transforming to SVG

Using the stylesheet https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/gb2svg.xsl

# curl -L follows the redirect from raw.github.com; the GenBank XML is also
# downloaded with curl because xsltproc (libxml2) cannot read https:// URLs itself.
xsltproc <(curl -sL "https://raw.github.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/gb2svg.xsl") \
    <(curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=14971102&retmode=xml&rettype=gbc")

EFetch: transforming to SVG

<?xml version="1.0" encoding="UTF-8"?>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="121" width="920" style="stroke-width:1px;">
<svg:title>Human rotavirus segment 7 NSP3 gene, complete cds</svg:title>
<svg:defs>
<svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="grad">
<svg:stop offset="5%" stop-color="black"/>
<svg:stop offset="50%" stop-color="whitesmoke"/>
<svg:stop offset="95%" stop-color="black"/>
</svg:linearGradient>
<svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="verticalbodygradient">
<svg:stop offset="5%" stop-color="white"/>
<svg:stop offset="95%" stop-color="lightgray"/>
</svg:linearGradient>
</svg:defs>
<svg:style type="text/css"/>
<svg:g>
<svg:g transform="translate(0,0)">
<svg:rect x="0" y="0" width="920" height="120" fill="url(#verticalbodygradient)" stroke="black"/>
<svg:text style="color:red;font-size:35px;" x="10" y="35">Human rotavirus segment 7 NSP3 gene, complete cds</svg:text>
<svg:g>
<svg:rect x="10" y="40" width="900" height="18" style="fill:url(#grad);stroke:black;" title="1..1074"/>
<svg:text y="54" x="460" text-anchor="middle"><svg:tspan style="font-weight:bold;">source</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">organism</svg:tspan>:Human rotavirus A <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">mol_type</svg:tspan>:genomic RNA <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">strain</svg:tspan>:M <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">segment</svg:tspan>:7 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">clone</svg:tspan>:M0</svg:text>
</svg:g>
<svg:g>
<svg:rect x="10" y="60" width="27.6794035414725" height="18" style="fill:url(#grad);stroke:black;" title="1..34"/>
<svg:text y="74" x="39.6794035414725" text-anchor="start">
<svg:tspan style="font-weight:bold;">5'UTR</svg:tspan>
</svg:text>
</svg:g>
<svg:g>
<svg:rect x="38.5181733457595" y="80" width="781.733457595526" height="18" style="fill:url(#grad);stroke:black;" title="35..967"/>
<svg:text y="94" x="429.384902143523" text-anchor="middle"><svg:tspan style="font-weight:bold;">CDS</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">codon_start</svg:tspan>:1 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">product</svg:tspan>:NSP3 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">protein_id</svg:tspan>:AAK74116.1</svg:text>
</svg:g>
<svg:g>
<svg:rect x="821.090400745573" y="100" width="88.909599254427" height="18" style="fill:url(#grad);stroke:black;" title="968..1074"/>
<svg:text y="114" x="819.090400745573" text-anchor="end">
<svg:tspan style="font-weight:bold;">3'UTR</svg:tspan>
</svg:text>
</svg:g>
<svg:rect x="0" y="0" width="920" height="120" fill="none" stroke="black"/>
</svg:g>
</svg:g>
</svg:svg>

EFetch: transforming to SVG

EFetch: transforming to R

$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=true" | xmllint --format -

$ c u r l −s ” h t t p s : // e u t i l s . n cb i . nlm . n i h . gov/ e n t r e z / e u t i l s / e f e t c h . f c g i ?db=pubmed&u s e h i s t o r y=t r u e&WebEnv=NCID 1 52434791 130 . 1 4 . 2 2 . 2 1 59001 1375957034 1619786167&que r y k ey=1&retmode=xml”
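The WebEnv and query_key values passed to EFetch are the ones returned by the ESearch call, in its <WebEnv> and <QueryKey> elements. A minimal Java sketch of that hand-off, assuming the ESearch answer is piped on stdin; the class and method names are hypothetical, and the usage comment uses usehistory=y, the value given in the E-utilities documentation:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Builds an EFetch URL from the WebEnv/QueryKey of an ESearch answer (hypothetical helper). */
final class EsearchHistory {
    static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    /** Parses XML without downloading any external DTD declared in the answer. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            return dbf.newDocumentBuilder()
                      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse ESearch answer", e);
        }
    }

    /** Text of the first element with this tag name, or null if there is none. */
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    static String efetchUrl(String esearchXml, String db, String retmode) {
        Document doc = parse(esearchXml);
        String webEnv = firstText(doc, "WebEnv");
        String queryKey = firstText(doc, "QueryKey");
        if (webEnv == null || queryKey == null) {
            throw new IllegalArgumentException("no WebEnv/QueryKey: was usehistory set?");
        }
        return BASE + "efetch.fcgi?db=" + db
            + "&WebEnv=" + URLEncoder.encode(webEnv, StandardCharsets.UTF_8)
            + "&query_key=" + queryKey + "&retmode=" + retmode;
    }

    // curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=y" | java EsearchHistory
    public static void main(String[] args) throws IOException {
        String xml = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(efetchUrl(xml, "pubmed", "xml"));
    }
}
```

The printed URL can then be passed to curl exactly like the EFetch command above.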


Efetch: Transforming to R

<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>

<xsl:template match="/">
date2count &lt;- list()
<xsl:apply-templates select="/PubmedArticleSet/PubmedArticle[MedlineCitation/DateCreated/Year]"/>
df &lt;- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
</xsl:template>

<xsl:template match="PubmedArticle">
<xsl:variable name="year" select="MedlineCitation/DateCreated/Year"/>
date2count[["<xsl:value-of select="$year"/>"]] &lt;- ifelse(is.null(date2count[["<xsl:value-of select="$year"/>"]]),1,1+date2count[["<xsl:value-of select="$year"/>"]])
</xsl:template>

</xsl:stylesheet>
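xsltproc is one XSLT 1.0 processor among others; the JDK's javax.xml.transform API can apply the same stylesheet. A sketch (the class name ApplyXslt is hypothetical) that takes the stylesheet path as argument and reads the XML from stdin, mirroring `xsltproc pubmed2rstats.xsl -`:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies an XSLT stylesheet with the JDK's built-in processor (hypothetical helper). */
final class ApplyXslt {
    static String apply(String stylesheet, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            // Note: real EFetch answers declare a DOCTYPE; depending on the JDK
            // configuration the parser may try to fetch that DTD over the network.
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    // Usage: curl "...efetch.fcgi?..." | java ApplyXslt pubmed2rstats.xsl | R --no-save
    public static void main(String[] args) throws IOException {
        String xsl = Files.readString(Path.of(args[0]));
        String xml = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.print(apply(xsl, xml));
    }
}
```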


Efetch: Transforming to R

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.21_59001_1375957034_1619786167&query_key=1&retmode=xml" |\
  xsltproc pubmed2rstats.xsl -

date2count <- list()
date2count[["2013"]] <- ifelse(is.null(date2count[["2013"]]),1,1+date2count[["2013"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
(..)
df <- data.frame(Year=as.integer(names(date2count)),Count=unlist(date2count))
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()


Efetch: Transforming to R

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.21_59001_1375957034_1619786167&query_key=1&retmode=xml" |\
  xsltproc pubmed2rstats.xsl - |\
  R --no-save
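The XSLT + R pipeline counts PubmedArticle records per MedlineCitation/DateCreated/Year. For comparison, a sketch of the same tally done directly in Java with XPath; it reuses the element path from the stylesheet and prints a tab-separated Year/Count table instead of drawing the plot (PubmedYearCount is a hypothetical name):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Counts PubMed articles per creation year (hypothetical helper). */
final class PubmedYearCount {
    // Same path as the stylesheet's PubmedArticle[MedlineCitation/DateCreated/Year] selection.
    static final String YEAR_PATH = "/PubmedArticleSet/PubmedArticle/MedlineCitation/DateCreated/Year";

    static Map<Integer, Integer> countByYear(String pubmedXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(pubmedXml.getBytes(StandardCharsets.UTF_8)));
            NodeList years = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(YEAR_PATH, doc, XPathConstants.NODESET);
            Map<Integer, Integer> counts = new TreeMap<>(); // keys kept sorted by year
            for (int i = 0; i < years.getLength(); i++) {
                int year = Integer.parseInt(years.item(i).getTextContent().trim());
                counts.merge(year, 1, Integer::sum);
            }
            return counts;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("cannot read PubMed XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println("Year\tCount");
        countByYear(xml).forEach((year, n) -> System.out.println(year + "\t" + n));
    }
}
```

The table it prints could still be plotted in R, for example with read.delim() on stdin.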


Generating a JAVA parser


Using the XML schema: XML Schema for dbSNP

ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" targetNamespace="http://www.ncbi.nlm.nih.gov/SNP/docsum" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="ExchangeSet">
    <xsd:annotation>
      <xsd:documentation>Set of dbSNP refSNP docsums, version 3.4</xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="SourceDatabase" minOccurs="0">
          <xsd:complexType>
            <xsd:attribute name="taxId" type="xsd:int" use="required">
              <xsd:annotation>
                <xsd:documentation>NCBI taxonomy ID for variation</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="organism" type="xsd:string" use="required">
              <xsd:annotation>
                <xsd:documentation>common name for species used as part of database name.</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used in dbSNP.</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="gpipeOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used within NCBI genome pipeline data dumps.</xsd:documentation>
(...)


Using the XML schema: Compiling the XML Schema for dbSNP with XJC

$ xjc -d . "ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
parsing a schema...
compiling a schema...
https/www_ncbi_nlm_nih_gov/snp/docsum/Assay.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Assembly.java
https/www_ncbi_nlm_nih_gov/snp/docsum/BaseURL.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Component.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ExchangeSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/FxnSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/MapLoc.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ObjectFactory.java
https/www_ncbi_nlm_nih_gov/snp/docsum/PrimarySequence.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Rs.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsLinkout.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsStruct.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Ss.java
https/www_ncbi_nlm_nih_gov/snp/docsum/package-info.java


Using the XML schema: Compiling the XML Schema for dbSNP with XJC

Search for the non-genomic rs# in dbSNP:

import https.www_ncbi_nlm_nih_gov.snp.docsum.*; // classes generated by xjc
import javax.xml.bind.*;                         // JAXB (removed from the JDK in Java 11)
import javax.xml.stream.*;
import javax.xml.stream.events.*;

class ParseDbSnp
{
public static void main(String[] args) throws Exception
  {
  JAXBContext jaxbCtxt=JAXBContext.newInstance("https.www_ncbi_nlm_nih_gov.snp.docsum");
  Unmarshaller unmarshaller=jaxbCtxt.createUnmarshaller();
  // stream the (huge) dbSNP dump from stdin with StAX instead of loading it in memory
  XMLInputFactory ifactory = XMLInputFactory.newInstance();
  XMLEventReader r= ifactory.createXMLEventReader(System.in);
  while(r.hasNext())
    {
    XMLEvent evt=r.peek();
    // skip every event that is not the start of an <Rs> element
    if(!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("Rs")))
      {
      r.nextEvent();
      continue;
      }
    // unmarshal only this <Rs> subtree; the reader is left just after </Rs>
    Rs rs=unmarshaller.unmarshal(r, Rs.class).getValue();
    if("genomic".equals(rs.getMolType())) continue;
    System.out.println("rs"+rs.getRsId()+" "+rs.getMolType());
    }
  r.close();
  }
}
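JAXB (javax.xml.bind) and xjc were removed from the JDK in Java 11, so on a recent JDK the generated classes need JAXB as an external dependency. When only the rs# and the molecule type are needed, plain StAX, which is still in the JDK, may be enough. A sketch, assuming that rsId and molType are unqualified attributes of <Rs> in docsum 3.4 (the generated getters getRsId()/getMolType() show these fields exist, but the slide does not show whether they are attributes); NonGenomicRs is a hypothetical name:

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Streams a dbSNP docsum dump and reports non-genomic rs# without JAXB (hypothetical helper). */
final class NonGenomicRs {
    static void scan(InputStream in, Consumer<String> sink) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT && "Rs".equals(r.getLocalName())) {
                        // Assumption: rsId and molType are attributes without namespace.
                        String rsId = r.getAttributeValue(null, "rsId");
                        String molType = r.getAttributeValue(null, "molType");
                        if (!"genomic".equals(molType)) {
                            sink.accept("rs" + rsId + " " + molType);
                        }
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed dbSNP XML", e);
        }
    }

    // Usage: gunzip -c ds_ch1.xml.gz | java NonGenomicRs
    public static void main(String[] args) {
        scan(System.in, System.out::println);
    }
}
```

The callback keeps the scan streaming: nothing is accumulated, so memory use does not grow with the size of the dump.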


Using the XML schema: Compiling the XML Schema for dbSNP with XJC

compile...

$ javac ParseDbSnp.java https/www_ncbi_nlm_nih_gov/snp/docsum/*.java

and run...

$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz" |\
  gunzip -c |\
  java ParseDbSnp

rs701 cDNA
rs860 cDNA
rs861 cDNA
rs862 cDNA
rs863 cDNA
rs864 cDNA
rs865 cDNA
rs866 cDNA
rs877 cDNA
rs878 cDNA
rs879 cDNA
rs880 cDNA
rs882 cDNA
rs883 cDNA
rs884 cDNA
rs885 cDNA
rs886 cDNA
rs913 cDNA
rs945 cDNA
rs946 cDNA
(...)


NCBI EBot


NCBI EBot: URL

https://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi


NCBI EBot: Sample output

#!/usr/bin/perl
(...)
# PUBLIC DOMAIN NOTICE
# National Center for Biotechnology Information
use LWP::Simple;
use LWP::UserAgent;
use Net::FTP;

my $delay = 0;
my $maxdelay = 3;
my $base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

$params{email} = "nobody\@nowhere.com";
$params{db} = "nuccore";
$params{tool} = "ebot";
$params{term} = "Mammuthus+primigenius[ORGN]";
%params = esearch(%params);

$params{retmode} = "xml";
$params{outfile} = "result.xml";
$params{rettype} = "native";
efetch_batch(%params);
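The generated script first runs ESearch on the history server, then downloads the records with efetch_batch. The sketch below only illustrates the general batching idea with the E-utilities retstart/retmax paging parameters; it is not EBot's own code, the class name and the batch size are arbitrary choices, and it uses usehistory=y as in the E-utilities documentation:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Builds ESearch/EFetch URLs for a batched download (hypothetical helper, not EBot). */
final class EutilsBatches {
    static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    /** ESearch URL that stores the result set on the history server. */
    static String esearchUrl(String db, String term) {
        return BASE + "esearch.fcgi?db=" + db + "&usehistory=y&term="
            + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    /** One EFetch URL per batch of at most batchSize records, paging with retstart/retmax. */
    static List<String> efetchBatchUrls(String db, String webEnv, String queryKey, int count,
                                        int batchSize, String rettype, String retmode) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        List<String> urls = new ArrayList<>();
        for (int start = 0; start < count; start += batchSize) {
            urls.add(BASE + "efetch.fcgi?db=" + db
                + "&WebEnv=" + URLEncoder.encode(webEnv, StandardCharsets.UTF_8)
                + "&query_key=" + queryKey
                + "&retstart=" + start + "&retmax=" + batchSize
                + "&rettype=" + rettype + "&retmode=" + retmode);
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(esearchUrl("nuccore", "Mammuthus primigenius[ORGN]"));
    }
}
```

Here count, WebEnv and query_key would come from the ESearch answer; the raw term is encoded by the helper, so it is passed with a space rather than the pre-encoded "+" used in the Perl script.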


BLAST


Standalone Blast: Downloading

Standalone tools are available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

# add BLAST to your path
export PATH=${PATH}:/path/to/ncbi-blast-2.2.28+/bin


Standalone Blast: Download a sample

Apis mellifera proteins

curl -o protein.fa.gz \
  "ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/protein/protein.fa.gz"

gunzip protein.fa.gz


Standalone Blast: Create a Blast database with makeblastdb

Getting help...

$ makeblastdb -help
(...)
 -dbtype <String, `nucl', `prot'>
   Molecule type of target db
 -in <File_In>
   Input file/database name
   Default = `-'
 -input_type <String, `asn1_bin', `asn1_txt', `blastdb', `fasta'>
   Type of the data specified in input_file
   Default = `fasta'
(..)


Standalone Blast: Create a Blast database with makeblastdb

Create the BLAST database:

$ makeblastdb -in protein.fa -dbtype prot

Building a new DB, current time: 09/02/2013 18:29:38
New DB name:   protein.fa
New DB title:  protein.fa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 10570 sequences in 1.84458 seconds.
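makeblastdb reports how many sequences it added (10570 here). A quick cross-check is to count the FASTA header lines of protein.fa, which one would expect to give the same number if every record was accepted. A small Java sketch (FastaCount is a hypothetical name):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Counts the records of a FASTA file (hypothetical helper). */
final class FastaCount {
    /** Number of FASTA records, i.e. lines starting with '>'. */
    static long countRecords(Reader in) {
        try (BufferedReader br = new BufferedReader(in)) {
            return br.lines().filter(line -> line.startsWith(">")).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage: java FastaCount protein.fa
    public static void main(String[] args) throws IOException {
        System.out.println(countRecords(Files.newBufferedReader(Path.of(args[0]))));
    }
}
```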


Standalone Blast: Query a Blast database with blastp

Get help:

$ blastp -help
(...)
 -query <File_In>
   Input file name
   Default = `-'
 -db <String>
   BLAST database name
(...)


Standalone Blast: Blast human EIF4G1 gi:187956781

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |\
  blastp -db protein.fa

Query= gi|187956781|gb|AAI40897.1| EIF4G1 protein [Homo sapiens]
(...)
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation ...     189   4e-49
gi|328779480|ref|XP_003249661.1| PREDICTED: hypothetical protei...     38.1   0.017
gi|110762568|ref|XP_001121713.1| PREDICTED: hypothetical protei...     38.1   0.018

(...)
> gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation
initiation factor 4 gamma 2-like [Apis mellifera]
Length=899

 Score = 189 bits (479), Expect = 4e-49, Method: Compositional matrix adjust.
 Identities = 115/319 (36%), Positives = 175/319 (55%), Gaps = 39/319 (12%)

Query  717  KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSI  774
            ++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR I
Sbjct  22   RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGI  73

Query  775  LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL-----  829
            LNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L
Sbjct  74   LNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAA  133

Query  830  -MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLK  888
                K    E       F  LLL++C+ EFE      E FE +    DE    EE
Sbjct  134  NFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE-----  184

Standalone Blast: Blast human EIF4G1 gi:187956781, output XML

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |\
  blastp -db protein.fa -outfmt 5

(...)
<Hit_hsps>
  <Hsp>
    <Hsp_num>1</Hsp_num>
    <Hsp_bit-score>189.119</Hsp_bit-score>
    <Hsp_score>479</Hsp_score>
    <Hsp_evalue>3.78314e-49</Hsp_evalue>
    <Hsp_query-from>717</Hsp_query-from>
    <Hsp_query-to>1017</Hsp_query-to>
    <Hsp_hit-from>22</Hsp_hit-from>
    <Hsp_hit-to>319</Hsp_hit-to>
    <Hsp_query-frame>0</Hsp_query-frame>
    <Hsp_hit-frame>0</Hsp_hit-frame>
    <Hsp_identity>115</Hsp_identity>
    <Hsp_positive>175</Hsp_positive>
    <Hsp_gaps>39</Hsp_gaps>
    <Hsp_align-len>319</Hsp_align-len>
    <Hsp_qseq>KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL------MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLL--------KNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRGSNWVPRRG--DQGPKTIDQIHKEAE</Hsp_qseq>
    <Hsp_hseq>RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGILNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAANFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE---------ERRQVAKRKMLGNIKFIGELGKLGIVSETILHRCILQLLEKKRRRRSRGDTAEDIECLCQIMRTCGRILDSDKGRGLMDQYFKRMNSLAESRDLPLRIKFMLRDVIELRRDGWVPRKATSTEGPMPINQIRNDNE</Hsp_hseq>
    <Hsp_midline>++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR ILNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L         K    E       F  LLL++C+ EFE      E FE +    DE    EE         E R +A+R+ LGNIKFIGEL KL +++E I+H C+++LL        +    E +ECLC+++ T G+ LD +K +  MDQYF +M  + + +    RI+FML+DV++LR   WVPR+    +GP  I+QI  + E</Hsp_midline>
  </Hsp>
(...)
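With -outfmt 5 each HSP becomes an <Hsp> element whose children carry the same numbers as the text report. A DOM-based sketch that extracts a few of them and recomputes the percent identity as Hsp_identity / Hsp_align-len (115/319, about 36%, as in the text report); the class name is hypothetical and only the fields shown above are read:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Reads HSP statistics from BLAST XML output (hypothetical helper). */
final class BlastXmlHsps {
    /** The few Hsp_* fields used here; BLAST XML carries many more. */
    static final class Hsp {
        final double bitScore, evalue;
        final int identity, alignLen;
        Hsp(double bitScore, double evalue, int identity, int alignLen) {
            this.bitScore = bitScore; this.evalue = evalue;
            this.identity = identity; this.alignLen = alignLen;
        }
        double percentIdentity() { return 100.0 * identity / alignLen; }
    }

    private static String child(Element hsp, String tag) {
        NodeList n = hsp.getElementsByTagName(tag);
        if (n.getLength() == 0) throw new IllegalArgumentException("missing " + tag);
        return n.item(0).getTextContent().trim();
    }

    static List<Hsp> parse(String blastXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // BLAST XML may declare a DOCTYPE; do not download the DTD.
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(blastXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("Hsp"); // exact name: not Hsp_num etc.
            List<Hsp> hsps = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                hsps.add(new Hsp(Double.parseDouble(child(e, "Hsp_bit-score")),
                                 Double.parseDouble(child(e, "Hsp_evalue")),
                                 Integer.parseInt(child(e, "Hsp_identity")),
                                 Integer.parseInt(child(e, "Hsp_align-len"))));
            }
            return hsps;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse BLAST XML", e);
        }
    }

    // Usage: ... | blastp -db protein.fa -outfmt 5 | java BlastXmlHsps
    public static void main(String[] args) throws IOException {
        String xml = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        for (Hsp h : parse(xml)) {
            System.out.printf("%.1f\t%.2e\t%.1f%%%n", h.bitScore, h.evalue, h.percentIdentity());
        }
    }
}
```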

NCBI URL-API Blast


NCBI URL-API Blast

https://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html

$ curl "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE&DATABASE=nr&PROGRAM=blastp&FILTER=L&HITLIST_SIZE=500"

(...)

<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->

(...)
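The Put request returns at once: the search runs in the background and is identified by the RID, while RTOE is an estimate of how long to wait before asking for the results (assumed here to be in seconds). The results are then requested with CMD=Get and the same RID. A sketch that pulls RID and RTOE out of the QBlastInfo comment and builds that Get URL; FORMAT_TYPE=XML is one of the formats the URL API accepts, and the class name is hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts RID/RTOE from a BLAST URL-API Put answer (hypothetical helper). */
final class QBlastInfo {
    private static final Pattern BLOCK =
        Pattern.compile("QBlastInfoBegin(.*?)QBlastInfoEnd", Pattern.DOTALL);
    private static final Pattern RID = Pattern.compile("\\bRID\\s*=\\s*(\\S+)");
    private static final Pattern RTOE = Pattern.compile("\\bRTOE\\s*=\\s*(\\d+)");

    /** Looks for the field only inside the QBlastInfo comment, not in the rest of the page. */
    private static Optional<String> field(String page, Pattern p) {
        Matcher block = BLOCK.matcher(page);
        if (!block.find()) return Optional.empty();
        Matcher m = p.matcher(block.group(1));
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static Optional<String> rid(String page) { return field(page, RID); }

    static Optional<Integer> rtoeSeconds(String page) {
        return field(page, RTOE).map(Integer::parseInt);
    }

    /** URL retrieving the results of a finished search. */
    static String getUrl(String rid, String formatType) {
        return "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=" + rid
            + "&FORMAT_TYPE=" + formatType;
    }
}
```

A client would wait at least RTOE before issuing the Get request, and poll again if the results are not ready yet.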


The End
