Post on 22-Jan-2018
Our Goals
• Provide core bioinformatics resources
– UniProtKB/
–
– …
• Provide services and infrastructure
– Vital-IT: HPC for the life sciences
– …
Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation
Famiglietti et al.
We annotate a lot of diseases and variants!
http://europepmc.org/abstract/MED/24848695
Why provide a public SPARQL endpoint
• A 10-man wet laboratory cannot afford:
– to host their own database in-house, holding all, or even a bit, of all life science data.
– not to have access to, and use, existing life science information.
Why provide a public SPARQL endpoint
• Classical SQL can be provided on the web
– Is not practical
– No federation
– Poor standards conformance
• Local SQL is expensive
• Local JSON is no better
• Nor is local XML
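To make the "no federation" point concrete, here is a minimal sketch of what federation looks like in SPARQL 1.1, where a SERVICE clause delegates part of a query to a second endpoint. The `up:` terms come from the UniProt core vocabulary; the `lab:` vocabulary, the `lab:measuredProtein` property, and the lab endpoint URL are hypothetical placeholders, not part of the original slides.

```python
UNIPROT_CORE = "http://purl.uniprot.org/core/"


def federated_query(lab_endpoint: str) -> str:
    """Build a SPARQL 1.1 query that joins UniProt data with a remote endpoint.

    `lab_endpoint` and the `lab:` vocabulary are hypothetical; they stand in
    for a laboratory's own data published as RDF.
    """
    return f"""\
PREFIX up: <{UNIPROT_CORE}>
PREFIX lab: <http://example.org/lab#>

SELECT ?sample ?protein ?name
WHERE {{
  # Local part: answered by the endpoint that receives the query.
  ?protein a up:Protein ;
           up:mnemonic ?name .
  # Remote part: delegated to the other endpoint via SERVICE.
  SERVICE <{lab_endpoint}> {{
    ?sample lab:measuredProtein ?protein .
  }}
}}
"""


if __name__ == "__main__":
    print(federated_query("https://example.org/sparql"))
```

The join happens on the shared `?protein` IRI, which is why stable identifiers matter: no parser or schema mapping sits between the two data sets.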
Data Integration: Traditional
[Diagram: Pathway.txt and UniProt.txt; Pathway Parser and UniProt Parser; Pathway Schema and UniProt Schema; Own Lab Data; Data warehouse; SQL queries. Six cost markers ($) are attached to the diagram.]
Why not some other graph database?
• Ecosystem
• RDF enables sharing and reuse of data at low cost
• Identity, precision, standards
Why provide a public SPARQL endpoint
• Document-centric REST is not enough
– Swiss-Prot available as REST
– (over e-mail!!) since 1986
– expasy.ch since 1993
– www.uniprot.org since 2002
• Most users use a GUI, not a CLI
• Developers build GUIs on a CLI
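As an illustration of the programmatic access that a GUI developer would build on, the sketch below sends a SELECT query to a SPARQL endpoint over plain HTTP using only the Python standard library. The endpoint URL `https://sparql.uniprot.org/sparql` and the example query are assumptions made for this sketch; the request shape (a `query` parameter plus an `Accept` header) follows the SPARQL 1.1 Protocol.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint address; adjust if the service lives elsewhere.
ENDPOINT = "https://sparql.uniprot.org/sparql"

# Illustrative query: five human protein entries.
EXAMPLE_QUERY = """\
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?protein
WHERE { ?protein a up:Protein ; up:organism taxon:9606 }
LIMIT 5
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(query: str, endpoint: str = ENDPOINT) -> list:
    """Run a SELECT query and return its result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    for row in run_select(EXAMPLE_QUERY):
        print(row["protein"]["value"])
```

The same request works from `curl` or a browser, which is the sense in which a GUI can be layered on top of a command-line style interface.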
Queries per month in 2015; peak: 4 million per month
[Chart: queries per month, 2015-01 to 2015-09; y-axis ticks at 100, 10'000 and 1'000'000; series: queries, ask, select, construct, describe]
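The four series in the chart correspond to the four SPARQL query forms. As a rough sketch of how a query log could be tallied by form (this is not how the statistics above were produced, and the log format is a hypothetical list of query strings), one can skip the PREFIX/BASE prologue and read the first keyword. Comments before the query form are not handled.

```python
import re
from collections import Counter

# Prologue declarations that may precede the query form.
_PROLOGUE = re.compile(
    r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)", re.IGNORECASE
)
_FORM = re.compile(r"\s*(ASK|SELECT|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def query_form(query: str) -> str:
    """Return 'ask', 'select', 'construct', 'describe', or 'other'."""
    pos = 0
    # Skip any number of PREFIX/BASE declarations.
    while True:
        match = _PROLOGUE.match(query, pos)
        if match is None:
            break
        pos = match.end()
    match = _FORM.match(query, pos)
    return match.group(1).lower() if match else "other"


def tally(queries) -> Counter:
    """Count queries per form for an iterable of query strings."""
    return Counter(query_form(q) for q in queries)


if __name__ == "__main__":
    log = [
        "PREFIX up: <http://purl.uniprot.org/core/>\nSELECT ?p WHERE { ?p a up:Protein }",
        "ASK { ?s ?p ?o }",
        "DESCRIBE <http://purl.uniprot.org/uniprot/P05067>",
    ]
    print(tally(log))
```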
Real users
A mix between hard analytics and super-specific queries
Estimate: somewhere between 400 and 1200 real humans per month
We know they are real because they take holidays ;)