Pushing Chemical Biology Through the Pipes
-
Upload
rguha -
Category
Technology
-
view
469 -
download
2
Transcript of Pushing Chemical Biology Through the Pipes
![Page 1: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/1.jpg)
Pushing Chemical Biology Data Through the Pipes
Architec(ng and Extending the BARD API
Rajarshi Guha, John Braisted, Ajit Jadhav, Dac-‐Trung Nguyen, Tyler
Peryea, Noel Southall
September 9, 2014 Indianapolis, IN
![Page 2: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/2.jpg)
Outline
MoIvaIons for the BARD
InteracIng with the BARD
Extensibility & Community
![Page 3: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/3.jpg)
The BioAssay Research Database
• Originated from the NIH Molecular Libraries Program
• MoIvated to make the bioassay data generated by the MLP more accessible and amenable to exploraIon and hypothesis generaIon
• Joint effort between NCGC, Broad, UNM, UMiami, Vanderbilt, Scripps, Burnham
![Page 4: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/4.jpg)
Goals of the BARD
BARD’s mission is to enable novice and expert scien7sts to effec7vely u7lize MLP data to
generate new hypotheses
• Developed as an open-‐source, industrial-‐strength plaWorm to support public translaIonal research
• Foster new methods to interpret & analyze chemical biology data
• Develop and adopt an Assay Data Standard • Enable co-‐loca7on of data and methods
![Page 5: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/5.jpg)
Components of the BARD Pla=orm
CAP, Data Dictionary, and Results Deposition Data model created & populated
CAP UI with View and basic editing
Dictionary defined as OWL using Protégé
Warehouse loaded with all PubChem AIDs and results
Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
![Page 6: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/6.jpg)
Interac?ng with the BARD
![Page 7: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/7.jpg)
Searching the BARD
• Full text search via Lucene/Solr – Key entry point for new users – Search code runs queries in parallel – Fast faceIng, auto suggest, filters
• EnIIes are linked manually via Solr schema – Allows us to pick up enIIes related to the query – Supply matching context
• Custom NCATS code for fast structure searches
![Page 8: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/8.jpg)
The BARD API
• A RESTful applicaIon programming interface • Access to individual & collecIons of enIIes • Documented via the wiki • Versioned • Hit a URL, get a JSON response
– Every language supports parsing JSON documents – Easy to inspect via the browser or REST clients
h\p://bard.nih.gov/api/v18/
![Page 9: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/9.jpg)
API Architecture
• Java, read-‐only, deployed on Glassfish cluster • Different funcIonality hosted in different containers – Maintenance, security – Stability – Performance
API
Text Search
Struct Search
Data Warehouse
Plugins
![Page 10: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/10.jpg)
ETL Database Text Search Engine Structure Search Engine
Caching Layer
http://bard.nih.gov/api
Jersey Webapps deployed on HA
Application Server Cluster
Open Source as Far as Possible
![Page 11: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/11.jpg)
API Resources
• Covers many data types • Each resource supports a variety of sub-‐resources – Usually linked to other resources
• Use /_info to see what sub-‐resources are available
/assays/_info!
![Page 12: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/12.jpg)
API Resources
En7ty Count /assays! 990 /biology! 1080 /cap! 1942 /compounds! 42,572,799 /documents! 10,499 /experiments! 1314 /exptdata! 32,754,242 /projects! 144 /substances! 113,751,456
![Page 13: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/13.jpg)
Data Warning
We’re s7ll in the process of data cura7on and QC so while you can access all BARD data, keep in mind that much of it will be undergoing review and cura7on and so may change
![Page 14: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/14.jpg)
Summary Resources
• A number of enIIes have a /summary sub-‐resource (Compounds, Projects)
• Aggregates informaIon, suitable for dashboards
![Page 15: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/15.jpg)
JSON Responses
• All responses are currently JSON
• EnIIes can include other enIIes (recursively)
• JSON Schema is available via /_schema
/assays/_schema!
![Page 16: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/16.jpg)
Extending the API
• Concept of plugins – Expands the resource hierarchy – Has to be wri\en in Java
• What can a plugin accept? – Anything
• What can a plugin provide? – Anything – plain text, XML, JSON, HTML, Flash
• Plugin manifest – describe available resources, argument types
![Page 17: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/17.jpg)
Plugins have to be deployable on the JVM
Plugin Development Workflow
![Page 18: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/18.jpg)
What Can a Plugin Expect?
• Direct access to the database via JDBC • Faster access to REST API via co-‐localizaIon • No local storage in the BARD warehouse
– But the plugin can use its own storage (such as an embedded database)
• Plugins have access to system JARs (e.g., XOM, JChem) but should bundle their own required dependencies
![Page 19: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/19.jpg)
Plugin Valida?on
• Run a series of checks on a plugin before deployment – Catches manifest/resource errors – Doesn’t check for correctness (not our job) – Examines plugin Java class – Examines final plugin package
• Run as a command line tool or from your code • Could be made into an Ant/Maven plugin
![Page 20: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/20.jpg)
What Plugins are Available?
• The plugin registry provides a list of deployed plugins – Path to the plugin – Version – Availability
• Long term goal is to have a plugin store
/plugins/registry/list!
![Page 21: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/21.jpg)
Exemplar Plugins
• Look at the plugin repository on Github • Ranges from
– trivial calls to external service – Incorporate external tools/programs
• Current set of plugins highlight plugin development
• Plugins let us move non-‐essenIal funcIonality out of the core API
![Page 22: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/22.jpg)
P. Rydberg et al, h\p://www.farma.ku.dk/smartcyp/
BARD-‐ SMARTCyp
![Page 23: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/23.jpg)
More Than Just Data
BARD is not just a data store – it’s a platform
• Seamlessly interact with users’ preferred tools • Allows the community to tailor it to their needs • Serve as a meeting ground for experimental
and computational methods • Enhance collaboration opportunities
![Page 24: Pushing Chemical Biology Through the Pipes](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e9a30b4c90526358b5347/html5/thumbnails/24.jpg)
Links
@AskTheBARD
hNp://bard.nih.gov
API Source Code Repository
Plugin Source Code Repository
Development Wiki