building the research web
john wilbanks
13 november 2007
open science and open publishing
mit csail
science commons:
explicit contractual rights, granted in advance, to use and re-use knowledge (papers, tools, data)
readable by non-lawyers and machines
proto-infrastructure for knowledge sharing
technical platform: semantic web
most of what is known is thrown away (cognitive problem)
most of what is known is poorly fitted for use and reuse by machines and software (design problem)
most of what is known is poorly licensed for redesign and reuse by machines and software (legal problem)
very few research materials are actually available (social problem)
based on what’s been published in journals and databases, what signal transduction genes might be active in pyramidal neurons?
what you get
DRD1, 1812 adenylate cyclase activation
ADRB2, 154 adenylate cyclase activation
ADRB2, 154 arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632 dopamine receptor signaling pathway
DRD1, 1812 dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813 dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917 G-protein coupled receptor protein signaling pathway
GNG3, 2785 G-protein coupled receptor protein signaling pathway
GNG12, 55970 G-protein coupled receptor protein signaling pathway
DRD2, 1813 G-protein coupled receptor protein signaling pathway
ADRB2, 154 G-protein coupled receptor protein signaling pathway
CALM3, 808 G-protein coupled receptor protein signaling pathway
HTR2A, 3356 G-protein coupled receptor protein signaling pathway
DRD1, 1812 G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755 G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543 G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269 G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362 G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898 glutamate signaling pathway
GRIN1, 2902 glutamate signaling pathway
GRIN2A, 2903 glutamate signaling pathway
GRIN2B, 2904 glutamate signaling pathway
ADAM10, 102 integrin-mediated signaling pathway
GRM7, 2917 negative regulation of adenylate cyclase activity
LRP1, 4035 negative regulation of Wnt receptor signaling pathway
ADAM10, 102 Notch receptor processing
ASCL1, 429 Notch signaling pathway
HTR2A, 3356 serotonin receptor signaling pathway
ADRB2, 154 transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793 transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043 transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902 transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500 Wnt receptor signaling pathway
what you want
and if you want the tools associated with those genes?
cell lines? plasmids? reagents?
methods and protocols? unpublished data sets?
Open Access Content
Open Source Knowledge Management
Open Access Research Tools
the research web
old collaboration:
reading the canon on paper
querying single-access databases
human as mediator
artisanal tool manufacturing
tightly controlled distribution
new collaboration:
reading the canon with machines
integrating databases
computer as mediator
industrial tool manufacturing
standardized distribution
Open Access Content
the research web
legal framework for research: the paper metaphor
(ownership and access)
“Therefore, learned man, without wishing to be inopportune, I beg you most emphatically to communicate your discovery to the learned world”
Letter to Copernicus, 1 November 1536
Archbishop of Capua Nikolaus Cardinal von Schönberg
copyrights allow ownership of papers by the nature of the medium, not the nature of the work
the nature of the work is profoundly integrative: the set of facts in each “paper” is connected to elements of knowledge throughout science
http://orpheus-1.ucsd.edu/acq/license/cdlelsevier2004.pdf
“By open access to the literature, we mean its free availability on the public internet, permitting users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal or technical barriers other than those inseparable from gaining access to the internet itself.”
image from the public library of science, licensed to the public under CC-BY 3.0
“The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly
acknowledged and cited” - the Budapest Open Access Initiative
<License rdf:about="http://creativecommons.org/licenses/by/3.0/">
  <permits rdf:resource="http://creativecommons.org/ns#Reproduction" />
  <permits rdf:resource="http://creativecommons.org/ns#Distribution" />
  <permits rdf:resource="http://creativecommons.org/ns#DerivativeWorks" />
  <requires rdf:resource="http://creativecommons.org/ns#Notice" />
  <requires rdf:resource="http://creativecommons.org/ns#Attribution" />
</License>
creates another dimension for search:
“find me stuff I know I have the right to use”
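Because the license is machine-readable, a search tool can check rights before showing a result. The following is a minimal sketch of that idea, not the Creative Commons tooling itself. It assumes the License element is wrapped in an rdf:RDF root with namespace declarations (the slide omits them), and `license_terms` and `can_build_on` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

CC = "http://creativecommons.org/ns#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The License element from the slide, wrapped in an rdf:RDF root.
# Assumption: the xmlns declarations are added here so the fragment parses;
# the slide does not show them.
LICENSE_RDF = f"""
<rdf:RDF xmlns="{CC}" xmlns:rdf="{RDF}">
  <License rdf:about="http://creativecommons.org/licenses/by/3.0/">
    <permits rdf:resource="{CC}Reproduction" />
    <permits rdf:resource="{CC}Distribution" />
    <permits rdf:resource="{CC}DerivativeWorks" />
    <requires rdf:resource="{CC}Notice" />
    <requires rdf:resource="{CC}Attribution" />
  </License>
</rdf:RDF>
"""


def license_terms(rdf_xml: str) -> dict[str, set[str]]:
    """Collect the short names of everything the license permits and requires."""
    root = ET.fromstring(rdf_xml.strip())
    terms: dict[str, set[str]] = {"permits": set(), "requires": set()}
    for kind in terms:
        for element in root.iter(f"{{{CC}}}{kind}"):
            resource = element.get(f"{{{RDF}}}resource", "")
            terms[kind].add(resource.removeprefix(CC))
    return terms


def can_build_on(terms: dict[str, set[str]]) -> bool:
    """Hypothetical search filter: keep only works that allow derivatives."""
    return "DerivativeWorks" in terms["permits"]


terms = license_terms(LICENSE_RDF)
print(sorted(terms["permits"]))
print(sorted(terms["requires"]))
print(can_build_on(terms))
```

A crawler could apply a filter like `can_build_on` to every indexed item, which is one way to read "find me stuff I know I have the right to use".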
cc:license
FLOSS:License
GNU/LINUX
cc:license
a commons for data: create zones of certainty
“freedom to integrate”
no quid pro quo
Open Access Research Tools
the research web
legal framework for research: the artisanal metaphor
(pre-industrial culture of tool-making)
[diagram, repeated on three slides, one per caption below: Alzheimer’s Disease, Huntington’s Disease, Multiple Sclerosis, Autism]
research silos: same genes, same cells, same brain
bilateral contracts and deals
“one to many” offers / networks
Provider Lab
Depository
Recipient Lab
MTA
deposit
tracking
fulfillment
searching / ordering
industrial tool manufacture and distribution
cc:license
apply these methods to physical materials as well:
available via standard, one-click contracts
discoverable by digital identifiers
fulfilled by third parties to the transaction
acknowledged by citation-inspired systems
usage modes:
A) public offer (one-to-many)
B) private negotiation (eliminate inherent disadvantages in expertise)
thanks to the metadata, efficient to discover resources:
while reading papers
while browsing databases
Open Source Knowledge Management
the research web
technical framework for research: the “one database per child” metaphor
(it depends on what the meaning of “gene” is)
[figure labels, shown on both of the next two slides: 27,266 papers; 4,563 papers; 41,985 papers; 10,365 papers; 128,437 papers]
“knowledge = product” is causing systemic failures to exploit opportunities afforded by the network
“knowledge = network” better reflects the reality of knowledge
“find me genes involved in signal transduction that are related to pyramidal neurons”
gp145trkB does not require p75LNGFR to form a functional receptor for BDNF in hippocampal pyramidal neurons
is this a creative expression or a forced move?
NeuronDB
BAMS
Literature
Homologene
SWAN
Entrez Gene
Gene Ontology
Mammalian Phenotype
PDSPki
BrainPharm
AlzGene
Antibodies
PubChem
MESH
Reactome
Allen Brain Atlas
slide derived from W3C HCLS
[diagram: ontology fragment with the classes Drug, Neuron, Pathological Agent, Receptor, Channel, Agent, NeuronalProperty, PathologicalChange and Compartment, linked by the relations inhibits, involves, has and is_located_in]
slide courtesy of kei chung, yale
NeuronDB
BAMS
Literature
Homologene
SWAN
Entrez Gene
Gene Ontology
Mammalian Phenotype
PDSPki
BrainPharm
AlzGene
Antibodies
PubChem
MESH
Reactome
Allen Brain Atlas
slide courtesy of W3C HCLS
prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

select ?genename ?processname
where
{  # papers indexed under the MeSH term for pyramidal neurons, and the genes they mention
   graph <http://purl.org/commons/hcls/pubmesh>
     { ?paper ?p mesh:D017966 .
       ?article sc:identified_by_pmid ?paper.
       ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
     }
   # gene products whose function is realized as a signal transduction process (GO)
   graph <http://purl.org/commons/hcls/goa>
     { ?protein rdfs:subClassOf ?res.
       ?res owl:onProperty ro:has_function.
       ?res owl:someValuesFrom ?res2.
       ?res2 owl:onProperty ro:realized_as.
       ?res2 owl:someValuesFrom ?process.
       graph <http://purl.org/commons/hcls/20070416/classrelations>
         {{?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166}
           union
          {?process rdfs:subClassOf go:GO_0007166 }}
       ?protein rdfs:subClassOf ?parent.
       ?parent owl:equivalentClass ?res3.
       ?res3 owl:hasValue ?gene.
     }
   # human-readable labels for the genes and processes
   graph <http://purl.org/commons/hcls/gene>
     { ?gene rdfs:label ?genename }
   graph <http://purl.org/commons/hcls/20070416>
     { ?process rdfs:label ?processname}
}
Mesh: Pyramidal Neurons
Pubmed: Journal Articles
Entrez Gene: Genes
GO: Signal Transduction
running code
Many of the genes are indeed related to Alzheimer’s Disease through gamma
secretase (presenilin) activity
DRD1, 1812 adenylate cyclase activation
ADRB2, 154 adenylate cyclase activation
ADRB2, 154 arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632 dopamine receptor signaling pathway
DRD1, 1812 dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813 dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917 G-protein coupled receptor protein signaling pathway
GNG3, 2785 G-protein coupled receptor protein signaling pathway
GNG12, 55970 G-protein coupled receptor protein signaling pathway
DRD2, 1813 G-protein coupled receptor protein signaling pathway
ADRB2, 154 G-protein coupled receptor protein signaling pathway
CALM3, 808 G-protein coupled receptor protein signaling pathway
HTR2A, 3356 G-protein coupled receptor protein signaling pathway
DRD1, 1812 G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755 G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543 G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269 G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362 G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898 glutamate signaling pathway
GRIN1, 2902 glutamate signaling pathway
GRIN2A, 2903 glutamate signaling pathway
GRIN2B, 2904 glutamate signaling pathway
ADAM10, 102 integrin-mediated signaling pathway
GRM7, 2917 negative regulation of adenylate cyclase activity
LRP1, 4035 negative regulation of Wnt receptor signaling pathway
ADAM10, 102 Notch receptor processing
ASCL1, 429 Notch signaling pathway
HTR2A, 3356 serotonin receptor signaling pathway
ADRB2, 154 transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793 transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043 transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902 transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500 Wnt receptor signaling pathway
http://hcls1.csail.mit.edu:8890/sparql/?query=prefix%20go%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2FGO%23%3E%0Aprefix%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0Aprefix%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0Aprefix%20mesh%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Frecord%2Fmesh%2F%3E%0Aprefix%20sc%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fscience%2Fowl%2Fsciencecommons%2F%3E%0Aprefix%20ro%3A%20%3Chttp%3A%2F%2Fwww.obofoundry.org%2Fro%2Fro.owl%23%3E%0A%0Aselect%20%3Fgenename%20%3Fprocessname%0Awhere%0A%7B%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fpubmesh%3E%0A%20%20%20%20%20%7B%20%3Fpaper%20%3Fp%20mesh%3AD017966%20.%0A%20%20%20%20%20%20%20%3Farticle%20sc%3Aidentified_by_pmid%20%3Fpaper.%0A%20%20%20%20%20%20%20%3Fgene%20sc%3Adescribes_gene_or_gene_product_mentioned_by%20%3Farticle.%0A%20%20%20%20%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fgoa%3E%0A%20%20%20%20%20%7B%20%3Fprotein%20rdfs%3AsubClassOf%20%3Fres.%0A%20%20%20%20%20%20%20%3Fres%20owl%3AonProperty%20ro%3Ahas_function.%0A%20%20%20%20%20%20%20%3Fres%20owl%3AsomeValuesFrom%20%3Fres2.%0A%20%20%20%20%20%20%20%3Fres2%20owl%3AonProperty%20ro%3Arealized_as.%0A%20%20%20%20%20%20%20%3Fres2%20owl%3AsomeValuesFrom%20%3Fprocess.%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%2Fclassrelations%3E%0A%20%20%20%20%20%7B%7B%3Fprocess%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2Fobo%23part_of%3E%20go%3AGO_0007166%7D%0A%20%20%20%20%20%20%20union%0A%20%20%20%20%20%20%7B%3Fprocess%20rdfs%3AsubClassOf%20go%3AGO_0007166%20%7D%7D%0A%20%20%20%20%20%20%20%3Fprotein%20rdfs%3AsubClassOf%20%3Fparent.%0A%20%20%20%20%20%20%20%3Fparent%20owl%3AequivalentClass%20%3Fres3.%0A%20%20%20%20%20%20%20%3Fres3%20owl%3AhasValue%20%3Fgene.%0A%20%20%20%20%20%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fgene%3E%0A%20%20%20%20%20%7B%20%3Fgene%20rdfs%3Alabel%20%3Fgenename%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%
2Fhcls%2F20070416%3E%0A%20%20%20%20%20%7B%20%3Fprocess%20rdfs%3Alabel%20%3Fprocessname%7D%0A%7D&format=&maxrows=50
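The long URL above is just the query text, percent-encoded and passed as the `query` parameter of the endpoint. A minimal sketch of how such a URL can be assembled follows. The endpoint address and the `format` and `maxrows` parameters are copied from the slide; `sparql_url` is a hypothetical helper name, the short query is a stand-in for the full one, and the endpoint may no longer be live, so nothing is fetched.

```python
from urllib.parse import quote, unquote

# Endpoint as shown on the slide; it may no longer be reachable.
ENDPOINT = "http://hcls1.csail.mit.edu:8890/sparql/"


def sparql_url(query: str, maxrows: int = 50) -> str:
    """Percent-encode a SPARQL query into a GET URL of the shape shown on the slide."""
    # safe="" makes quote() encode every reserved character, including "/" and ":".
    return f"{ENDPOINT}?query={quote(query, safe='')}&format=&maxrows={maxrows}"


# Shortened stand-in for the full query, only to show the encoding.
query = "select ?genename where { ?gene rdfs:label ?genename }"
url = sparql_url(query)
print(url)

# Decoding the query parameter recovers the original text.
encoded = url.split("query=", 1)[1].split("&format", 1)[0]
print(unquote(encoded) == query)
```

Anyone holding the URL can decode it and read or change the query, which is what makes the result reproducible by others.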
prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

select ?genename ?processname
where
{  graph <http://purl.org/commons/hcls/pubmesh>
     { ?paper ?p mesh:D009369 .
       ?article sc:identified_by_pmid ?paper.
       ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
     }
   graph <http://purl.org/commons/hcls/goa>
     { ?protein rdfs:subClassOf ?res.
       ?res owl:onProperty ro:has_function.
       ?res owl:someValuesFrom ?res2.
       ?res2 owl:onProperty ro:realized_as.
       ?res2 owl:someValuesFrom ?process.
       graph <http://purl.org/commons/hcls/20070416/classrelations>
         {{?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166}
           union
          {?process rdfs:subClassOf go:GO_0006610 }}
       ?protein rdfs:subClassOf ?parent.
       ?parent owl:equivalentClass ?res3.
       ?res3 owl:hasValue ?gene.
     }
   graph <http://purl.org/commons/hcls/gene>
     { ?gene rdfs:label ?genename }
   graph <http://purl.org/commons/hcls/20070416>
     { ?process rdfs:label ?processname}
}
the “view source” effect
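One way to read the "view source" effect: once a working query is visible, asking a different question is a matter of changing one identifier. The sketch below illustrates this with a cut-down fragment of the query (only the pubmesh graph), assuming the MeSH identifier is the only part that varies. `QUERY_TEMPLATE` and `query_for` are hypothetical names, not part of the original deck.

```python
from string import Template

# Cut-down fragment of the slide's query: only the pubmesh graph is kept.
# $mesh_id is the single slot that changes between the two queries shown.
QUERY_TEMPLATE = Template("""\
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>

select ?gene
where
{  graph <http://purl.org/commons/hcls/pubmesh>
     { ?paper ?p mesh:$mesh_id .
       ?article sc:identified_by_pmid ?paper.
       ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
     }
}
""")


def query_for(mesh_id: str) -> str:
    """Fill the MeSH identifier into the query fragment."""
    return QUERY_TEMPLATE.substitute(mesh_id=mesh_id)


first = query_for("D017966")   # MeSH identifier used in the first query
second = query_for("D009369")  # MeSH identifier used in the second query

# The two queries differ only in that one identifier.
print(first.replace("D017966", "D009369") == second)
```

SPARQL variables start with `?`, so they do not collide with the `$` placeholders that `string.Template` uses.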
building on a commons means “snap together” integration of the tools, data, and literature
fragile
existing infrastructure must stay open
freedom to access the canon
freedom to build the tools
new infrastructure must persist
agreements on names and meanings
open names and meanings
Open Access Content
Open Source Knowledge Management
Open Access Research Tools
the research web
“design for re-use”
“follow the rule of least power”
“you! in the lab! start innovating!”
use the commons as the backbone of the research web:
1) evaluate what you believe against what is online
2) low-friction access to what is needed for further work
use the commons as a method for integrating emerging and disruptive technologies into the research web
atgaccatgattacgccaagcgcgcaattaaccctcactaaagggaacaaaagctggagctccaccgcggtggcggcagcactagagctagtggatcccccgggctgtagaaattcgatatcaagcttatcgataccgtcgacctcgagggggggcccggtacccaattcgccctatagtgagtcgtattacgcgcgctcactggccgtcgttttacaacgtcgtgactgggaaaaccctggcgttacccaacttaatcgccttgcagcacatccccctttcgccagctggcgtaatagcgaagaggcccgcaccgatcgcccttcccaacagttgcgcagcctgaataataa
codon devices
if we can’t deal with the data we create in a classic drug discovery context, how will we deal with the
data that comes from user-generated biology?
retail: $79.95 + shipping
ages ten and up
thank you
[email protected]
http://sciencecommons.org
acknowledgments
Investors and support: MacArthur Foundation, Kauffman Foundation, MIT CSAIL, HighQ Foundation, Omidyar Network, Teranode Corporation, anonymous charitable foundations
in-kind supporters
MTA: iBridge Network, Addgene, StrainInfo.net, Cure Huntington’s Disease Initiative, Emory University, City of Hope Hospital
Open Access: SPARC/Ass’n of Research Libraries, Carnegie-Mellon University, MIT, Public Library of Science, BioMed Central, Nature Precedings, Springer Author Choice, Global Biodiversity Information Facility, CODATA
Neurocommons: W3C HCLS, Millennium Pharmaceuticals (software and curated content), Virtuoso Software, Hewlett-Packard