The future of scientific information & communication
Uploaded by: orcid-0000-0002-2668-4821
Antony Williams
SUNY Potsdam, April 12th 2013
How does the internet influence you?
• How many of you visit the internet/check your email less than a dozen times per day?
• Where do you go for fact-checking?
• How many on Facebook? How many on Twitter?
• You know you have an online profile, right?
• Scientists…how many of you are working on building a scientific profile online?
• How many of you are online now?
Me….and my vanity!
Searching Antony Williams
Searching ChemConnector…
http://re.vu/AntonyWilliams
Wikipedia: http://en.wikipedia.org/wiki/Antony_John_Williams
LinkedIn: http://www.linkedin.com/in/AntonyWilliams
Academia.edu
And Mendeley: http://www.mendeley.com/profiles/antony-williams/
And My Co-author Graph
And Videos
– YouTube
– SciVee
– Vimeo
– Slideshare
I am Quantified…
ResearchGate
Google Scholar Citations
AltMetrics
Usage, Citations, Social Media…
Scientists are “Quantified”
• Stats are gathered and analyzed
• Employers can find them, tenure will depend on them, funding is affected by them
• Scientists’ Impact Factors, H-index and many other variants
• Science is both competitive and collaborative
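The H-index named on this slide has a simple definition: a scientist has index h if h of their papers have each been cited at least h times. A minimal sketch of that calculation follows; the function name and the citation counts are made up for illustration and do not come from the talk.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for six papers (illustrative numbers only).
example_counts = [25, 8, 5, 3, 3, 0]
print(h_index(example_counts))
```

One heavily cited paper barely moves the number, which is part of why the slide points to "many other variants".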
If it was not just about me…
• Together we might:
– build an encyclopedia
– …and rate restaurants
– …share book reviews
– …and movie reviews
– …and reviews of service providers
– …organize sit-ins and social action
– …and more data might just be Open
If it was not just about me…
• Together we might:
– build an encyclopedia
– …and rate restaurants
– …provide book reviews to each other
– …or movie reviews
– …or reviews of service providers
– …organize sit-ins and social action
– …and more data might just be Open
– …more scientists might collaborate and share
It is so difficult to navigate…
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
Let’s Change the World
• Let’s map together all historical chemistry data and build systems to integrate new data
• Heck, let’s integrate chemistry and biology data and add in disease data too
• Let’s model the data and see if we can extract new relationships – quantitative and qualitative
• Let’s make it all available on the web
That’s a BIG Request
What About Something Smaller?
• We’re going to map the world
• We’re going to take photos of as many places as we can and link them together
• We’ll let people annotate and curate the map
• Then let’s make it available free on the web
• We’ll make it available for decision making
• Put it on Mobile Devices, Give it Away
Where am I from?
Wikipedia
I care…I want to contribute…
The Power of Contribution
How do you spell Afonwen?
Whoa…
• So the world can be mapped…
• We can enter a 3D environment within the map
• We can add annotations
• We can use the data, we can reference it, we can extract it, we can make decisions with it
• And we can do it on our lap, in our hands
• Let’s crowdsource chemistry and biology!!!
Science is being Crowdsourced
• Crowdsourcing science is happening…
– Contribution of data
  • Our data, About us
  • Our data, generated in labs
  • Open Data, data validation and curation
– Contribution of software
  • Open Source, Open Standards
– Contribution of funding
If we can map the planet…
• …then we should map the Galaxy!
GalaxyZoo
Various ways to contribute
Where Am I From?
What can be done with Big Data
Patients Like Me
I am a Chemist
Back to this….
• Let’s map together all historical chemistry data and build systems to integrate new data
• Heck, let’s integrate chemistry and biology data and add in disease data too
• Let’s model the data and see if we can extract new relationships – quantitative and qualitative
• Let’s make it all available on the web
How can I contribute to chemistry?
• Publish data, share data, validate and curate data
• Publish chemicals, syntheses and data
• “Publish” – Papers, Blogs, Reports, Tweets, Presentations, Videos
• Contribute to Wikipedia
• Participate in chemistry communities
• Contribute to the Big Data
About Me…as a Chemist
• I’ve performed a few dozen chemical syntheses
• I’ve run thousands of analytical spectra
• I’ve generated thousands of NMR assignments
• I’ve probably published <5% of all work
• Most of it has been lost
• But things can be different today….
Blog
• Opinions, procedures, observations, experiences
Presentations
Presentations, Videos, Report, Pre-publications
YouTube/Vimeo/SciVee
• Presentations are easy to turn into movies and publish to these services
• Literally “gives you a voice”
Data as a Publication
Data as a Publication?
http://figshare.com/articles/Prevalence_and_use_of_Twitter_among_scholars/104629
Contributing to the “Big Data” Maps
My Data Contributions…
Data & Curations to ChemSpider
• The Royal Society of Chemistry free database
• 28.5 million chemicals and growing daily
• Software interfaces to integrate with
• Amenable to community contribution
– Deposit structures, property data, spectral data
– Data annotation, validation and curation
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharma companies, Publishers….
The Publishers!?
(Some) Publishers are Changing?
• Data cannot be copyrighted and we have lots
• Scientists contribute data in document form
• Most publishers are open to Open Access
• Scientific publications are built on data so what can be done to release the data? Much data is not published? Many scientists will not share…
Publications - a summary of work
• Scientific publications are a summary of work
– Is all work reported?
– How much science is lost to pruning?
– What of value sits in notebooks and is lost?
• How much data is lost?
– How many compounds never reported?
– How many syntheses fail or succeed?
– How many characterization measurements?
Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Institutional repositories have no specific domain support
• Why not develop a community repository for chemistry data – private, public, embargoed?
• Provides data to develop models/algorithms?
Chemical Database Service
• National Chemical Database Service for UK Academics
• Integrating Commercial Databases and Services
• Chemicals, analytical data, prediction algorithms
• Development of data repository
Model Building with Community Data
• Community data as a basis of model building
– Consume data from available databases, community data, new publications and build predictive algorithms for the community
– How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
Pulling Data from our Archive
• Our contribution to the world of chemistry data
• DERA – digitally enabling the RSC archive
– Text mining
  • Find chemicals, reactions, analytical data, properties
– Algorithmic checking
  • Validate algorithmically what we can - robots
– “Web 2.0 interfaces” for curating and validating
What if we could capture it all?
Digitally Enhancing the RSC Archive
Human Validation and Curation
Web 2.0 Contribution
• We have been contributing to the web for a long time already – but how much in chemistry?
• A few blogs, an increasing amount of tweeting but what about data sharing in chemistry?
The Old Way of Challenging
Challenging Science…
Collaboration towards completion
Detailed constructive dialog
Oxidation by Sodium Hydride?
The Blogosphere Analyzes…
How much is in the archives?
Open Notebook Science Analysis
Oxidation by Sodium Hydride?
What is Hexacyclinol?
The Blogosphere “Discusses”…
What is real, what is fake?
http://www.youtube.com/watch?v=hMpAoC-h5SA
Chemistry is Dangerous!
http://tinyurl.com/cl2awnj
Chemistry is Dangerous
• Florida DJs May Face Felony for April Fools' Water Joke Worse Than Rubio's
“… told their listeners that "dihydrogen monoxide" was coming out of the taps throughout the Fort Myers area.”
www.dhmo.org
How do you recognize good vs bad?
Is this real?
Junk vs Real
“We then established a collaboration with professor Sum Ting Wong, a fugitive from the North Korean University Hu Yu Hai Ding”
“..identified as the new protein Wai So Dim”
What is real, what is fake?
Helping to change science
• Participation and contribution
• Immediacy of action
• Platforms for contribution
• Openness…whatever that is
Openness – Carries Licensing
• Openness may be hard..
• Open Access flavors
• Open Source licenses
• Open Data licenses
• Open Notebook Science
Getting Called Out in Public…
Rules for Licensing Data
Challenged in the Twittersphere
Annotating Articles Today…
Attribution to me…
Remember Quantifying Scientists
• Scientists’ Impact Factors. Science is both competitive and collaborative
• Can we measure ALL contributions to science?
Article-Level metrics are here
The Alt-Metrics Manifesto
• http://altmetrics.org/manifesto/
ImpactStory
Scientists AltMetrics
Detailed Usage Statistics
Usage, Citations, Social Media, Etc
• Persistent unique digital identifier
• Integrates to workflows such as manuscript and grant submission
• Supports automated linkages with your professional activities
Enabled by
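The transcript does not name the identifier on this slide. Assuming it is an ORCID iD (an assumption, not stated in the source), part of what makes such an identifier safe to use in automated workflows is that its last character is a check digit, computed with the ISO 7064 MOD 11-2 scheme. The sketch below shows how a submission system might reject a mistyped iD; the function names are hypothetical and this is not ORCID's own code.

```python
import re


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string has the ORCID layout and a matching check digit."""
    # Expected layout: four hyphen-separated groups of four, last character 0-9 or X.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


# Example with the iD 0000-0002-2668-4821.
print(is_valid_orcid("0000-0002-2668-4821"))
```

A check like this only catches typing errors. It says nothing about whether the iD is registered or who it belongs to.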
Micropublishing
How much data is lost?
• How many reactions never get published?
• How much data could be shared?
• How many properties are measured and lost?
• What stands in the way of sharing?
– Is it technology?
– Permissions? “The Boss”, Licensing?
Micropublishing Syntheses
ChemSpider SyntheticPages
What is real, what is fake?
Profile
Interactive Data
Rewards and Recognition
• The badgesonomy culture of recognition is growing.
• Badges are commonplace
– FourSquare
– Klout
Rewards and Recognition
• Rewards and Recognition starts with CSSP (ChemSpider SyntheticPages), then expands to other platforms
• Including paths to expose such recognition on AltMetrics platforms – in discussion…
Impact by Data Set onData
IC50 Measurements for 62 substituted benzoxazoles
ChemSpider Data Repository: DOI: 10.1356/CSID784.4
What Does the Future Hold?
The Data Deluge Will Not Go Away
The Linked Network Will Grow
We DON’T want this world..
Thanks Martin!
We’re not there yet
You can’t get there from here
Thank you
Email: [email protected]
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams