Software Metadata: Describing "dark software" in GeoSciences
Transcript of Software Metadata: Describing "dark software" in GeoSciences
1 Yolanda Gil USC Information Sciences Institute [email protected]
Software Metadata: Describing “dark software” in Geosciences
Yolanda Gil, Daniel Garijo
Information Sciences Institute and Department of Computer Science
University of Southern California @yolandagil, @dgarijov
{gil,dgarijo}@isi.edu
http://www.ontosoft.org Building Block
2 Yolanda Gil USC Information Sciences Institute [email protected]
We have all been here…
3 Yolanda Gil USC Information Sciences Institute [email protected]
The Value of Software: Reproducibility
Financial
Human lives
Reliability
Scientific integrity
Financial
Trust
5/ 29/ 15, 1:49 AMRetracted Scientific Studies: A Growing List - NYTimes.com
Page 1 of 8http:/ / www.nytimes.com/ interactive/ 2015/ 05/ 28/ science/ retractions- scientific- studies.html?smid= tw- nytimesscience&_r= 1
Sections Home Search Skip to content
Advertisement
EmailShareTweetMore
Search
Subscribe
Log In 0 Settings
Close search
search sponsored by
Search NYTimes.com
Clear this text input Go
http://nyti.ms/1HPVX1t
1. 1. Study on Attitudes Toward Same-Sex Marriage Is Retracted by a Scientific Journal
2. A Proposal to Modify Plants Gives G.M.O. Debate New Life
3. Chimpanzees in Liberia, Used in New York Blood Center Research, Face Uncertain Future
4. Matter
The Human Family Tree Bristles With New Branches
5. Observatory
Race and Gender Biases Can be Reduced With Sleep Therapy, Study Finds
6. Observatory
Ancient Skull Suggests an Early Murder
7. National Briefing | Washington
Live Anthrax Spores Shipped to Laboratories
8. A Robot That Can Perform Brain Surgery on a Fruit Fly
9. Jinghong Journal
China’s High Hopes for Growing Those Rubber Tree Plants
10. Scientists Warn to Expect More Weather Extremes
11. Arguing in Court Whether 2 Chimps Have the Right to ‘Bodily Liberty’
12. Sister Megan Rice, Freed From Prison, Looks Ahead to More Anti-Nuclear Activism
13. Obama Announces New Rule Limiting Water Pollution
14. Lassa Fever Carries Little Risk to Public, Experts Say
SUBSCRIBE NOW
5/ 29/ 15, 1:49 AMRetracted Scientific Studies: A Growing List - NYTimes.com
Page 5 of 8http:/ / www.nytimes.com/ interactive/ 2015/ 05/ 28/ science/ retractions- scientifi c- studies.html?smid= tw- nytimesscience&_r= 1
The retraction by Science of a study of changing attitudes about gay marriage is
the latest prominent withdrawal of research results from scientific literature.
And it very likely won't be the last. A 2011 study in Nature found a 10-fold
increase in retraction notices during the preceding decade.
Many retractions barely register outside of the scientific field. But in some
instances, the studies that were clawed back made major waves in societal
discussions of the issues they dealt with. This list recounts some prominent
retractions that have occurred since 1980.
Photo
In 1998, The Lancet, a British medical journal,
published a study by Dr. Andrew Wakefield
that suggested that autism in children was
caused by the combined vaccine for measles,
mumps and rubella. In 2010, The Lancet
retracted the study following a review of Dr.
Wakefield's scientific methods and financial
conflicts.
Despite challenges to the study, Dr.
Wakefield's research had a strong effect on
many parents. Vaccination rates tumbled in
Britain, and measles cases grew. American
antivaccine groups also seized on the research. The United States had more
cases of measles in the first month of 2015
than the number that is typically diagnosed in a full year.
Vaccines andAutism
Papers published by Japanese researchers in Nature in 2014 claimed to provide
an easy method to create multipurpose stem cells, with eventual implications
for the treatment of diseases and injuries. Months later, the authors, including
Haruko Obokata, issued a retraction. An investigation by one of Japan's most
prestigious scientific institutes, where much of the research occurred, found
that the author had manipulated some of the images published in the study.
Approximately one month after the retraction, one of Ms. Obokata's co-authors,
Yoshiki Sasai, was found hanging in a stairwell of his office. He had taken his
own life.
Stem Cell Production
4 Yolanda Gil USC Information Sciences Institute [email protected]
Quantifying the Value of Software through
“Reproducibility Maps” [Bourne & Gil et al 12]
2 months of effort in reproducing published method (in PLoS’10)
Authors expertise was required
Comparison of ligand binding sites
Comparison of dissimilar protein structures
Graph network generation
Molecular Docking
Work with P. Bourne of UCSD
5 Yolanda Gil USC Information Sciences Institute [email protected]
Geosciences Software Today
There are repositories of model software
There are no shared repositories for other kinds of geosciences software (e.g. model-data preparation services…)
There are general software repositories with no standard metadata
Most scientists are not aware of the value of their software
Most geosciences software is not shared
6 Yolanda Gil USC Information Sciences Institute [email protected]
“Dark Software”
Models that are not published • Eg from a PhD thesis
Data preparation software • Data pre-processing and
QC can take up to 80% of a project’s effort
Visualization software
“Dark Software” is the counterpart of “Dark Data” [Heidorn 2008]
7 Yolanda Gil USC Information Sciences Institute [email protected]
Recommender system Interoperability
Publication
Community
Learning
Structured metadata Interactive advice
Best practices Multimedia lessons
Recommender system � Interoperability
Publication
Community
Learning
Structured metadata � Interactive advice
� Best practices � Multimedia lessons
8 Yolanda Gil USC Information Sciences Institute [email protected]
Publication
Community
Learning
UK Software Institute
Software Carpentry
CIG ESMF
Critical Zone Observatory
Early Career Advisory Board
FES/ESIP
CSDMS
EarthCube RCNs
EarthCube Building Blocks
Recommender system � Interoperability
Publication
Community
Learning
Structured metadata � Interactive advice
� Best practices � Multimedia lessons
Collaborating with SEN C4P EC3
9 Yolanda Gil USC Information Sciences Institute [email protected]
The OntoSoft Ontology for Describing
Scientific Software Metadata [Gil et al 2015]
An ontology for scientific software metadata
• Intended to describe scientific software
• Designed with scientists in mind to guide them to deposit and describe their software in a software registry
Major categories of metadata: what does a scientist need?
1. identify software
2. understand what it does and its utility for research,
3. execute the software,
4. get support if questions arise,
5. do research with it, and
6. contribute to its development
10 Yolanda Gil USC Information Sciences Institute [email protected]
OntoSoft Metadata Categories
http://www.ontosoft.org/software
11 Yolanda Gil USC Information Sciences Institute [email protected]
Describing Scientific Software in OntoSoft
http://www.ontosoft.org/portal
Metadata can be exported
in several formats
(HTML, RDF, JSON)
Metadata for 3DDY Software
Metadata properties
collected through
simple questions
Indicators of metadata
completeness
Set permissions for 3DDY
Metadata properties
organized into categories that
make sense to scientists
Set permission for Documentation metadata for 3DDY software
Crowdsourcing of
metadata through access
control permissions
Automatic import of metadata
from other repositories
12 Yolanda Gil USC Information Sciences Institute [email protected]
Software entries
from distributed
repositories are
readily accessible
Semantic
search
Comparison matrix
of software entries
PIH M PIH M gis D rEICH TauD EM WBM sed
nto$o%$
Metadata
completion
highlighted
Software is
contrasted
by property
13 Yolanda Gil USC Information Sciences Institute [email protected]
Recommender system � Interoperability
Publication
Community
Learning
Structured metadata � Interactive advice
� Best practices � Multimedia lessons
Conclusions
Geosciences software is a valuable research product • Must embed best practices of
software sharing into research activities
Improve productivity, quality, reproducibility
OntoSoft contributions • Ontology of scientific
software metadata
• Portal for software registry
• Training scientists to write Geoscience Papers of the Future
Sign up for a GPF training session!
http://www.ontosoft.org
http://www.ontosoft.org/software
http://www.ontosoft.org/portal
http://www.ontosoft.org/gpf
14 Yolanda Gil USC Information Sciences Institute [email protected]
More Information
http://www.ontosoft.org
http://www.ontosoft.org/software
http://www.ontosoft.org/portal
http://www.ontosoft.org/gpf
OntoSoft: Capturing Scientific Software Metadata. Yolanda Gil, Varun Ratnakar, and Daniel Garijo. Proceedings of the Eighth ACM International Conference on Knowledge Capture (K-CAP), 2015.
OntoSoft: A Distributed Semantic Registry for Scientific Software. Yolanda Gil, Daniel Garijo, Saurabh Mishra, and Varun Ratnakar. Under review, 2016.
DRAT: An Unobtrusive, Scalable Approach to Large Scale Software License Analysis. Chris A. Mattmann, Ji-Hyun Oh, Tyler Palsulich, Lewis John McGibbney, Yolanda Gil, and Varun Ratnakar. Proceedings of the Fourth International Workshop on Software Mining, held in conjunction with the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015.
Cyber-Innovated Watershed Research at the Shale Hills Critical Zone Observatory. Xuan Yu, Chris Duffy, Yolanda Gil, Lorne Leonard, Gopal Bhatt, and Evan Thomas. IEEE Systems Journal, to appear.
Collaborative Software Development Needs in Geosciences. Yolanda Gil, Eunyoung Moon and James Howison. Proceedings of the Second Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE2), held in conjunction with the IEEE ACM International Conference on High Performance Computing (SC), New Orleans, LA, November 2014.
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users. Daniel Garijo, Oscar Corcho, Yolanda Gil, Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad and, Paul Thompson and Arthur W. Toga. Proceedings of the IEEE Conference on e-Science, 2014.
FragFlow: Automated Fragment Detection in Scientific Workflows. Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson and Arthur W. Toga. Proceedings of the IEEE Conference on e-Science, Guarujua, Brazil, October 2014.
An Overview of Mobile Applications for Field Science. Anna Zeng, Kevin Zeng, Yolanda Gil, and Matty Mookerjee. GeoSoft Project Report, September 2014.
The CSDMS Standard Names: Cross-Domain Naming Conventions for Describing Process Models, Data Sets and Their Associated Variables. Scott D. Peckham. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.
Web Applications that Share Level-12 HUC Data and Models of the CONUS. Lorne Leonard and Chris Duffy. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.
Intelligent Workflow Systems and Provenance-Aware Software. Yolanda Gil. Proceedings of the Seventh International Congress on Environmental Modeling and Software, San Diego, CA, June 2014.
15 Yolanda Gil USC Information Sciences Institute [email protected]
Acknowledgements
The OntoSoft project team includes Chris Duffy (PSU), Chris Mattmann (JPL),
Scott Pechkam (CU), Ji-Hyun Oh (USC), Varun Ratnakar (USC), and Erin
Robinson (ESIP)
The Geoscience Papers of the Future ideas were significantly improved through
input from GPF pioneers Cedric David (JPL), Ibrahim Demir (UI), Bakinam
Essawy (UV), Robinson W. Fulweiler (BU), Jon Goodall (UV), Leif Karlstrom
(UO), Kyo Lee (JPL), Heath Mills (UH), Suzanne Pierce (UT), Allen Pope (CU),
Mimi Tzeng (DISL), Karan Venayagamoorthy (CSU), Sandra Villamizar (UC),
and Xuan Yu (UD)
Thank you to James Howison (UT), Lisa Kempler (Matworks), and Greg Wilson
(Software Carpentry) for their feedback on best practices for software sharing
Thank you to the scientists and other colleagues that have contributed ideas
and asked hard questions about software stewardship
Thank you to the National Science Foundation and the EarthCube program for
supporting this work
EarthCube!ICER-1440323 ICER-1343800
http://www.ontosoft.org
http://www.ontosoft.org/software
http://www.ontosoft.org/portal
http://www.ontosoft.org/gpf