Reproducibility in human cognitive neuroimaging: a community-driven data sharing framework for provenance information integration and interoperability
Nolan Nichols
Dissertation Defense Biomedical and Health Informatics
University of Washington Seattle, WA, USA December 8, 2014
Outline
• Introduction
  – Motivation for Research
  – Research Goal
• Background
• Research approach
• Conclusions and future directions
Introduction: Motivation for Research
• Human Cognitive Neuroimaging
  – Investigates brain structure and function in normal and neuropsychiatric conditions to improve human health
  – Facilitates clinical decision making using imaging and cognitive phenotypes
• Biomedical Informatics (BMI)
  – The interdisciplinary field that studies and pursues the effective use of biomedical data, information, and knowledge for scientific inquiry, problem solving, and decision making, motivated by efforts to improve human health
• Neuroinformatics
  – Applies BMI principles to develop techniques and tools for acquiring, sharing, storing, publishing, analyzing, modeling, visualizing and simulating data across all levels of neuroscience
Introduction: Motivation for Research
Poline et al. (2012), Frontiers in Neuroinformatics
• Neuroinformatics Perspective
• Research is a process with distinct stages
• Provenance links together each stage
Introduction: Motivation for Research
• Problem: research is not reproducible
  – Ioannidis JPA: Why Most Published Research Findings Are False. PLoS Med 2005
  – Donoho D: An invitation to reproducible computational research. Biostatistics 2010
  – Yong E: Replication studies: Bad copy. Nature 2012
  – Editorial: Reducing our irreproducibility. Nature 2012
  – Begley CG: Six red flags for suspect work. Nature 2013
  – Collins FS, Tabak LA: Policy: NIH plans to enhance reproducibility. Nature 2014
• Reproducibility issues exist along a spectrum
  – Statistical issues
  – Computational issues
Introduction: Motivation for Research
Introduction: Motivation for Research
Reproducibility Spectrum (axis: Confidence in Findings)
• Repeatable: Can the same researchers in the same lab obtain consistent results using the same methodology and data?
• Replicable: Can different researchers from a different lab obtain consistent results using the same methodology?
• Reproducible: Can different researchers from a different lab obtain consistent results using a different methodology and data?
• Statistical issues
  – Reporting bias of brain volume (Ioannidis, 2011), fMRI activation foci (David, 2013)
  – Lack of statistical power in neuroscience (Button, 2013)
  – Data collection and analysis methods are highly flexible across fMRI studies (Carp, 2012)
• Computational issues
  – Lack of shared data, code, and analysis environments
Introduction: Motivation for Research
Adapted from Peng (2011), Science.
Introduction: Motivation for Research
• Reusable Research
  – Can different researchers from a different lab apply a methodology to process shared data from different researchers in a different lab?
Poline et al. (2012), Frontiers in Neuroinformatics
Introduction: Motivation for Research
Barriers to reusable research
• Data management systems are not interoperable
• Data acquisition and analysis methods lack provenance
• Terminologies are not harmonized (e.g., brain atlases, schemas)
• To enhance the reusability of neuroimaging data and workflow code
• To advance an informatics data exchange standard that incorporates provenance as a core concept
• To engage the neuroinformatics community as a partner in the design process
Introduction: Research Goals
Outline
• Introduction
• Background
  – Data exchange
  – Provenance
  – Linked Open Data
• Research approach
• Conclusions and future directions
Background: Data Exchange
http://xkcd.com/927/
• My goal is to extend existing standards to facilitate data reusability and interoperability
XML-based Clinical Experiment Data Exchange Schema, Gadde et al. 2012
XCEDE XML Schema
• Experiment Hierarchy is composed of six levels of information relevant to neuroimaging data exchange
  – Project
  – Subject
  – Visit
  – Study
  – Episode
  – Acquisition
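The nesting of the experiment hierarchy can be sketched in a few lines of Python. This is an illustration only: the level names come from the slide, while the `locate` helper, the path format, and the sample identifiers are hypothetical and are not part of the XCEDE schema.

```python
# Illustrative only: level names are taken from the slide; the helper and
# the sample identifiers are hypothetical, not part of XCEDE itself.
XCEDE_LEVELS = ("project", "subject", "visit", "study", "episode", "acquisition")


def locate(**ids):
    """Build a slash-separated path from the top level down to the deepest level given.

    Raises ValueError if a level is supplied without all of its parent
    levels, because each level is nested inside the one above it.
    """
    unknown = set(ids) - set(XCEDE_LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    path = []
    seen_gap = False
    for level in XCEDE_LEVELS:
        if level in ids:
            if seen_gap:
                raise ValueError(f"level '{level}' given but a parent level is missing")
            path.append(f"{level}:{ids[level]}")
        else:
            seen_gap = True
    return "/".join(path)


if __name__ == "__main__":
    print(locate(project="P01", subject="S001", visit="V1"))
```

A strict top-down path like this is what makes a hierarchy easy to validate, and also what makes it rigid when a dataset does not fit every level.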
Background: Data Exchange
• Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability, or trustworthiness.
  – Entity (e.g., files, data, publications)
    • a physical, digital, conceptual, or other kind of thing with some fixed aspects
  – Activity (e.g., workflow, editing a manuscript)
    • something that occurs over a period of time and acts upon or with entities
  – Agent (e.g., person, software, organization)
    • something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity.
W3C PROV Specification Suite
Background: Data Exchange
Background: Provenance
• An image registration process
  – wasAssociatedWith a registration algorithm
  – used a native-space anatomical MRI
• A spatially-normalized anatomical MRI
  – wasGeneratedBy an image registration process
  – wasDerivedFrom a native-space anatomical MRI
  – wasAttributedTo a registration algorithm
• PROV is an extensible language to describe:
  – Responsibility
  – Data Flow
  – Process Flow
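The registration example above can be written down as subject, relation, object statements. The sketch below uses plain Python tuples rather than any PROV library; the relation names are the PROV relations from the slide, while the `ex:` identifiers and the two helper functions are assumptions made for illustration.

```python
# Hypothetical identifiers for the slide's example. Only the relation names
# (used, wasGeneratedBy, ...) come from PROV; everything else is made up.
REGISTRATION = "ex:image_registration_process"             # activity
ALGORITHM = "ex:registration_algorithm"                    # agent
NATIVE_MRI = "ex:native_space_anatomical_mri"              # entity
NORMALIZED_MRI = "ex:spatially_normalized_anatomical_mri"  # entity

STATEMENTS = [
    (REGISTRATION, "prov:wasAssociatedWith", ALGORITHM),
    (REGISTRATION, "prov:used", NATIVE_MRI),
    (NORMALIZED_MRI, "prov:wasGeneratedBy", REGISTRATION),
    (NORMALIZED_MRI, "prov:wasDerivedFrom", NATIVE_MRI),
    (NORMALIZED_MRI, "prov:wasAttributedTo", ALGORITHM),
]


def objects(subject, relation, statements=STATEMENTS):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in statements if s == subject and r == relation]


def inputs_of(entity, statements=STATEMENTS):
    """Follow the data flow: entity -> generating activity -> what that activity used."""
    used = []
    for activity in objects(entity, "prov:wasGeneratedBy", statements):
        used.extend(objects(activity, "prov:used", statements))
    return used


if __name__ == "__main__":
    print(inputs_of(NORMALIZED_MRI))
    print(objects(NORMALIZED_MRI, "prov:wasAttributedTo"))
```

Walking `wasGeneratedBy` and then `used` answers a data-flow question, while `wasAttributedTo` and `wasAssociatedWith` answer responsibility questions, which matches the three uses listed on the slide.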
Background: Linked Open Data
Semantic Web and Resource Description Framework
• A language to make statements about unique locations (URLs) on the Web
• For example, at the URL of an anatomical MRI
  – ‘is a’ http://neurolex.org/wiki/Nlx_156814
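As a rough illustration of such a statement, the sketch below formats one RDF triple in N-Triples syntax using only the Python standard library. The subject URL is a hypothetical file location invented for this example; the object URL is the one on the slide, and ‘is a’ is expressed with the standard `rdf:type` predicate.

```python
# 'is a' in RDF is conventionally the rdf:type predicate.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical location of an anatomical MRI file (not from the slides).
MRI_URL = "http://example.org/data/sub-01/anat.nii.gz"

# Term URL shown on the slide.
NEUROLEX_TERM = "http://neurolex.org/wiki/Nlx_156814"


def ntriple(subject, predicate, obj):
    """Format one statement whose subject, predicate, and object are all URLs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


if __name__ == "__main__":
    print(ntriple(MRI_URL, RDF_TYPE, NEUROLEX_TERM))
```

Because every part of the statement is a URL, any other dataset on the Web can refer to the same file or the same term without a shared database.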
Outline
• Introduction
• Background
• Research approach
  – Specific Aims
  – Study Design
  – Phase 1
  – Phase 2
• Conclusions and future directions
Research Approach: Specific Aims
• Aim 1: Research and design a framework to represent, access, and query neuroimaging data provenance
• Aim 2: Develop an information system of Web services to compute and discover data provenance from brain imaging workflow
Research Approach: Study Design
• Phase 1 – Scalable Neuroimaging Initiative (SNI)
  – West Coast collaboration funded by the National Academies Keck Futures Initiative (NAKFI) on Imaging Science
  – I led 15 meetings, 1 face-to-face workshop, and presented preliminary results at 3 conferences
• Phase 2 – Neuroimaging Data Sharing (NIDASH)
  – Task force funded and organized by the International Neuroinformatics Coordinating Facility (INCF)
  – I gathered feedback and redesigned the initial SNI framework over 14 face-to-face workshops, 2 hackathons, and weekly meetings over two years
• Phase 1 – SNI
  – Aim 1 – Data Exchange: Evaluate metadata standards for data exchange (XCEDE)
  – Aim 2 – Information System: Demonstrated a system for computational access to data (NiQuery)
• Phase 2 – NIDASH
  – Aim 1 – Data Exchange: Extend PROV using concepts from XCEDE (Neuroimaging Data Model)
  – Aim 2 – Information System: Redesign NiQuery using a semantic Web service-oriented architecture
Research Approach: Study Design
Outline
• Introduction
• Background
• Research approach
  – General Approach
  – Phase 1 – SNI
  – Phase 2 – NIDASH
• Conclusions and future directions
Research Approach: Phase 1 – SNI
• Scalable Neuroimaging Initiative’s Mission:
  – To specify and demonstrate an application programming interface (API) that can support agile exploration of distributed neuroimaging data sources while allowing for heterogeneous and evolving data management systems, ontologies, image data formats, image processing tools, and standard anatomical spaces.
• Aim 1 – Data Exchange:
  – Applied XCEDE as a data exchange standard for two neuroimaging databases
• Aim 2 – Information System:
  – Implemented a system architecture for remote access to content within neuroimaging data
Aim 1
• Queries shipped out to multiple sources
• Links are passed to visualization app
Aim 2
• Extract time series from data remotely
• Browser and plotting all in real-time
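The idea of shipping one query to several sources and passing the resulting links on can be sketched as follows. This is not the NiQuery implementation: the two in-memory "sources", the example.org links, and both function names are hypothetical stand-ins, and a real system would make network calls where the dictionary lookup happens.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote databases: subject ID -> image links.
SOURCES = {
    "uw": {
        "S001": ["http://uw.example.org/images/S001_T1.nii.gz"],
    },
    "stanford": {
        "S001": ["http://stanford.example.org/images/S001_fmri.nii.gz"],
        "S002": ["http://stanford.example.org/images/S002_T1.nii.gz"],
    },
}


def query_source(name, subject_id):
    """Pretend to query one source; a real system would call a remote API here."""
    return [(name, link) for link in SOURCES[name].get(subject_id, [])]


def federated_query(subject_id, source_names=tuple(SOURCES)):
    """Send the same query to every source in parallel and merge the returned links."""
    with ThreadPoolExecutor(max_workers=len(source_names)) as pool:
        # pool.map preserves the order of source_names in its results.
        per_source = list(pool.map(lambda n: query_source(n, subject_id), source_names))
    return [hit for hits in per_source for hit in hits]


if __name__ == "__main__":
    for source, link in federated_query("S001"):
        print(source, link)
```

Returning links instead of image data keeps the merged result small, so a visualization app can fetch only the images it actually displays.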
Research Approach: Phase 1 – SNI
[Figure: NiQuery (NIQ, www.niquery.org) system architecture. Layers: Web-based Applications (App); Query Processing (Query Integrator); Database Registry; Common Data Exchange Layer. Sources, each behind a Common API: Allen Institute ABA, UW XNAT, Stanford NIMS, …]
NiQuery presented at Neuroinformatics, 2012 Munich
Brinkley (2012), Query Integrator. JBI.
• System too slow for real-time access (~30 secs.)
• XCEDE too strict for changing datatype requirements
• Framework doesn’t incorporate formal provenance
Research Approach: Phase 1 – SNI
Lessons learned
• Harmonizing the XCEDE and PROV Schemas
  – XCEDE has a strict hierarchical structure
  – PROV is designed as a graph and is compatible with semantic Web technologies
  – A harmonized XCEDE and PROV model could represent the stages of electronic data capture, not just the experiment hierarchy
• Solution 1: Extend PROV to represent XCEDE
• Solution 2: Redesign NiQuery using semantic Web design concepts
Research Approach: Phase 1 – SNI
Outline
• Introduction
• Background
• Research approach
  – General Approach
  – Phase 1 – SNI
  – Phase 2 – NIDASH
• Conclusions and future directions
Research Approach: Phase 2 – NIDASH
• Neuroimaging Data Sharing Task Force Mission:
  – Aiming at reproducibility for the sake of reproducibility and enhanced research.
• Aim 1 – Data Exchange:
  – Applied XCEDE as a data exchange standard for two neuroimaging databases
• Aim 2 – Information System:
  – Implemented a system architecture for remote access to content within neuroimaging data
• Extensions to PROV using elements from the XCEDE experiment hierarchy, workflow tools, and derived data to create Domain Object Models
• Enables a model bridging information from experiment, workflow provenance, and derived data
Keator, et al. 2013
Research Approach: Phase 2 – NIDASH
NIDM Collaboration
• Meetings on Monday and Wednesday to discuss the previous week’s issues
• Satellite meetings at HBM, SfN, Imaging Genetics, and Neuroinformatics for 1-2 days each
• General Workflow to Contribute
  – Contributors create a “fork” from GitHub (an online version control system with
  – Changes to the vocabulary and examples are logged as “commits” in the contributor’s “fork”
  – Contributor submits a “pull request” to have changes reviewed
  – Discussion takes place online until consensus is reached
NIDM Results
• A harmonized model for reporting task-based fMRI across SPM, FSL and (soon) AFNI
http://nidm.nidash.org/specs/nidm-results.html
NIDM Results
• All terms are modeled with an identifier, a definition, domain/range, and examples
• Model fitting:
Outline
• Introduction
• Background
• Research approach
• Conclusions and future directions
  – Contributions
  – Implications
  – Future Directions
Conclusions and future directions
• Collaborative Framework Outcomes
  – GitHub is an effective tool for standards development
    • Closed 89 issues
    • 1,087 commits
    • 9 contributors
    • 1 publication, specification suite
• Software engineering outcomes
  – Implemented in Nipype for workflow management
  – Being used to model task fMRI
    • Implemented for SPM 12 and FSL
  – Being incorporated into NeuroVault for automated population of a database to share SPMs
Acknowledgments
• Committee Members: James Brinkley (Chair), Susan Coldwell (GSR), Thomas Grabowski, Nicholas Anderson
• Neuroinformatics Community: Satra Ghosh, Rich Stoner, JB Poline, David Keator, Karl Helmer, Camille Maumet, Tom Nichols, Dan Marcus, Christian Haselgrove, Jessica Turner, David Kennedy, Jack van Horn… and many others!
• Scalable Neuroimaging Initiative
  – UW: Todd Detwiler, Randy Frank
  – Stanford: Brian Wandell, Bob Dougherty, Gunnar Schaeffer
• Integrated Brain Imaging Center: Katie Askren, Peter Boord, Elliot Collins, Tina Guan, Clark Johnson, Tara Madhyastha, Sonya Mehta, Todd Richards, Rosalia Tungaraza, Kurt Weaver, Karl Woelfer, Liza Young… and everyone else!