Post on 29-Nov-2014
Texas A&M Health Sciences Center, February 6, 2013. Harry Hochheiser, harryh@pitt.edu
Adventures in Translational Bioinformatics
Harry Hochheiser
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
harryh@pitt.edu
© 2010 Harry Hochheiser. Licensed under Creative Commons Attribution-NonCommercial-NoDerivs license
Biomedical Informatics
The use of informatics for the improvement of biomedical research and clinical care
Shortliffe & Blois, “The Computer Meets Medicine and Biology: Emergence of a Discipline”, in Shortliffe & Cimino, eds., “Biomedical Informatics: Computer Applications in Health Care and Biomedicine”
Friedman, “A ‘fundamental theorem’ of biomedical informatics”, JAMIA 2009
http://www.ncats.nih.gov/research/cts/cts.html
What is “Translational” research?
● My research is translational...
● ..because I want to get funding?
Co-Clinical Trials
Nardella, Lunardi, Patnaik, and Cantley, “The APL Paradigm and the ‘Co-Clinical Trial’ Project”, Cancer Discovery 2011
● Acute Promyelocytic Leukemia
● 13 years of work – cloning, description, transgenics, trials
● Successful therapy now approved for clinical use..
● Co-clinical trials – synchronize human and animal trials.
Measuring Success
● How is success of translational science measured?
● Cures? Therapies?
● Citations mapping model organism studies to T2, T3, T4 success?
● How do we assess impact that doesn't play out in the usual 3-5 year grant cycle?
FaceBase
http://www.facebase.org
National Institute of Dental and Craniofacial Research
Five-year initial phase 2009-2014
“..systematically compile the biological instructions to construct the middle region of the human face and precisely define the genetics underlying its common developmental disorders, such as cleft lip and palate”
10 Projects: U-flavor contracts
Data Management and Coordination Hub
– “One-stop access to craniofacial research data”
– “Allow scientists to more rapidly and effectively generate hypotheses and accelerate the pace of their research”.
FaceBase Projects Hochheiser, et al. 2011
FaceBase Data
● 10 projects
● 4 organisms: mouse, zebrafish, chick, human
● Varying developmental time points
● Data types
  ● Expression: microarray, ChIP-Seq, miRNA
  ● Images: 3D MRI, OPT, microMRI, microCT, 3D human facial mesh
  ● Sequences: Genotypes, GWAS, CNV
  ● Software: 3D facial image analysis
  ● Strains: Cre recombinase drivers
● 1 Data management and coordination hub
What does it mean to effectively share data?
Multiple data types
– Modalities
– Organisms
– Protocols
Multiple Groups
– Some collaborative, others...
Diversity of data
Diversity of projects, goals, etc.
Challenges are both technical and social/organizational
FaceBase Hub Technical Challenges
● Data Diversity: sequences, microarrays, miRNA, images..
● Data Volume: genotypes, RNA-Seq
● Anatomy, Phenotype
● Integration: UCSC Genome Browser, between datasets
● “Controlled Access” Human Data: genotypes, facial images
● Tools: searching, browsing
● Metadata
Metadata: Theory
● Well-defined metadata fields/attributes
● Controlled vocabularies provide consistent terminology for each field
● Link to appropriate resources as needed: NCBI, MGI, etc..
● Additional attributes specific to each data type
● Consistent metadata supports search, navigation...
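As a minimal sketch of what "controlled vocabularies for each field" could look like in code, the snippet below checks one metadata record against a small term list per field. The field names (`organism`, `data_type`) and the term lists are assumptions made up for illustration; they are not the FaceBase Hub schema.

```python
# Hypothetical controlled vocabularies; field names and terms are illustrative only.
CONTROLLED_VOCABULARIES = {
    "organism": {"mouse", "zebrafish", "chick", "human"},
    "data_type": {"expression", "image", "sequence", "software", "strain"},
}


def validate_metadata(record):
    """Return a list of problems found in one metadata record."""
    problems = []
    for field, allowed in CONTROLLED_VOCABULARIES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value.strip().lower() not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems


good = {"organism": "Mouse", "data_type": "expression"}
bad = {"organism": "mice"}
print(validate_metadata(good))
print(validate_metadata(bad))
```

A record that passes this kind of check can be searched and grouped by field value; one that fails needs attention from the submitter or a curator.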
Metadata in practice
“What's the difference between mutation and Genotype?”
“Uhh. I'll have to get back to you on that.”
“Strain name? Is “C57B6J” the same as “C57Bl/6J?”
“We're lucky to have any information at all when it comes to these mice...”
“The official name of the strain is ____, but everybody calls it ____”
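The strain-name question can be sketched in code. The example below, written under stated assumptions, normalizes punctuation and case before looking a name up in a synonym table. The table is hypothetical and holds a single entry; a real one would have to come from a curated source. The sketch deliberately does not decide whether an abbreviated spelling is the same strain: anything not in the table is returned as `None` so a person can review it.

```python
import re

# Hypothetical synonym table keyed by a normalized spelling.
CANONICAL_STRAINS = {
    "c57bl6j": "C57BL/6J",
}


def normalize_key(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def canonical_strain(name):
    """Return the canonical strain name, or None if a curator needs to decide."""
    return CANONICAL_STRAINS.get(normalize_key(name))


print(canonical_strain("C57Bl/6J"))
print(canonical_strain("C57B6J"))
```

Differences in case and punctuation are resolved automatically, but an abbreviated spelling such as “C57B6J” falls through to manual review, which is where the curation effort goes.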
Biomedical Ontologies might solve some terminology and structure problems
Human Anatomy: EHDAA, FMA
Mouse Anatomy: EMAP, MA
Mouse/Human Phenotypes: MP/HPO
Genes: GO
Data Models: MIAME, MiXX, ISA-TAB, OBI
Metadata Possibilities: Ontologies & Data Models
Ontologies, Data models, etc. User Practice
Grand Canyon NPS http://www.fotopedia.com/items/fickr-7553734530
Implications
Good metadata is vital for search, retrieval, data sharing, reuse
Curation effort can compensate for some inadequacies, but
– $E_C$: curation effort; $M_q$: metadata quality
– $E_C = f(1/M_q)$
– Possibly non-linear
This doesn't scale
Search tools
● Search + Link: current practice in bioinformatics repositories
  ● Text search over metadata
  ● Linear results list
  ● Results pages with many links
● Advantages
  ● Easy, fast
● Disadvantages
  ● Relationships between items in result list are not clear
  ● No overviews to help with higher-level meta-understanding
The Ontology of Craniofacial Development and Malformation
Extend existing ontologies to provide richer models of craniofacial anatomy, development, and malformation
Foundational Model of Anatomy (FMA), Human Phenotype Ontology (HPO), Mouse Anatomy (MA)..
Add human developmental description
Human-Mouse links
https://www.facebase.org/content/ocdm
But... better models don't guarantee usage
The tool gap
“Tools have advanced to the point of being able to support users fairly successfully in finding and reading off data (e.g. to classify and find multidimensional relationships of interest) but not in being able to interactively explore these complex relationships in context to infer causal explanations and build convincing biological stories amid uncertainty.”
B. Mirel Supporting cognition in systems biology analysis: findings on users' processes and design implications (2009) J. Biomedical Discovery & Collaboration
• Need: Tools that effectively leverage combination of human intellect and computing power
Information Visualization
● Interactive displays of high-dimensional data sets
● Coordinated views facilitate comparison across dimensions and
● Multi-scale
  ● Overview
  ● Zoom/filter
  ● Full details on requested items
● Rapid, incremental, reversible queries
● Avoid 0-hit or million-hit queries.
Toward integrative tools
Organisms: Human, Mouse, Zebrafish
Data types: Sequence, Expression, Images, Morphometry, Genotypes
How do we build connections?
How do we get users to think differently about the data?
Technology Probe: Connecting related data items
• Single search, multiple data types
• Links between items indicate connections
• Images or other details on demand for direct access to data
• Rapid response supports interactivity, exploration, serendipitous discovery
Technology Probe: Gene Atlas
• Genes displayed on timeline, indicating when active
• Images display expression localization
• Large image for detailed views.
• Mouse-over coordinated links
Vision vs. reality
● Data is still coming in
● Gene atlas requires coordinated analysis ..
● What can we do with data that is available?
Current Search Interface
Timeline view
Approximate alignment of datasets on common timeline
3D Facial Meshes
Two datasets
– Weinberg & Marazita: Caucasian facial norms
– Spritz: Tanzanian
Identifiable data: access only upon DAC approval
Query tool: identify subsets for further analysis.
UCSC Genome Browser Mirror
● Genomebrowser.facebase.org
● > 30 tracks, mouse mm9
  ● RNA-seq
  ● ChIP-seq
  ● Various anatomic locations, time points, etc.
● Human CNV hg19
Imaging
● microMRI
● microCT
● OPT
● Interactive Browsers?
Fishface
Atlas of zebrafish craniofacial development
C. Kimmel, et al.
Current Status
As of 4 February 2013
> 200 datasets
> 100 in queue awaiting curation or investigator approval
Continuing to refine metadata and data organization
Expecting large influx of data over the next year.
Toward Greater Integration
Currently: Data ↔ Metadata ↔ Metadata ↔ Data
Links via metadata: comparable organism, stage, gene, etc..
Ideally: Data ↔ Data, in addition to Metadata ↔ Metadata
Links via data: commonly expressed genes, sequences, pathways
“Easy” for similar data: expression, sequences
Harder for diverse data: images?
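The difference between linking via metadata and linking via data can be sketched with a toy example. The three datasets, their stage labels, and their gene lists are assumptions chosen only to make the code run; they are not FaceBase data.

```python
from itertools import combinations

# Hypothetical datasets; metadata values and gene lists are illustrative only.
DATASETS = {
    "A": {"organism": "mouse", "stage": "E10.5", "genes": {"Msx1", "Pax9", "Irf6"}},
    "B": {"organism": "mouse", "stage": "E11.5", "genes": {"Irf6", "Tp63"}},
    "C": {"organism": "zebrafish", "stage": "48hpf", "genes": {"sox9a"}},
}


def metadata_links(datasets, field):
    """Pairs of datasets that share the same value for one metadata field."""
    return [
        (a, b)
        for a, b in combinations(sorted(datasets), 2)
        if datasets[a][field] == datasets[b][field]
    ]


def data_links(datasets):
    """Pairs of datasets that share at least one gene, with the shared genes."""
    links = {}
    for a, b in combinations(sorted(datasets), 2):
        shared = datasets[a]["genes"] & datasets[b]["genes"]
        if shared:
            links[(a, b)] = shared
    return links


print(metadata_links(DATASETS, "organism"))
print(data_links(DATASETS))
```

Set intersection works here because gene lists are directly comparable. Nothing this simple exists for images, which is why diverse data are the harder case.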
FaceBase as a model
● Diverse researchers and projects
● Focus: specific clinical domain
● Goal: collect and coordinate data – community resource
● Outstanding challenges?
● Is this even the right idea?
Build it and they (may) come?
● Speculative claim: assemble good data and it will be of use
● How to evaluate this?
● How to compare to n R01s that might otherwise have been funded? (question asked by program officer)
● ENCODE controversy
“..the lesson I learned from ENCODE is that projects like ENCODE are not a good idea.”
M. Eisen http://www.michaeleisen.org/blog/?p=1179
Three key factors
Tools, Incentives, Outreach
Tools
Desperately needed!
Search tools – finding data
Annotation Tools – describing data
– Using ontologies
– Common data models
Minimize the effort required to provide the high-quality metadata needed for translational bioinformatics.
Perhaps the worst bioinformatics data management tool
Perhaps the best bioinformatics data management tool
Pros and Cons of Spreadsheets
Cons:
– ad-hoc data models
– ad-hoc data fields
– No consistency, provenance...
Pros:
– Ubiquitous
– Simple
– Flexible
– Familiar
Analytic tools: more rigor, less generality
Alternatives:
Experiment Metadata
Usability and Ontologies
Can we at least get the terms right?
Maguire et al. 2012
Beyond Autocomplete
● Substring autocomplete – state of art for finding terms in biomedical ontologies
● Cons: no context, room for specialization or generalization?
● Can we build a tool that will easily support both search and navigation?
Foundational Model Explorer
http://fme.biostr.washington.edu/FME/index.html
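One way to picture “beyond autocomplete” is to return each substring match together with its path to the root, so the user sees context and can generalize to a parent term. The mini-ontology below is a hypothetical toy, not taken from the FMA, and the function names are assumptions for illustration.

```python
# Hypothetical mini-ontology: term -> parent term (None for the root).
PARENTS = {
    "head": None,
    "face": "head",
    "upper lip": "face",
    "lower lip": "face",
    "palate": "head",
}


def path_to_root(term):
    """Return the term followed by its ancestors, ending at the root."""
    path = [term]
    while PARENTS[path[-1]] is not None:
        path.append(PARENTS[path[-1]])
    return path


def autocomplete(fragment):
    """Substring match over term names, with each hit's ancestor path as context."""
    fragment = fragment.lower()
    return {term: path_to_root(term) for term in sorted(PARENTS) if fragment in term}


print(autocomplete("lip"))
```

Plain substring autocomplete would return only the two matching names; attaching the ancestor path is a small step toward supporting both search and navigation in one tool.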
Why not let the curators do it?
● Expense
● Accuracy
● Speed
● Scientists know their data best
Claim – To be sustainable, data sharing must be self-curated.
But.. why should anyone bother?
Scientific incentives...
Traditionally: Research, Results, Papers, Funding
Where does data sharing fit in?
How to promote sharing..
1. Better tools → reduced effort
$E_{\text{annotation}} < E_{\text{collection}} + \epsilon$
2. Recognize effort: alternative models of academic credit
ImpactStory...
Altmetrics: Value all research products (H. Piwowar, Nature 1/9/13, doi:10.1038/493159a)
“For all new grant applications from 14 January, the US National Science Foundation (NSF) asks a principal investigator to list his or her research “products” rather than “publications” in the biographical sketch section.”
Related Efforts
● Genomics Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) – microbiome, expression, phenotype sharing
● LAMHDI/MONARCH – User tools for ontologically-driven discovery of model genes for human diseases
● Research Social Network Collaboration Finding tool – using VIVO and related networking tools to find collaborators..
Acknowledgments
FaceBase: Mary Marazita, Jeff Murray, Yang Chai, Bruce Aronow, Kristin Artinger, Teri Beaty, Jim Brinkley, David Clouthier, Michael Cunningham, Michael Dixon, Leah Rae Donahue, Scott E. Fraser, Benedikt Hallgrimsson, Junichi Iwata, Ophir Klein, Stephen Murray, Fernando Pardo-Manuel de Villena, John Postlethwait, Steven Potter, Linda Shapiro, Richard Spritz, Axel Visel, Seth Weinberg, Paul Trainor
U. Pittsburgh: Mike Becich, Becky Boes, Chuck Borromeo, Lance Kennelty, Annette Krag-Jensen, Tom Maher, Johnson Paul, Linda Schmandt, Shiyi Shen, Bill Shirey, Cristy Spino, Mike Stefanko, Justin Stickel
OCDM: James Brinkley; Jose Leonardo Mejino; Landon Detwiler; Ravensara Travillian; Melissa Clarkson; Timothy Cox; Carrie Heike; Michael Cunningham; Linda Shapiro
NIDCR: Steven Scholnick, Emily Harris, Jeannine Helm, Nadya Lumelsky, Lillian Shum
Support: NIH Grants U01 DE020057, 3U01DE020050-03S1