Adventures in Translational Bioinformatics

Post on 29-Nov-2014

323 views 0 download

description

 

Transcript of Adventures in Translational Bioinformatics

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Adventures in Translational Bioinformatics

Harry Hochheiser

University of Pittsburgh School of Medicine

Department of Biomedical Informaticsharryh@pitt.edu

© 2010 Harry HochheiserLicensed under Creative Commons Attribution-NonCommercial-NoDerivs license

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Biomedical Informatics The use of informatics for the improvement of biomedical

research and clinical care

Shortliffe & Blois “The Computer Meets Medicine andBiology: Emergence of a Discipline” in Shortliffe & Cimino, eds., “Biomedical Informatics: Computer Applications in Health Care and Biomedicine

Friedman “A "fundamental theorem" of biomedical informatics. JAMIA 2009

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

http://www.ncats.nih.gov/research/cts/cts.html

What is “Translational” research?● My research is translational...

● ..because I want to get funding?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Co-Clinical TrialsNardella, Lunardi, Pataik, and Cantley The APL Paradigm and the “Co-Clinical Trial” Project Cancer Discovery 2011● Acute Proyelocytic Leukemia

● 13 years of work – cloning, description, transgenics, trials

● Successful therapy now approved for clinical use..

● Co-clinical trials – synchronize human and animal trials.

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Measuring Success

● How is success of translational science measured?

● Cures? Therapies?

● Citations mapping model organism studies to T2, T3, T4 success?

● How do we assess impact that doesn't play out in the usual 3-5 year grant cycle?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

FaceBasehttp://www.facebase.org

National Institute of Dental and Craniofacial Research

Five-year initial phase 2009-2014

“..systematically compile the biological instructions to construct the middle region of the human face and precisely define the genetics underlying its common developmental disorders, such as cleft lip and palate”

10 Projects: U-flavor contracts

Data Management and Coordination Hub

– “One-stop access to craniofacial research data”

– “Allow scientists to more rapidly and effectively generate hypotheses and accelerate the pace of their research”.

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

FaceBase Projects Hochheiser, et al. 2011

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

FaceBase Data

● 10 projects

● 4 organisms: mouse, zebrafish, chick, human

● Varying developmental time points

● Data types● Expression: microarray, ChIP-Seq, miRNA, ● Images: 3D MRI, OPT, microMRI, microCT, 3D human facial

mesh● Sequences: Genotypes.GWAS, CNV● Software: 3D facial image analysis● Strains: Cre recobminase drivers

● 1 Data management and coordination hub

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

What does it mean to effectively share data?

Multiple data types

– Modalities

– Organisms

– Protocols

Multiple Groups

– Some collaborative, others...

.

Diversity of data

Diversity of projects, goals, etc.

Challenges are both technical and social/organizational

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

FaceBase Hub Technical Challenges

Data DiversitySequences

microarraysmiRNAImages..

Data Volume Genotypes RNA-Seq

Anatomy Phenotype

Integration UCSC Genome Browser Between datasets

“Controlled Access” Human Data Genotypes Facial Images

Tools Searching Browsing

Metadata

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Metadata: Theory

● Well-defined metadata fields/attributes

● Controlled vocabularies provide consistent terminology for each field

● Link to appropriate resources as needed: NCBI, MGI, etc..

● Additional attributes specific to each data type

● Consistent metadata supports search, navigation...

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Metadata in practice

“What's the difference between mutation and Genotype?”

“Uhh. I'll have to get back to you on that.”

“Strain name? Is “C57B6J” the same as “C57Bl/6J?”

“We're lucky to have any information at all when it comes to these mice...”

“The official name of the strain is ____, but everybody calls it ____”

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Biomedical Ontologies might solvesome terminology and structure problems

Human Anatomy: EHDAA, FMA

Mouse Anatomy: EMAP, MA

Mouse/Human Phenotypes:

MP/HPO

Genes: GO

Data Models: MIAME, MiXX,

ISA-TAB, OBI

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Metadata Possibilties: Ontologies & Data Models

Ontologies, Data models, etc. User Practice

Grand Canyon NPS http://www.fotopedia.com/items/fickr-7553734530

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Implications

Good metadata is vital for search, retrieval, data sharing, reuse

Curation effort can compensate for some inadequacies, but

– EC

Curation Effort, Mq Metadata Quality

– EC = f( 1/M

q)

– Possibly non-linear

This doesn't scale

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Search tools

● Search + Link: current practice in bioinformatics repositories

● Text search over metadata● Linear results list● Results pages with many links

● Advantages● Easy, fast

● Disadvantages● Relationships between items in result list are not clear● No overviews to help with higher-level meta-understanding

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

The Ontology of Craniofacial Development and MalformationExtend existing ontologies to provide richer models of

craniofacial anatomy, development, and malformation

Foundational model of Anatomy (FMA), Human Phenotype Ontology (HPO), Mouse Anatomy (MA)..

Add human developmental description

Human-Mouse links

https://www.facebase.org/content/ocdm

But... better models don't guarantee usage

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

The tool gap

“Tools have advanced to the point of being able to support users fairly successfully in finding and reading off data (e.g. to classify and find multidimensional relationships of interest) but not in being able to interactively explore these complex relationships in context to infer causal explanations and build convincing biological stories amid uncertainty.”

B. Mirel Supporting cognition in systems biology analysis: findings on users' processes and design implications (2009) J. Biomedical Discovery & Collaboration

• Need: Tools that effectively leverage combination of human intellect and computing power

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Information Visualization● Interactive displays of high-dimensional data sets

● Coordinated views facilitate comparison across dimensions and

● Multi-scale

● Overview● Zoom/filter● Full details on requested items

● Rapid, incremental, reversible queries

● Avoid 0-hit or million-hit queries.

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Toward integrative tools

Human Mouse Zebrafish

Sequence

Expression

Images

Morphometry

Genotypes

How do we build connections?

How do we get users to think differently about the data?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Information Visualization● Interactive displays of high-dimensional data sets

● Coordinated views facilitate comparison across dimensions and

● Multi-scale

● Overview● Zoom/filter● Full details on requested items

● Rapid, incremental, reversible queries

● Avoid 0-hit or million-hit queries.

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Technology Probe: Connecting related data items

• Single search, multiple data types

• Links between items indicate connections

• Images or other details on demand for direct access to data

• Rapid response supports interactivity, exploration, serendipitous discovery

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Technology Probe: Gene Atlas

• Genes displayed on timeline, indicating when active

• Images display expression localization

• Large image for detailed views.

• Mouse-over coordinated links

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Vision vs. reality

● Data is still coming in

● Gene atlas requires coordinated analysis ..

● What can we do with data that is available?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Technology Probe: Gene Atlas

• Genes displayed on timeline, indicating when active

• Images display expression localization

• Large image for detailed views.

• Mouse-over coordinated links

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Current Search Interface

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Timeline viewApproximate alignment of datasets on common timeline

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

3D Facial Meshes

Two datasets

– Weinberg & Marazita: Caucasian facial norms

– Spritz: Tanzanian

Identifiable data -access only upon DAC approval

Query tool: identify subsets for further analysis.

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

UCSC Genome Browser Mirror

●Genomebrowser.facebase.org

● > 30 tracks, mouse mm9

● RNA-seq● ChIP-seq● Various anatomic

locations, time points, etc.

● Human CNV hg19

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Imaging

● microMRI,

● MicroCT

● OPT

● Interactive Browsers?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Fishface

Atlas of zebrafish craniofacial development

C. Kimmel, et al.

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Current Status

As of 4 February 2013

> 200 datasets

> 100 in queue awaiting curation or investigator approval

Continuing to refine metadata and data organization

Expecting large infux of data over the next year.

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Toward Greater Integration

Data

Currently

Data

Metadata Metadata

Links via metadata: Comparable organism, stage, Gene, etc..

Ideally

Data Data

Metadata Metadata

Links via data:Commonly expressed genes,Sequences,Pathways

“Easy” for similar dataExpression, Sequences

Harder for diverse data

Images?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

FaceBase as a model

● Diverse researchers and projects

● Focus: specific clinical domain

● Goal: collect and coordinated data – community resource

● Outstanding challenges?

● Is this even the right idea?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Build it and they (may) come?

●Speculative claim: assemble good data and it will be of use

●How to evaluate this?

●How to compare to n R01s that might otherwise have been funded? (question asked by program officer)

● ENCODE controversy

“..the lesson I learned from ENCODE is that projects like ENCODE are not a good idea.”

M. Eisen http://www.michaeleisen.org/blog/?p=1179

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Three key factors

Tools Incentives

Outreach

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Desperately needed!

Search tools – finding data

Annotation Tools – describing data

– Using ontologies

– Common data models

Tools

Minimize the effort required to provide the high-quality metadata needed for translational bioinformatics.

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Perhaps the worst bioinformatics data management tool

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Perhaps the best bioinformatics data management tool

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Pros and Cons of Spreadsheets

Cons:

– ad-hoc data models

– ad-hoc data fields

– No consistency, provenance...

Pros:

– Ubiquitous

– Simple

– Flexible

– Familiar

Analytic tools: more rigor, less generality

Alternatives:

Experiment Metadata

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Usability and OntologiesCan we at least get the terms right?

Maguire et al. 2012

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Beyond Autocomplete

● Substring autocomplete – state of art for finding terms in biomedical ontologies

● Cons: no context, room for specialization or generalization?

● Can we build a tool that will easily support both search and navigation?

Foundational Model Explorerhttp://fme.biostr.washington.edu/FME/index.html

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Why not let the curators do it?

● Expense

● Accuracy

● Speed

● Scientists know their data best

Claim – To be sustainable, data sharing must be self-curated.

But.. why should anyone bother?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Scientific incentives...

Research Results

PapersFunding

Traditionally.... Where doesData sharing fit in?

Data Sharing

?

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

How to promote sharing..

1. Better tools → reduced effort

Eannotation

< Ecollection

+ epsilon

2. Recognize effort: alternative models of academic credit

ImpactStory...

Altmetrics: Value all research products (H. Piwowar, Nature 1/9/13, doi:10.1038/493159a)

“For all new grant applications from 14 January, the US National Science Foundation (NSF) asks a principal investigator to list his or her research “products” rather than “publications” in the biographical sketch section.”

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Related Efforts

● Genomics Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) – microbiome, expression, phenotype sharing

● LAMHDI/MONARCH – User tools for ontologically-driven discovery of model genes for human diseases

● Research Social Network Collaboration Finding tool – using VIVO and related networking tools to find collaborators..

Texas A&M Health Sciences Center February 6, 2013Harry Hochheiser, harryh@pitt.edu

Acknowledgments

FaceBase: Mary Marazita, Jeff Murray, Yang Chai, Bruce Arnonow, Kristin Artinger, Teri Beaty, Jim Brinkley, David Clouthier, Michael Cunningham, Michael Dixon, Leah Rae Donahue, Scott E. Fraser, Benedikt Hallgrimsson, Junichi Iwata, Ophir Klein, Stephen Murray, Fernando Pardo-Manuel de Villena, John Postlethwait, Steven Potter, Lina Shapiro, Richard Spritz, Axel Visel, Seth Weinberg, Paul Trainor

U. Pittsburgh: Mike Becich, Becky Boes, Chuck Borromeo, Lance Kennelty, Annette Krag-Jensen, Tom Maher, Johnson Paul, Linda Schmandt, Shiyi Shen, Bill Shirey, Cristy Spino, Mike Stefanko, Justin Stickel

OCDM :James Brinkley; Jose Leonardo Mejino; Landon Detwiler; Ravensara Travillian; Melissa Clarkson; Timothy Cox; Carrie Heike; Michael Cunningham; Linda Shapiro

NIDCR: Steven Scholnick, Emily Harris, Jeannine Helm, Nadya Lumelsky, Lillian Shum

Support: NIH Grants U01 DE020057, 3U01DE020050-03S1