![Page 1: Excellence in Computational Biology and Informatics · Excellence in Computational Biology and Informatics Daniel Crichton Dan.Crichton@jpl.nasa.gov ... May 2013 at Caltech to address](https://reader034.fdocuments.in/reader034/viewer/2022052519/5f0f4c087e708231d4437624/html5/thumbnails/1.jpg)
Excellence in Computational Biology and Informatics
Daniel Crichton Dan.Crichton@jpl.nasa.gov
Principal Computer Scientist and Program Manager
Director, Center for Data Science and Technology
Principal Investigator, NCI Early Detection Research Network Informatics Center
NASA Jet Propulsion Laboratory, California Institute of Technology
EDRN Informatics
• NCI/JPL partnered since 2001 to develop a long-term distributed knowledge system for the EDRN
• Significantly leveraged the NASA model
  – Implemented Apache OODT from JPL
  – Architecture and approach
  – Open Source, Data Intensive Science approach
  – 2011 NASA Award for the accomplishment
• Supports capture and access to a diverse collection of distributed sets of information and results
  – Biomarkers
  – Biospecimens
  – Scientific Data Sets
  – Protocols
  – Etc.
Integrated knowledge environment
• Access to science data sets
• Access to specimen information
• Access to biomarker data and results
• Access to study data
http://cancer.gov/edrn (operational) http://edrn.jpl.nasa.gov (beta; emerging capabilities)
Supporting the Science Data Lifecycle
• Ingestion of data: Steps for transformation and validation including curation and peer review of the data
• Cataloging of Structured and Unstructured Data: Separation of the description (catalog) of data from the physical data storage
• Data Processing: Highly validated, scalable pipelines and jobs for remote sensing instruments; versioning of algorithms and data; this can be done by distributed teams prior to submitting to national archives
• Data Management: Construction and management of metadata catalogs and data (often distributed); capture of raw and processed data.
• Data Discovery: Discovery of data for scientific research
• Data Access: Access to the scientific data
• Data Distribution, Computation and Analysis: Support for analysis and services (e.g., subsetting) on the data; move towards automated data discovery
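The lifecycle steps above (validated ingestion, a catalog kept separate from physical storage, and discovery against the catalog) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Catalog` and `CatalogEntry` classes, the required metadata keys, the product identifiers, and the `file:///archive/...` locations are all hypothetical and do not reflect the Apache OODT API or the EDRN data model.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class CatalogEntry:
    """Describes a data product; the bytes themselves live elsewhere."""
    product_id: str
    storage_uri: str   # where the physical file is kept
    checksum: str      # SHA-256 of the content, recorded at ingestion
    metadata: dict = field(default_factory=dict)


class Catalog:
    """In-memory stand-in for a metadata catalog (hypothetical)."""

    REQUIRED_KEYS = ("protocol", "data_type")

    def __init__(self):
        self._entries = {}

    def ingest(self, product_id, content, storage_uri, metadata):
        # Validation step: reject products that lack required metadata.
        missing = [k for k in self.REQUIRED_KEYS if k not in metadata]
        if missing:
            raise ValueError(f"missing metadata keys: {missing}")
        entry = CatalogEntry(
            product_id=product_id,
            storage_uri=storage_uri,
            checksum=hashlib.sha256(content).hexdigest(),
            metadata=dict(metadata),
        )
        self._entries[product_id] = entry
        return entry

    def discover(self, **criteria):
        # Discovery queries touch only the catalog, never the stored files.
        return [
            e for e in self._entries.values()
            if all(e.metadata.get(k) == v for k, v in criteria.items())
        ]


catalog = Catalog()
catalog.ingest("p-001", b"raw bytes", "file:///archive/p-001.dat",
               {"protocol": "demo-protocol", "data_type": "proteomic"})
catalog.ingest("p-002", b"other bytes", "file:///archive/p-002.dat",
               {"protocol": "demo-protocol", "data_type": "genomic"})
hits = catalog.discover(data_type="genomic")
print([e.storage_uri for e in hits])
```

Because each entry stores only a location and a checksum, the files could be moved or distributed across sites without changing how researchers search for them.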
Capture of Public Science Data
An Integrated Repository of Public Data Sets
[Diagram labels: Instrument Operations; Science Data Processing; Data Distribution (EDRN Public Portal); EDRN Bioinformatics Tools; Instrument; eCAS - EDRN Biorepository; External Science Community; EDRN Researchers; Laboratory Biorepository; Analysis Team; Local Laboratory Science Data System]
Published Results
• Biomarkers
• Protocols
• Science Data
• Publications
Comprehensive curation tools in place
A Virtual, National Integration Biomarkers Knowledge System
Biomarker Knowledge System: An integrated semantic architecture
[Diagram labels: Biomarker Annotations; Protocols; Biomarker Data Results; Specimens; Linked through Public Portal; Access to download data]
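One way to picture how biomarkers, protocols, data, and specimens might be linked is a small set of subject-predicate-object triples. The sketch below is an assumption made for illustration: the identifiers (`biomarker:BM-1`, `dataset:DS-1`, and so on) and the predicate names are hypothetical, and the slide does not say how the actual knowledge system represents these links.

```python
# Hypothetical identifiers; each triple is (subject, predicate, object).
triples = {
    ("biomarker:BM-1", "measured_under", "protocol:PR-1"),
    ("biomarker:BM-1", "has_dataset", "dataset:DS-1"),
    ("dataset:DS-1", "derived_from", "specimen:SP-1"),
    ("dataset:DS-1", "derived_from", "specimen:SP-2"),
}


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def specimens_for(biomarker):
    """Follow biomarker -> dataset -> specimen links."""
    found = set()
    for dataset in objects(biomarker, "has_dataset"):
        found.update(objects(dataset, "derived_from"))
    return sorted(found)


print(specimens_for("biomarker:BM-1"))
```

Starting from a biomarker, a portal page could follow such links to list the protocols, data sets, and specimens behind it.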
Cancer Biomarker Bioinformatics Workshop
• The EDRN and NASA Jet Propulsion Laboratory held a workshop in May 2013 at Caltech to address informatics and data-driven research in cancer biomarkers
  – http://edrn.nci.nih.gov/cancer-bioinformatics-workshop/cancer-biomarker-bioinformatics-workshop-report-may-2013
  – A major outcome focused on data usability, reproducibility of results, methods and algorithms to systematize data analysis, and scalable computing infrastructures
• Key Recommendations
  – Systematic approaches to the generation, capture, and management of data to enable reproducibility
  – Increased emphasis on data curation to promote data reuse
  – Automation of data processing/analytics software pipelines
  – Data integration and fusion of data from multiple platforms and studies
  – Scalable data infrastructures and repositories
  – Use of big data tools and bioinformatics techniques to scale data analysis
  – Increased training of scientists in the use of computational tools/methods
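The recommendation on systematic capture for reproducibility can be illustrated by recording, for every pipeline step, the step version, its parameters, and hashes of its input and output. This is a minimal sketch rather than anything prescribed by the workshop report: the `run_step` and `fingerprint` helpers, the record fields, and the toy `normalize` step are all hypothetical.

```python
import hashlib
import json


def fingerprint(obj):
    """Stable SHA-256 of a JSON-serializable object."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def run_step(name, version, func, data, params):
    """Run one pipeline step and return (result, provenance record)."""
    result = func(data, **params)
    record = {
        "step": name,
        "version": version,
        "params": params,
        "input_hash": fingerprint(data),
        "output_hash": fingerprint(result),
    }
    return result, record


def normalize(values, scale):
    # Toy analysis step: divide every value by a fixed scale.
    return [v / scale for v in values]


raw = [2.0, 4.0, 6.0]
out1, prov1 = run_step("normalize", "1.0", normalize, raw, {"scale": 2.0})
out2, prov2 = run_step("normalize", "1.0", normalize, raw, {"scale": 2.0})
print(prov1 == prov2)  # identical inputs, version, and parameters give identical records
```

Comparing stored records from two runs shows whether a result was reproduced exactly or, if not, whether the input, the parameters, or the algorithm version changed.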
Moving towards data-driven science for cancer biomarkers
Cross-cutting Data/Information Architectures
[Diagram labels: Instrument Operations; Science Data Processing; Data Distribution; Bioinformatics Tools; Instrument; Public Biorepository; External Science Community; Bioinformatics Community; Laboratory Biorepository; Analysis Team; Publish Data Sets; Results; "LabCAS"; "eCAS"]
• Automated pipelines
• Complex workflows
• Scalable computational algorithms (genomic, proteomic)
• Automated feature detection
• Automated curation
• Scalable Computational Biology Infrastructures (cloud, HPC, etc.)
• Local algorithms processing
• On-demand algorithms
• Algorithms
• Data fusion methods
• Machine learning techniques
Application of Machine Learning Techniques
Automated Classification
• TMA Estimator: Estimate the staining on a whole spot
• TMA Annotator: Detect nuclei on a whole spot
• TMA Classifier: Classify single nuclei into tumor, non-tumor and stained, not-stained
[Figure panels: original (150 000 pixel); classification]
Feature/Object Detection
[Figure panels: Original Image; Discriminative Object Detection; Generative P. Process Fitting; Volcanoes on Venus]
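To make the idea of classifying single nuclei concrete, here is a minimal nearest-centroid classifier in plain Python. It is a stand-in chosen for brevity and is not the method used by the TMA tools named on the slide; the two features (area and stain intensity), the training values, and the function names are all made up for illustration.

```python
import math


def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def train(labelled):
    """Map each label to the centroid of its training vectors."""
    groups = {}
    for features, label in labelled:
        groups.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in groups.items()}


def classify(model, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical per-nucleus features: (area, stain intensity), made-up values.
training = [
    ((30.0, 0.9), "tumor"),
    ((28.0, 0.8), "tumor"),
    ((12.0, 0.2), "non-tumor"),
    ((10.0, 0.3), "non-tumor"),
]
model = train(training)
print(classify(model, (27.0, 0.7)))
print(classify(model, (11.0, 0.1)))
```

In practice the features would come from a nucleus detection step and would need scaling so that one feature does not dominate the distance; the sketch skips both.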
Today
• Good opportunities to look at collaborations around data-driven computational science approaches
  – Excellent speakers
• Those who are interested are encouraged to check out the Caltech/JPL Virtual Summer School on Big Data Analytics, available through Coursera or on the Caltech website
  – Started Sep 2, 2014
  – 1500 people signed up to watch
• I hope you enjoy the session!
Backup
11/13/14
National Research Council: Frontiers in Massive Data Analysis
• Chartered in 2010 by the National Research Council
• Chaired by Michael Jordan, Berkeley, AMP Lab (Algorithms, Machines, People)
• Importance of systematizing the analysis of data
• Need for end-to-end approaches to data analysis
• Integration of multiple disciplines
• Application of novel statistical and machine learning approaches for data discovery
• The movement from computation-intensive to data-intensive
Published Sept. 2013