![Page 1: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/1.jpg)
Center for Causal Discovery (CCD) of Biomedical Knowledge from Big Data
University of Pittsburgh
Carnegie Mellon University
Pittsburgh Supercomputing Center
Yale University
PIs: Ivet Bahar, Jeremy Berg, Greg Cooper
![Page 2: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/2.jpg)
Outline
• The U.S. NIH Big Data to Knowledge (BD2K) initiative
• Why focus on the discovery of causal knowledge from big biomedical data?
• Why establish a Center for Causal Discovery (CCD)?
• What are some basic methods being used by CCD?
• What are the goals of the CCD?
![Page 3: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/3.jpg)
NIH Big Data to Knowledge (BD2K) Initiative
For more information, see: https://datascience.nih.gov/bd2k/
The ability to harvest the wealth of information contained in biomedical Big Data will advance our understanding of human health and disease; however, lack of appropriate tools, poor data accessibility, and insufficient training are major impediments to rapid translational impact.
To meet this challenge, the U.S. National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative in 2012.
BD2K is a trans-NIH initiative with the following major aims:
• Facilitate broad use of biomedical digital assets
• Conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data
• Enhance training in the development and use of methods and tools necessary for biomedical Big Data science
• Support a data ecosystem that accelerates biomedical knowledge discovery
![Page 4: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/4.jpg)
NIH BD2K Centers of Excellence
• The Centers of Excellence are part of the overall NIH BD2K initiative.
• The goal is to develop and disseminate computational methods to assist biomedical researchers in using big data to significantly advance biomedical science.
• Project components include research, software development and dissemination, training, and joint Center activities.
• In September 2014, NIH began funding 11 BD2K Centers of Excellence.
• Funding is for 4 years.
• More information is available at: https://datascience.nih.gov/bd2k/funded-programs/centers
![Page 5: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/5.jpg)
Causal Discovery in Biomedicine
Science is centrally concerned with the discovery of causal relationships in nature.
• Understanding
• Prediction
• Control
Examples:
• Determine the genes and cell signaling pathways that cause breast cancer
• Discover the clinical effects of a new drug
• Uncover the mechanisms of pathogenicity of a recently mutated virus that is spreading rapidly in the population
![Page 6: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/6.jpg)
Why Establish a Center for Causal Discovery Now?
• Algorithmic Advances
+
• Availability of Big Biomedical Data
![Page 7: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/7.jpg)
Algorithmic Advances
• In the past 25 years, there has been tremendous progress in the development of computational methods for representing and discovering causal networks from a combination of data and knowledge.
• These methods are often applicable to biomedical data.
![Page 8: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/8.jpg)
Availability of Big Biomedical Data
• The variety, richness, and quantity of biomedical data have been increasing very rapidly:
– High-throughput molecular data (e.g., whole-genome sequencing)
– Clinical data from electronic medical records (EMRs)
– Population health data from social media and mobile sensors
• The appropriate analysis of these data has great potential to advance biomedical science.
http://aldousvoice.files.wordpress.com/2014/06/database.jpg
![Page 9: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/9.jpg)
The Time Seems Right to Disseminate These Algorithms to Scientists to Use in Analyzing Biomedical Data for Causal Relationships
[Figure: Big Biomedical Data → Causal Discovery Algorithms → Causal Networks]
![Page 10: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/10.jpg)
Basic Causal Discovery Workflow
[Figure: Data and Prior Knowledge → Causal Analysis → Causal Networks]
![Page 11: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/11.jpg)
Basic Causal Discovery Workflow
[Figure: Data and Prior Knowledge → Causal Analysis → Causal Networks; the data can be both observational and experimental]
![Page 13: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/13.jpg)
Basic Causal Discovery Workflow
[Figure: Data and Prior Knowledge → Causal Analysis → Causal Networks → Causal Hypotheses]
![Page 14: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/14.jpg)
Basic Causal Discovery Workflow
[Figure: Data and Prior Knowledge → Causal Analysis → Causal Networks → Causal Hypotheses → Experiments → Data]
![Page 17: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/17.jpg)
An Example of Causal Network Discovery from Biomedical Data
Sachs K, et al. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308 (2005) 523-529.
![Page 18: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/18.jpg)
A Portion of a Cell Signaling Network (and Points of Experimental Intervention)
Sachs K, et al. Science 308 (2005) 523-529. (The figure above appears in this paper.)
![Page 19: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/19.jpg)
Overview of Experimental Design and Data Analysis
Sachs K, et al. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308 (2005) 523-529. (The figure above appears in this paper.)
![Page 20: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/20.jpg)
Results of Causal Network Analysis for the Example
Sachs K, et al. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308 (2005) 523-529. (The figure above appears in this paper.)
![Page 21: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/21.jpg)
Basic Components Needed to Learn Causal Networks from Data
• Model representation
• Model search
• Model evaluation
![Page 22: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/22.jpg)
Model Representation: Causal Bayesian Networks (CBNs)
• Nodes represent variables
• Arcs represent direct causation
• A directed acyclic graph
• A variable is modeled as independent of its non-effects, given its causal parents
Example: A → B → C
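Because a CBN structure must be a directed acyclic graph, any structure search needs a way to reject candidate graphs that contain a directed cycle. Below is a minimal Python sketch of one standard check, Kahn's topological-sort algorithm. This is our choice for illustration; the slides do not say how acyclicity is enforced, and the function name `is_dag` and the edge-list representation are hypothetical.

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph (nodes, edges) has no directed cycle.

    Kahn's algorithm: repeatedly remove a node with no remaining incoming
    arcs. If some nodes can never be removed, they lie on a directed cycle.
    """
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return removed == len(nodes)


nodes = ["A", "B", "C"]
print(is_dag(nodes, [("A", "B"), ("B", "C")]))              # True: A -> B -> C
print(is_dag(nodes, [("A", "B"), ("B", "C"), ("C", "A")]))  # False: A -> B -> C -> A
```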
![Page 24: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/24.jpg)
Model Representation: Causal Bayesian Networks (CBNs)
• Nodes represent variables
• Arcs represent direct causation
• A directed acyclic graph
• A variable is modeled as independent of its non-effects, given its causal parents
• There is a factorization of the joint probability distribution
Example:
• CBN structure: A → B → C
• CBN parameters: P(A), P(B | A), and P(C | B), which give the factorization P(A, B, C) = P(A) P(B | A) P(C | B)
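To make the factorization concrete, here is a small Python sketch for binary A, B, and C in the structure A → B → C. The probability tables are made-up numbers, not values from the talk. The code builds the joint distribution from P(A), P(B | A), and P(C | B), checks that it sums to 1, and confirms that C is independent of its non-effect A given its parent B.

```python
from itertools import product

# Hypothetical parameters for binary variables (values 0/1) in A -> B -> C.
p_a = {1: 0.3, 0: 0.7}                      # P(A)
p_b_given_a = {1: {1: 0.9, 0: 0.1},         # P(B | A=1)
               0: {1: 0.2, 0: 0.8}}         # P(B | A=0)
p_c_given_b = {1: {1: 0.6, 0: 0.4},         # P(C | B=1)
               0: {1: 0.05, 0: 0.95}}       # P(C | B=0)


def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(a) P(b | a) P(c | b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def conditional_c(c, a, b):
    """P(C=c | A=a, B=b), computed from the joint distribution."""
    return joint(a, b, c) / sum(joint(a, b, cc) for cc in (0, 1))


total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
print(round(total, 10))  # 1.0: the factorization defines a proper distribution

# Given its parent B, C no longer depends on A.
for b in (0, 1):
    print(f"B={b}: P(C=1 | A=0, B={b}) = {conditional_c(1, 0, b):.3f}, "
          f"P(C=1 | A=1, B={b}) = {conditional_c(1, 1, b):.3f}")
```

Only 1 + 2 + 2 = 5 free numbers are needed here, versus 7 for an unrestricted joint distribution over three binary variables; that saving is what the factorization buys.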
![Page 25: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/25.jpg)
Model Search
• The space of CBNs is very large
• Heuristic search is generally used to find highly likely CBNs
• We search for the most likely CBN structures
• Once a highly likely CBN structure is found, we can parameterize it using the data
• We can also average over highly probable models to estimate the probability of a substructure (e.g., a causal arc from X to Y)
![Page 26: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/26.jpg)
Model Search
[Figure: several candidate CBN structures over the variables A, B, and C]
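With only three variables, the search space is small enough to score exhaustively, which makes the ideas of model search and model averaging easy to see. The Python sketch below (standard library only) enumerates every DAG over A, B, and C, scores each with the BIC for linear Gaussian data, picks the highest-scoring structure, and approximates model averaging by weighting each DAG by exp(BIC). The BIC score, the uniform structure prior, and the simulated chain A → B → C with coefficient 0.8 are all assumptions for illustration; the slides do not specify them. Because Markov-equivalent structures such as A → B → C, A ← B ← C, and A ← B → C receive identical BIC scores, only the adjacencies of the best structure, not its arc directions, are identifiable from these observational data.

```python
import math
import random
from itertools import product

NODES = ("A", "B", "C")
PAIRS = (("A", "B"), ("A", "C"), ("B", "C"))


def simulate_chain(n, seed=0):
    """Hypothetical linear Gaussian data generated from A -> B -> C."""
    rng = random.Random(seed)
    data = {v: [] for v in NODES}
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = 0.8 * a + rng.gauss(0, 1)
        c = 0.8 * b + rng.gauss(0, 1)
        for name, value in zip(NODES, (a, b, c)):
            data[name].append(value)
    return data


def correlation(x, y):
    """Sample Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def is_acyclic(edges):
    """Peel off nodes with no parent among the remaining nodes; stuck means a cycle."""
    remaining = set(NODES)
    while remaining:
        sources = {v for v in remaining
                   if not any(c == v and p in remaining for p, c in edges)}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags():
    """Each pair is absent, X -> Y, or Y -> X (27 graphs); keep the acyclic ones."""
    dags = []
    for choice in product((None, "fwd", "back"), repeat=len(PAIRS)):
        edges = frozenset((x, y) if c == "fwd" else (y, x)
                          for (x, y), c in zip(PAIRS, choice) if c is not None)
        if is_acyclic(edges):
            dags.append(edges)
    return dags


def bic(edges, r, n):
    """Linear Gaussian BIC, up to terms that are identical for every graph.

    Node Y with parents Pa contributes -n/2 * log(1 - R^2) - |Pa|/2 * log(n),
    where R^2 comes from regressing Y on Pa (computed here from correlations).
    """
    score = 0.0
    for y in NODES:
        parents = [p for p, c in edges if c == y]
        if len(parents) == 1:
            r2 = r[frozenset((y, parents[0]))] ** 2
        elif len(parents) == 2:
            x, z = parents
            ryx, ryz = r[frozenset((y, x))], r[frozenset((y, z))]
            rxz = r[frozenset((x, z))]
            r2 = (ryx ** 2 + ryz ** 2 - 2 * ryx * ryz * rxz) / (1 - rxz ** 2)
        else:
            r2 = 0.0
        score += -0.5 * n * math.log(1 - r2) - 0.5 * len(parents) * math.log(n)
    return score


n = 5000
data = simulate_chain(n)
r = {frozenset(p): correlation(data[p[0]], data[p[1]]) for p in PAIRS}
dags = all_dags()
scores = [bic(g, r, n) for g in dags]
best = dags[scores.index(max(scores))]
print(len(dags), "DAGs; best-scoring structure:", sorted(best))

# Model averaging: weight each DAG by exp(BIC), a rough stand-in for its
# posterior probability under a uniform prior over structures.
top = max(scores)
weights = [math.exp(s - top) for s in scores]


def adjacency_probability(x, y):
    """Approximate posterior probability that x and y are adjacent."""
    hits = sum(w for g, w in zip(dags, weights) if (x, y) in g or (y, x) in g)
    return hits / sum(weights)


for x, y in PAIRS:
    print(f"P({x} and {y} adjacent | data) ~ {adjacency_probability(x, y):.3f}")
```

Exhaustive enumeration stops being feasible after a handful of variables, which is why practical searches move through the space heuristically.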
![Page 27: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/27.jpg)
Model Evaluation: Two Primary Approaches
1. Constraint based
2. Bayesian
![Page 29: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/29.jpg)
Model Evaluation: The Constraint-Based Approach
1. Determine constraints that hold among the nodes (e.g., independence conditions based on statistical tests)
2. Use the patterns of constraints to narrow the causal possibilities
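Step 1 typically relies on conditional-independence tests. For continuous, approximately Gaussian data, one common choice is Fisher's z test on the (partial) correlation; we assume it here, since the slides do not name a test. The Python sketch below uses hypothetical correlations implied by a standardized chain A → B → C, for which r_AC = r_AB · r_BC. The function names and the significance level `alpha` are illustrative.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between X and Y after conditioning on one variable Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, k=0):
    """Two-sided p-value for H0: (partial) correlation = 0.

    r is the sample (partial) correlation, n the sample size, and k the number
    of conditioning variables. Under H0, sqrt(n - k - 3) * atanh(r) is
    approximately standard normal.
    """
    z = math.sqrt(n - k - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))


def independent(r, n, k=0, alpha=0.05):
    """Declare independence when the test fails to reject H0 at level alpha."""
    return fisher_z_pvalue(r, n, k) > alpha


# Hypothetical correlations for a standardized chain A -> B -> C.
r_ab, r_bc, r_ac, n = 0.5, 0.5, 0.25, 500
print("A indep C?", independent(r_ac, n))                                    # False
print("A indep C | B?", independent(partial_corr(r_ac, r_ab, r_bc), n, k=1))  # True
```

The pattern "A dep C, but A indep C given B" is exactly the kind of constraint that step 2 then uses to narrow the causal possibilities.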
![Page 30: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/30.jpg)
Constraint-Based Evaluation: An Example
Suppose in searching over CBNs we apply statistical tests to the observational data* on A, B, and C and obtain the following constraints:
• A dep B
• B dep C
• A dep C
Which of the following models is consistent with the above constraints?
[Figure: two candidate CBN structures over A, B, and C]
* More generally, a combination of observational data, experimental data, and background knowledge can be provided as input.
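Why can such constraints discriminate between structures? A chain A → B → C makes A and C marginally dependent, whereas a collider A → B ← C implies that A and C are marginally independent, so the constraint "A dep C" rules out the collider. We do not know which two candidate models the slide's figure shows; the Python simulation below, with hypothetical linear Gaussian models and coefficients, only illustrates the principle by comparing sample correlations.

```python
import math
import random


def correlation(x, y):
    """Sample Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(structure, n=5000, seed=1):
    """Hypothetical linear Gaussian data; returns samples of A, B, and C."""
    rng = random.Random(seed)
    a_s, b_s, c_s = [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        if structure == "chain":        # A -> B -> C
            b = 0.8 * a + rng.gauss(0, 1)
            c = 0.8 * b + rng.gauss(0, 1)
        elif structure == "collider":   # A -> B <- C
            c = rng.gauss(0, 1)
            b = 0.8 * a + 0.8 * c + rng.gauss(0, 1)
        else:
            raise ValueError(f"unknown structure: {structure}")
        a_s.append(a)
        b_s.append(b)
        c_s.append(c)
    return a_s, b_s, c_s


for structure in ("chain", "collider"):
    a, b, c = simulate(structure)
    print(structure,
          f"corr(A,B)={correlation(a, b):.2f}",
          f"corr(B,C)={correlation(b, c):.2f}",
          f"corr(A,C)={correlation(a, c):.2f}")
```

Both models produce "A dep B" and "B dep C", but only the chain also produces "A dep C".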
![Page 31: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/31.jpg)
Constraint-Based Evaluation: An Example
Suppose in searching over CBNs we apply statistical tests to the observational data on A, B, and C and obtain the following constraints:
• A dep B
• B dep C
• A dep C
Which of the following models is consistent with those constraints?
[Figure: a CBN structure over A, B, and C]
![Page 32: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/32.jpg)
Several Key Causal Relationships
![Page 33: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/33.jpg)
Some Key Characteristics of Causal Discovery Problems
![Page 34: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/34.jpg)
Types of Big Data Problems Include …
• Volume of data
– Number of samples
– Number of variables per sample
• Variety of data – the different types of data
• Velocity of data – how fast the data are being generated
• Veracity of data – the uncertainty in the data (e.g., noise, biases)
![Page 35: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/35.jpg)
What is the Big Data Problem on which the CCD is Primarily Focused?
![Page 36: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/36.jpg)
Causal Network Discovery Methods Have Been Applied Successfully to Small Biomedical Datasets
Sachs K, et al. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308 (2005) 523-529. (The figure above appears in this paper.)
![Page 37: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/37.jpg)
The Methods Have Also Been Successfully Applied to Medium-Sized Biomedical Datasets
Carro MS, et al. The transcriptional network for mesenchymal transformation of brain tumours. Nature 463 (2010) 318-325. (The figure above appears in this paper.)
![Page 38: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/38.jpg)
Most Algorithms Are Not Able to Handle Big Data Containing Many Thousands of Variables
Yang X, et al. Validation of candidate causal genes for obesity that affect shared metabolic pathways and networks. Nature Genetics 41 (2009) 415-423.
![Page 41: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/41.jpg)
The Number of Causal Models as a Function of the Number of Measured Variables*
Number of nodes Number of Causal Models
1 1
2 3
3 25
4 543
5 29,281
6 3,781,503
7 1.1 × 10⁹
8 7.8 × 10¹¹
9 1.2 × 10¹⁵
10 4.2 × 10¹⁸
* Assumes there are no latent variables and no directed cycles.
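The counts in this table can be reproduced with Robinson's recurrence for the number of DAGs on n labeled nodes: a(n) = Σ_{k=1..n} (−1)^(k+1) · C(n, k) · 2^(k(n−k)) · a(n−k), with a(0) = 1. A short Python sketch (exact integer arithmetic; the function name `count_dags` is ours):

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of directed acyclic graphs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


for n in range(1, 11):
    print(n, f"{count_dags(n):,}")
```

The super-exponential growth is why exhaustive scoring is hopeless beyond a handful of variables and heuristic search is needed.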
![Page 42: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/42.jpg)
Our Big Data Problem:
Analyze biomedical datasets containing a large number of variables in order to generate plausible hypotheses of the causal relationships that hold among those variables
![Page 43: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/43.jpg)
Big Data Problems Being Pursued in CCD
• Volume of data – the scale of the data
– Number of samples
– Number of variables per sample
• Variety of data – the different types of data
• Velocity of data – how fast the data are being generated
• Veracity of data – the uncertainty in the data (e.g., noise, biases)
![Page 45: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/45.jpg)
Some Recent Progress in Developing Highly Efficient Causal Discovery Algorithms
We recently optimized a popular causal discovery algorithm, Greedy Equivalence Search (GES), to be much more efficient than before; we call the result FastGES.
• Approach:
– Optimize the single-processor version
– Parallelize the algorithm
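The hypothetical Python sketch below illustrates only the general idea of parallelizing an embarrassingly parallel step of a greedy score-based search: scoring every candidate single-edge addition to an empty graph, where each candidate can be scored independently. It is not the FastGES code. It assumes a linear Gaussian BIC score with a made-up penalty multiplier `penalty`, takes a precomputed correlation matrix as input, and uses `concurrent.futures.ThreadPoolExecutor`. In CPython, threads show the structure of the computation but give little speed-up for pure-Python arithmetic because of the global interpreter lock; a real implementation would use processes or a compiled language.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def edge_gain(pair, corr, n, penalty=1.0):
    """BIC gain from adding the single edge X - Y to an empty graph.

    For a linear Gaussian model the gain is the same in either direction:
    -n/2 * log(1 - r^2) - penalty * log(n) / 2.
    """
    x, y = pair
    r = corr[x][y]
    return -0.5 * n * math.log(1 - r * r) - 0.5 * penalty * math.log(n)


def rank_single_edge_additions(corr, n, workers=4):
    """Score all candidate edges concurrently; keep those that improve the score."""
    pairs = list(combinations(list(corr), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        gains = list(pool.map(lambda p: edge_gain(p, corr, n), pairs))
    return sorted(((g, p) for g, p in zip(gains, pairs) if g > 0), reverse=True)


# Hypothetical correlation matrix (dict of dicts) for four variables.
corr = {
    "A": {"B": 0.6, "C": 0.3, "D": 0.01},
    "B": {"A": 0.6, "C": 0.7, "D": 0.02},
    "C": {"A": 0.3, "B": 0.7, "D": 0.0},
    "D": {"A": 0.01, "B": 0.02, "C": 0.0},
}
for gain, pair in rank_single_edge_additions(corr, n=1000):
    print(pair, round(gain, 1))
```

The pairs involving D are dropped because their tiny correlations do not overcome the BIC penalty.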
![Page 46: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/46.jpg)
Preliminary Evaluation of FastGES
• Evaluation method:
– Simulated data from a linear Gaussian model
– Number of nodes (N) = number of edges
– Number of samples = 1,000
• Results:
– N = 50,000: 13 minutes on a laptop with 8 cores and 16 GB RAM
– N = 1,000,000: 18 hours with 40 cores and 384 GB RAM
• For more information: http://arxiv.org/ftp/arxiv/papers/1507/1507.07749.pdf

| N | Adjacency TPR | Adjacency TNR | Orientation TPR | Orientation TNR |
|---|---|---|---|---|
| 50,000 | 99.3% | 97.5% | 98.2% | 96.1% |
| 1,000,000 | 99.9% | 93.5% | 99.9% | 90.4% |
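Rates like those above compare an estimated graph with the known simulated graph. The Python sketch below computes adjacency rates using the textbook definitions (TPR = fraction of true adjacencies recovered; TNR = fraction of true non-adjacencies correctly left out), plus adjacency precision for comparison. We assume these definitions; the slide does not say exactly how its rates were computed, and the graphs in the example are hypothetical.

```python
from itertools import combinations


def adjacency_rates(nodes, true_edges, est_edges):
    """Adjacency TPR, TNR, and precision, ignoring arc directions."""
    true_adj = {frozenset(e) for e in true_edges}
    est_adj = {frozenset(e) for e in est_edges}
    all_pairs = {frozenset(p) for p in combinations(nodes, 2)}
    true_non = all_pairs - true_adj
    tpr = len(true_adj & est_adj) / len(true_adj)        # true adjacencies recovered
    tnr = len(true_non - est_adj) / len(true_non)        # non-adjacencies left out
    precision = len(true_adj & est_adj) / len(est_adj)   # estimated adjacencies that are real
    return tpr, tnr, precision


# Hypothetical true and estimated graphs over four variables.
nodes = ["A", "B", "C", "D"]
true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
est_edges = [("B", "A"), ("B", "C"), ("C", "D"), ("A", "D"), ("B", "D")]
print(adjacency_rates(nodes, true_edges, est_edges))
```

For sparse graphs such as the simulated ones (N edges among N(N − 1)/2 possible pairs), a TNR computed over all non-adjacent pairs is almost automatically close to 100%, so the exact definitions behind the reported rates are best checked in the linked paper. Orientation rates are computed analogously, over arc directions rather than adjacencies.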
![Page 47: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/47.jpg)
Primary Goals of the CCD
• Goal 1. Develop and implement state-of-the-art methods for causal modeling and discovery (CMD) of knowledge from biomedical big data
– Make the best existing CMD methods available
– Develop new CMD methods
![Page 48: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/48.jpg)
Primary Goals of the CCD
• Goal 2. Investigate three biomedical projects
– Evaluate the usefulness of CMD methods on these problems
– Drive further development of the CMD methods
![Page 49: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/49.jpg)
Primary Goals of the CCD
• Goal 3. Disseminate CMD methods and knowledge widely to biomedical researchers and data scientists
– Software
• Algorithms
– Implement a suite of causal discovery algorithms and make them available as application programming interfaces (APIs)
– Open source and free
• Desktop application: Develop an easy-to-use causal modeling and discovery (CMD) system with a desktop interface, which is open source and free
– Training
– Collaborative activities with other BD2K Centers
![Page 50: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/50.jpg)
Driving Biomedical Projects (DBPs)
• Discovery of cell signaling networks in cancer
• Discovery of the mechanisms of disease onset and progression in chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis
• Discovery of the functional (causal) connectivity of regions of the human brain from fMRI data
![Page 51: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/51.jpg)
Cancer DBP: Goal 1
• Develop methods to identify driver (disease-causing) somatic genomic alterations (SGAs) of tumors
• Big Data: The Cancer Genome Atlas (TCGA)
![Page 52: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/52.jpg)
Available Cancer Types # Cases with Data
Acute Myeloid Leukemia [LAML] 200
Adrenocortical carcinoma [ACC] 80
Bladder Urothelial Carcinoma [BLCA] 412
Brain Lower Grade Glioma [LGG] 516
Breast invasive carcinoma [BRCA] 1098
![Page 56: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/56.jpg)
• Methods: Search for somatic alterations (A) that the data support as causing changes in the cellular behavior of tumors (G)
![Page 58: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/58.jpg)
• General findings:
– Found many known drivers of cancer
– Also found some mutations not known to be drivers of cancer, which we plan to test experimentally
![Page 59: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/59.jpg)
Population-Wide Modeling Approach
[Figure: training set → apply a population-wide learning method → population-wide model → inference for the patient case → prediction]
![Page 60: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/60.jpg)
Personalized Modeling Approach
[Figure: training set → apply a personalized learning method → personalized model → inference for the patient case → prediction]
![Page 61: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/61.jpg)
Summary
• The NIH BD2K initiative is focused on developing ways to enhance the translation of increasing amounts of digital data into biomedical knowledge.
• Causal relationships are a central type of biomedical knowledge.
• The Center for Causal Discovery (CCD) is focused on developing and making readily available algorithms and systems for generating plausible causal hypotheses from big biomedical data.
• The CCD is exploring three example biomedical problems and is investigating methods for personalized causal modeling of health and disease.
![Page 62: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/62.jpg)
Additional Information
Spirtes P, Glymour C, Scheines R, Tillman R. Automated search for causal relations: Theory and practice. In Heuristics, Probability, and Causality: A Tribute to Judea Pearl, edited by Rina Dechter, Hector Geffner, and Joseph Halpern (College Publications, 2010, Chapter 28, pages 467-506). http://repository.cmu.edu/cgi/viewcontent.cgi?article=1423&context=philosophy
Kalisch M, Bühlmann P. Causal structure learning and inference: A selective review. Quality Technology and Quantitative Management 11 (2014) 3-21. http://web.it.nctu.edu.tw/~qtqm/qtqmpapers/2014V11N1/2014V11N1_F1.pdf
Cooper GF, Bahar I, Becich MJ, Benos PV, Berg J, Espino JU, Jacobson RC, Kienholz M, Lee AV, Lu X, Scheines R, Center for Causal Discovery team. The Center for Causal Discovery of biomedical knowledge from Big Data. Journal of the American Medical Informatics Association 2015. PMID: 26138794
![Page 63: Center for Causal Discovery (CCD) of Biomedical Knowledge ... · Availability of Big Biomedical Data •The variety, richness, and quantity of biomedical data have been increasing](https://reader034.fdocuments.in/reader034/viewer/2022050214/5f606c7f42ce67774c130be9/html5/thumbnails/63.jpg)
Acknowledgements
• Thanks to the 40+ members of the Center for Causal Discovery for their contributions to the Center activities that are described here.
• The Center for Causal Discovery is supported by grant U54HG008540 awarded by the National Human Genome Research Institute through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov). The content of this presentation is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.