Compositional Mining of Biological Data (people.cs.vt.edu/naren/slides/cdm-talk.pdf)
Compositional Mining of Biological Data
Naren Ramakrishnan and T.M. Murali
Department of Computer Science
Virginia Tech, VA 24061
Motivation
● Increasing categories of functional screens
– Microarrays
– Deletion mutants
– RNAi
Motivation
● Increasing forms of interaction data
– PPI, ChIP-on-chip, genetic, metabolic, ...
Motivation
● Increasing portfolios of pathways
“Chaining” Inferences
● Module Networks
– Regulators “X” regulate genes “Y” under conditions “Z”
(Segal et al., Nature Genetics, 2003)
“Chaining” Inferences
● Connectivity Map
– Perturbagens “X” mimic/suppress disease “Y” through action of genes “Z”
(Lamb et al., Science, 2006)
Are we there yet?
● Different scientists, different perspectives
– Multitude of approaches to data reduction
● What is needed
– SQL : Database querying :: ??? : Database mining
Compositional Data Mining
● A way to compose simpler algorithms ...
– Redescription mining
– Biclustering
● ... to support complex analytical functions
● Not a data mining program
– But a data mining program generator!
Two simple primitives
● Redescription mining
– Mines within a “domain”
● Biclustering
– Mines across two domains
What are redescriptions?
A shift of vocabulary, or a different way of communicating a given piece of information.
Redescriptions: Toy Example
Redescription Mining
● Given
– a collection of objects (countries, genes)
– a collection of descriptors
● Find
– subsets that can be defined in at least two ways
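The problem statement above can be sketched in a few lines, assuming each descriptor is simply a named set of objects. The helper name `find_exact_redescriptions` and the descriptor and gene labels are hypothetical, and the search is restricted to single descriptors and pairwise intersections; an actual redescription miner explores a much larger space of set expressions.

```python
from itertools import combinations

def find_exact_redescriptions(descriptors):
    """Group set expressions by the object set they define.

    Only single descriptors and pairwise intersections are tried
    (an illustrative restriction). Returns the object sets that can
    be defined in at least two ways.
    """
    by_extent = {}
    for name, objs in descriptors.items():
        by_extent.setdefault(frozenset(objs), []).append(name)
    for (a, sa), (b, sb) in combinations(descriptors.items(), 2):
        common = frozenset(sa) & frozenset(sb)
        if common:
            by_extent.setdefault(common, []).append(f"{a} AND {b}")
    return {ext: exprs for ext, exprs in by_extent.items() if len(exprs) >= 2}

# Hypothetical descriptors over hypothetical gene identifiers.
toy = {
    "D1": {"g1", "g2", "g3"},
    "D2": {"g2", "g3", "g4"},
    "D3": {"g2", "g3"},
    "D4": {"g5"},
}
redescriptions = find_exact_redescriptions(toy)
```

Here the set {g2, g3} is described both by `D3` and by `D1 AND D2`, which is the kind of equivalence a redescription expresses.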
An example redescription
(Countries with land area > 3,000,000 square miles) − (Tourist destinations in the Americas)
⇔
(Permanent members of the UN Security Council) AND (Countries with a history of communism)
More on redescriptions
● Can restrict expressions
– To be of a certain syntactic form
● Can allow approximate redescriptions
– Jaccard's coefficient = |X ∩ Y| / |X ∪ Y|
● Can require statistical significance
– According to set overlap distributions
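As a rough illustration of the last two points, the sketch below computes Jaccard's coefficient and a tail probability for the overlap of two sets. The choice of a hypergeometric null model (Y drawn at random from the universe) is an assumption made here for illustration; the slides only say "set overlap distributions". The function names are hypothetical.

```python
from math import comb

def jaccard(x, y):
    """Jaccard's coefficient |X ∩ Y| / |X ∪ Y| (0.0 for two empty sets)."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def overlap_p_value(n_universe, size_x, size_y, overlap):
    """P(|X ∩ Y| >= overlap) if Y were a random size_y subset of the universe.

    Assumed null model: hypergeometric distribution.
    """
    total = comb(n_universe, size_y)
    upper = min(size_x, size_y)
    tail = sum(
        comb(size_x, k) * comb(n_universe - size_x, size_y - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

An approximate redescription would then be accepted when `jaccard` exceeds a chosen threshold and `overlap_p_value` falls below a chosen significance level; both cut-offs are left to the analyst.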
Applications in Bioinformatics
● (Gene) descriptors galore!
– Genes localized in the mitochondrion
– Genes up-expressed ≥ 2-fold in heat stress
– Genes encoding proteins in the immunoglobulin complex
– Genes involved in glucose biosynthesis
– Genes handpicked by Prof. Genie
– Genes clustered by your favorite algorithm
Redescriptions: Application to Environment Stress in Yeast
● Descriptors over approx. 300 yeast ORFs
A redescription
What redescriptions offer
● A way to bridge vocabularies
– Uniformity of modeling descriptors
● Conceptual clustering
– Uses one set of descriptors to define another
● Automatic determination of mutually reinforcing features
– Without explicit training data
Biclustering
Simultaneously identify sets of entities from two domains that exhibit concerted behavior.
Biclusters: Toy Example
More on biclusters
● Can mine approximate biclusters
– “Dense” instead of “all 1s”
● Can require statistical significance
– According to set overlap distributions
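To make the "all 1s" versus "dense" distinction concrete, here is a brute-force sketch over a small binary matrix. It is meant only for toy inputs (it enumerates column subsets) and is not the biclustering algorithm used in the talk; the function names and the example matrix are assumptions.

```python
from itertools import combinations

def all_ones_biclusters(matrix, min_rows=2, min_cols=2):
    """Enumerate maximal all-1s biclusters of a small binary matrix."""
    n_cols = len(matrix[0])
    found = set()
    for size in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Rows that are 1 in every chosen column.
            rows = frozenset(
                r for r, row in enumerate(matrix) if all(row[c] for c in cols)
            )
            if len(rows) < min_rows:
                continue
            # Close the column set: every column that is 1 on all those rows.
            closed = frozenset(
                c for c in range(n_cols) if all(matrix[r][c] for r in rows)
            )
            found.add((rows, closed))
    return found

def density(matrix, rows, cols):
    """Fraction of 1s in the submatrix; 1.0 means an exact bicluster."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)

toy_matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]
exact = all_ones_biclusters(toy_matrix)
```

An approximate miner would relax the requirement `density == 1.0` to `density >= t` for some threshold `t` chosen by the analyst.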
Biclustering: Transcriptional regulation in S. cerevisiae
● Two datasets: Growth of S. cerevisiae cells in rich medium and under exposure to rapamycin
● What are the differences between the activated transcriptional regulatory network under these two conditions?
Computed biclusters
Combinatorial control by RTG3 and GLN3
Recap
● Redescriptions
– Map descriptors within a domain (e.g., genes to genes)
● Biclusters
– Map descriptors across domains (e.g., TFs to genes)
● Key idea: can arbitrarily compose these
– To bridge diverse domains
CDM: Desiccation tolerance in C. elegans
● Question: Find a set of genes to knock down, via RNAi, so as to confer improved desiccation tolerance in C. elegans
● Available data:
– Genes × TFs
– Genes × Phenotypes
CDM: Desiccation tolerance in C. elegans
Two biclusters joined at the Gene interface
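One way to picture a join at the gene interface is sketched below: a bicluster from the Genes × TFs relation and a bicluster from the Genes × Phenotypes relation are chained when their gene sides overlap strongly. The representation of a bicluster as a pair of sets, the `min_jaccard` parameter, and all identifiers are assumptions for illustration, not the talk's implementation.

```python
def jaccard(x, y):
    """Jaccard's coefficient of two non-empty sets."""
    return len(x & y) / len(x | y)

def join_at_genes(tf_biclusters, pheno_biclusters, min_jaccard=0.5):
    """Chain (TFs, genes) and (genes, phenotypes) biclusters via their gene sets.

    min_jaccard=1.0 demands identical gene sets; lower values tolerate
    partial overlap.
    """
    chains = []
    for tfs, genes_a in tf_biclusters:
        for genes_b, phenos in pheno_biclusters:
            score = jaccard(genes_a, genes_b)
            if score >= min_jaccard:
                chains.append((tfs, genes_a & genes_b, phenos, score))
    return chains

# Hypothetical biclusters with made-up identifiers.
tf_side = [({"tfA", "tfB"}, {"g1", "g2", "g3"})]
pheno_side = [({"g2", "g3", "g4"}, {"p1"}), ({"g7"}, {"p2"})]
chains = join_at_genes(tf_side, pheno_side)
```

Each resulting chain reads as "these TFs regulate these genes, which are associated with these phenotypes", and the shared genes are candidates for knock-down.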
CDM: Aging in worms and flies
● Question: analyze similarities in gene expression programs underlying aging in C. elegans and D. melanogaster
● Available data:
– Worm age × Worm genes (exp. values)
– Worm genes × Fly genes (homology)
– Fly age × Fly genes (exp. values)
CDM: Aging in worms and flies
Three biclusters related by two redescriptions
CDM Software Architecture
● Data Model Compiler
● Data Mining Plan Generator
● Visualization Interfaces
Data Model Compiler
● From a specification of
– a database schema (SQL DDLs)
● Automatically generate
– a database schema for CDM
– redescription/biclustering algorithm interfaces
Data Mining Plan Generator
● Compile a request
– for connections between biological domains
● Into
– A composition of redescriptions and biclusters
● Research issues
– Set-based versus tuple-based joins
– Hard versus soft joins
– Use “query flocks” to organize related queries
Visualization Interfaces
● Three-tiered interface
– Bicluster level view
– Set view
– Tuple (individual) view
CDM Software Architecture
Case studies
● Storytelling in PubMed abstracts
● Yeast functional genomics
● Small molecule-gene-disease modeling
Biological storytelling
Study metabolic arrest/recovery across organisms of diverse complexity
Storytelling as CDM
● Compose only redescriptions
– No biclusters
● Do not use set constructions
– Just given descriptors
● Goal:
– Relate dissimilar entities through compositions of similarities
Storytelling is sort of like ...
● the MorphWord puzzle
– PURE
– PORE
– POLE
– POLL
– POOL
– WOOL
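The puzzle analogy can be made concrete with a breadth-first search in which each step changes exactly one letter. This is only a sketch of the analogy, not the storytelling algorithm itself; the function names and the small vocabulary (the six words from the slide plus two distractors) are assumptions.

```python
from collections import deque

def differ_by_one(a, b):
    """True if the two words have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def find_story(start, goal, vocabulary):
    """Breadth-first search for a shortest chain of one-letter changes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for word in vocabulary:
            if word not in seen and differ_by_one(path[-1], word):
                seen.add(word)
                queue.append(path + [word])
    return None  # no chain exists within this vocabulary

# Words from the slide plus two distractors (CURE, POLO) added for illustration.
words = ["PURE", "PORE", "POLE", "POLL", "POOL", "WOOL", "CURE", "POLO"]
story = find_story("PURE", "WOOL", words)
```

In storytelling, words become documents (or descriptors) and "differ by one letter" becomes "sufficiently similar", so a story is a chain of pairwise-similar items connecting two dissimilar endpoints.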
Example storytelling task
● Connect
– L. Garczarek, N. Ramakrishnan, D. Kumar, R.F. Helm, and M. Potts, Global cross-over points in the genome responses of Synechocystis sp. PCC 6803, to dehydration, UV-irradiation, and other stresses, under communication to BMC Microbiology, 2007.
● To
– M.B. Roth and T. Nystul, Buying time in suspended animation, Scientific American, Vol. 292, No. 6, pages 48-55, June 2005.
Spinning a story ...
● From
– L. Garczarek, N. Ramakrishnan, D. Kumar, R.F. Helm, and M. Potts, Global cross-over points in the genome responses of Synechocystis sp. PCC 6803, to dehydration, UV-irradiation, and other stresses, under communication to BMC Microbiology, 2007.
● To
– L. Schmitt and R. Tampe, Structure and mechanism of ABC transporters, Current Opinion in Structural Biology, Vol. 14, No. 4, pages 426-431, Aug 2004.
Link: CBS Domains
Spinning a story ...
● From
– L. Schmitt and R. Tampe, Structure and mechanism of ABC transporters, Current Opinion in Structural Biology, Vol. 14, No. 4, pages 426-431, Aug 2004.
● To
– J.W. Scott, S.A. Hawley, K.A. Green, M. Anis, G. Stewart, G.A. Scullion, D.G. Norman, and D.G. Hardie, CBS domains form energy-sensing modules whose binding of adenosine ligands is disrupted by disease mutations, Journal of Clinical Investigation, Vol. 113, No. 2, pages 182-184, Jan 2004.
Link: Molecular complexes of CBS Domains
Spinning a story ...
● From
– J.W. Scott, S.A. Hawley, K.A. Green, M. Anis, G. Stewart, G.A. Scullion, D.G. Norman, and D.G. Hardie, CBS domains form energy-sensing modules whose binding of adenosine ligands is disrupted by disease mutations, Journal of Clinical Investigation, Vol. 113, No. 2, pages 182-184, Jan 2004.
● To
– C. Tang, X. Li and J. Du, Hydrogen sulfide as a new endogenous gaseous transmitter in the cardiovascular system, Current Vascular Pharmacology, Vol. 4, No. 1, pages 17-22, Jan 2006.
Link: Ligands bound to CBS Domains
Spinning a story ...
● From
– C. Tang, X. Li and J. Du, Hydrogen sulfide as a new endogenous gaseous transmitter in the cardiovascular system, Current Vascular Pharmacology, Vol. 4, No. 1, pages 17-22, Jan 2006.
● To
– M.B. Roth and T. Nystul, Buying time in suspended animation, Scientific American, Vol. 292, No. 6, pages 48-55, June 2005.
Link: Hydrogen sulphide
Storytelling on System X
● Distributed indexing and similarity search
● Bidirectional pursuing of “leads”
● Simulations for significance testing
Stories about storytelling
Biological storytelling
● Given
– 18 extra-cellular molecules: CD38, CXCL1, IFN-gamma, IGF-1, IL-13, IL-1beta, IL-24, IL-6, IL-8, MMP, etc.
– 1 intra-cellular molecule: (poly)ADP-ribose
● Find
– Chains of redescriptions between abstracts discussing these molecules
Biological storytelling
● Document seed set
– Retrieve 203,872 documents (remove review papers)
– Label 4757 documents with molecules (4737+20)
● Document modeling for similarity search
– 96,218 terms after stemming & stopword removal
– Weighted TFIDF (for doc length)
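The document-modeling step can be sketched as follows, assuming tokens have already been stemmed and stopwords removed. The slides do not specify the weighting, so the particular formula here (term frequency divided by document length, times log inverse document frequency, compared with cosine similarity) is an assumption, as are the function names.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per doc.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0

# Hypothetical token lists standing in for processed abstracts.
abstracts = [
    ["cbs", "domain", "ligand"],
    ["cbs", "domain", "transporter"],
    ["hydrogen", "sulfide"],
]
vecs = tfidf_vectors(abstracts)
```

Pairs of abstracts whose cosine similarity clears a threshold would then be candidate consecutive steps in a story.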
Biological storytelling
● Storytelling algorithm tradeoffs
– Higher similarity versus shorter stories
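The tradeoff can be seen on a tiny example: demanding higher similarity between consecutive documents rules out direct links and forces a longer chain. The sketch below models documents as term sets and uses Jaccard similarity with a breadth-first search; the documents, the `min_sim` parameter, and the function names are hypothetical.

```python
from collections import deque

def jaccard(x, y):
    """Jaccard's coefficient of two non-empty sets."""
    return len(x & y) / len(x | y)

def shortest_story(docs, start, goal, min_sim):
    """Shortest chain in which consecutive documents have Jaccard >= min_sim."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for name, terms in docs.items():
            if name not in seen and jaccard(docs[path[-1]], terms) >= min_sim:
                seen.add(name)
                queue.append(path + [name])
    return None  # no story exists at this similarity level

# Hypothetical documents as term sets.
toy_docs = {
    "S": {"a", "b", "c", "d"},
    "M": {"b", "c", "d", "e"},
    "G": {"c", "d", "e", "f"},
}
loose = shortest_story(toy_docs, "S", "G", min_sim=0.3)
strict = shortest_story(toy_docs, "S", "G", min_sim=0.5)
```

With the loose threshold the endpoints link directly; with the strict one the story must pass through `M`; at a threshold of 0.7 no story exists at all.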
Biological storytelling
● Basic statistics
– Most popular hub: PubMed ID 8064725, `Altered poly(ADP-ribose) metabolism in family members of patients with systemic lupus erythematosus'
– Second most popular hub: PubMed ID 2684169, `Two types of antibodies inhibiting interleukin-2 production by normal lymphocytes in patients with systemic lupus erythematosus'
Biological storytelling
● Frequent episode mining
– Mining novellas
– e.g., PubMed ID 16430457 -> ... -> 1386861
● Story compression
– Reduce novellas to single symbol
– Identify and remove frequently reused subpaths
● Story summarization
– Tile sentences using sentence cohesion check
The StoryGrapher
Available for demo/download at https://bioinformatics.cs.vt.edu/storytelling/
Sentence-tiled story
Yet to do...
● Model
– Cell types and cell lines
● Account for
– “artificial enrichment” for certain methodologies
● Address
– Author bias
– Messiness of information integration
Status of CDM
● Implemented using open source software
– Parallel implementations of key algorithms and significance calculations
● Many instantiations underway
– VIGEN (Virginia Center for Genomics)
– VBI (Virginia Bioinformatics Institute)
● We welcome collaborations!
Acknowledgements
● BIO faculty
– Rich Helm
– Malcolm Potts
● CS students
– Joe Gresock
– Deept Kumar
– Greg Grothaus
– Srinivas Santhanam
– Mahima Gopalakrishnan
– Anthony McNevin
Thank you!
● Contact info:
– Naren Ramakrishnan, [email protected], http://people.cs.vt.edu/~naren
– T.M. Murali, [email protected], http://people.cs.vt.edu/~murali