OMPOL – visualisation of large chemical spaces

Post on 27-Jan-2017


OMPOL – Visualisation of large chemical spaces

Peter Corbett, Colin Batchelor, Alexey Pshenichnov, Valery Tkachenko

Royal Society of Chemistry

ACS Spring 2016, San Diego, CA, March 17th 2016

Compounds
Reaction
Analytical Data
Text and References

ChemSpider Synthetic Pages

Chemical space - 10^60

RSC Data Repository

Data Repository

Properties Names and Identifiers Spectra Articles Data

Collections Patents Etc

RSC Compounds
RSC Reactions
RSC Spectra
RSC Crystals
RSC Polymers
RSC Materials
RSC Assays
RSC Algorithms
RSC Models
…and on…

RSC Databases

Record labels

Need to be able to see what sorts of structures are in a collection, how they relate to each other, etc.
Could use something like clustering
Dimensionality Reduction – chemical structures -> fingerprints -> large dimensional space -> small dimensional space
Standard technique – Principal Components Analysis (PCA)

Visualising Chemical Space

Dimensionality Reduction – First make a molecule-feature matrix

1 0 0 0 0 0 0 0 … 0

0 0 1 0 0 0 0 0 … 0

1 1 0 0 1 0 0 0 … 1

1 1 0 1 0 0 0 0 … 1

1 1 0 0 0 0 0 0 … 0

1 0 0 0 0 0 0 1 … 0

1 0 1 0 1 1 0 0 … 0

1 0 0 1 0 0 0 0 … 1
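A minimal sketch of how such a molecule-feature matrix could be assembled, assuming each molecule has already been reduced to a set of fingerprint features. The molecule ids and feature names below are hypothetical and are not taken from the talk or from any real fingerprinting scheme.

```python
# Toy input: molecule id -> set of features present in that molecule.
# Both the ids and the feature names are made up for illustration.
molecule_features = {
    "mol_a": {"C", "C-O"},
    "mol_b": {"N"},
    "mol_c": {"C", "C-O", "ring"},
}


def build_matrix(molecule_features):
    """Return (molecule_ids, feature_names, rows), one 0/1 row per molecule."""
    feature_names = sorted(
        {feature for feats in molecule_features.values() for feature in feats}
    )
    molecule_ids = sorted(molecule_features)
    rows = [
        [1 if name in molecule_features[mol] else 0 for name in feature_names]
        for mol in molecule_ids
    ]
    return molecule_ids, feature_names, rows


molecule_ids, feature_names, matrix = build_matrix(molecule_features)
for mol, row in zip(molecule_ids, matrix):
    print(mol, row)
```

Each column corresponds to one feature and each row to one molecule, matching the layout of the matrix above.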

PCA/SVD

The result

0.209 0.078 -0.368 …

0.030 0.297 0.174 …

0.509 0.005 0.343 …

0.514 -0.394 0.172 …

0.320 -0.034 -0.198 …

0.228 0.108 -0.791 …

0.338 0.812 0.151 …

0.403 -0.281 0.003 …

<--- Most important Least important --->

Plot on a graph
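The PCA step that turns the 0/1 matrix into coordinates for plotting can be sketched as follows. This is an illustration only: it uses power iteration with deflation in pure Python rather than a full SVD routine, the input matrix is a made-up toy, and its output is not meant to reproduce the numbers on the slide.

```python
import math


def center(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply by the matrix and renormalise."""
    d = len(matrix)
    start = [float(j + 1) for j in range(d)]  # arbitrary non-zero start vector
    length = math.sqrt(sum(x * x for x in start))
    v = [x / length for x in start]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to find (zero matrix)
            break
        v = [x / norm for x in w]
    return v


def pca(rows, n_components=2):
    """Return (components, coords); components are ordered most important first."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v = leading_eigenvector(cov)
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the direction just found before looking for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    coords = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]
    return components, coords


# Made-up molecule-feature matrix: 6 molecules, 3 features.
toy_matrix = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
components, coords = pca(toy_matrix, n_components=2)
for row in coords:
    print([round(x, 3) for x in row])
```

Each molecule ends up with one coordinate per component, and the first two columns are what would be plotted on a 2D scatterplot. The sign of each component is arbitrary, so a plot may appear mirrored between runs of different implementations.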

Need an interactive scatterplot
Web delivery => JavaScript
Need, at minimum, to click, mouseover, pan and zoom
Existing scatterplot libraries, e.g. flot.js, are plentiful and well supported…
…but do not scale well – become slow and unresponsive with ~40,000 data points

The problem

Make your own graph-plotting tool
OMPOL – One Million Points Of Light – an aspiration for scalability
HTML5 Canvas
“Google maps” style drawing
Divide graph into panels
Draw panels as they come onto the screen
Assemble display from pre-drawn panels
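The panel scheme can be illustrated with a small sketch of the bookkeeping involved. It is written in Python for brevity rather than the JavaScript used for web delivery, and it is not OMPOL's actual code: the panel size, the function names and the `PanelCache` class are all assumptions made for illustration, and real canvas drawing is replaced by a stand-in.

```python
import math
from collections import defaultdict

TILE_SIZE = 256  # assumed panel size in pixels


def tile_of(x, y, scale):
    """Index (column, row) of the panel containing data point (x, y).

    scale is the number of pixels per data unit at the current zoom level.
    """
    return (math.floor(x * scale / TILE_SIZE), math.floor(y * scale / TILE_SIZE))


def bin_points(points, scale):
    """Group points by panel so each panel only has to draw its own points."""
    tiles = defaultdict(list)
    for x, y in points:
        tiles[tile_of(x, y, scale)].append((x, y))
    return tiles


def visible_tiles(left, top, width, height):
    """Panels overlapping a viewport given in integer pixel coordinates."""
    cols = range(left // TILE_SIZE, (left + width - 1) // TILE_SIZE + 1)
    rows = range(top // TILE_SIZE, (top + height - 1) // TILE_SIZE + 1)
    return [(c, r) for r in rows for c in cols]


class PanelCache:
    """Draw each panel at most once and reuse it while panning."""

    def __init__(self, draw_panel):
        self._draw_panel = draw_panel
        self._panels = {}
        self.draw_count = 0

    def get(self, tile):
        if tile not in self._panels:
            self._panels[tile] = self._draw_panel(tile)
            self.draw_count += 1
        return self._panels[tile]


points = [(0.1, 0.1), (0.9, 0.2), (1.5, 1.5)]
binned = bin_points(points, scale=256)  # 256 pixels per data unit
# Stand-in for real drawing: a "drawn panel" is just its list of points.
cache = PanelCache(lambda tile: binned.get(tile, []))

first_view = [cache.get(t) for t in visible_tiles(0, 0, 512, 256)]
after_pan = [cache.get(t) for t in visible_tiles(256, 0, 512, 256)]
print(cache.draw_count)  # panels drawn so far
```

Panning one panel to the right reuses the panel that stays on screen and only draws the one that newly comes into view, so the cost of a pan depends on the number of new panels rather than on the total number of points.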

Opportunity for better ways of exploring the data

The solution

ChEBI
~50000 compounds, of “Biological Interest”
Has an ontology of compound types

Example data

Display data from dimensional reduction
Selecting data points, sets of data points
“Narrowing down” a cluster of compounds based on distribution in multiple dimensions
Exporting data
Using name and ontology information to select groups of points
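Selecting points, narrowing the selection down using a further dimension, and exporting the result might look roughly like the sketch below. The coordinates are the example result rows shown earlier; the `row_N` identifiers, the column names and the function names are hypothetical, and the ranges are arbitrary.

```python
import csv
import io

# Coordinates copied from the example result rows; ids are made-up labels.
points = {
    "row_1": (0.209, 0.078, -0.368),
    "row_2": (0.030, 0.297, 0.174),
    "row_3": (0.509, 0.005, 0.343),
    "row_4": (0.514, -0.394, 0.172),
    "row_5": (0.320, -0.034, -0.198),
    "row_6": (0.228, 0.108, -0.791),
    "row_7": (0.338, 0.812, 0.151),
    "row_8": (0.403, -0.281, 0.003),
}


def select_range(points, dim, low, high, within=None):
    """Ids whose coordinate in dimension dim lies in [low, high].

    If within is given, only those ids are considered, which is how an
    existing selection is narrowed down using another dimension.
    """
    candidates = points if within is None else within
    return {pid for pid in candidates if low <= points[pid][dim] <= high}


def export_csv(points, ids):
    """Serialise the selected points as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "dim1", "dim2", "dim3"])
    for pid in sorted(ids):
        writer.writerow([pid, *points[pid]])
    return buffer.getvalue()


# A rectangular selection in the first two dimensions...
selection = select_range(points, 0, 0.3, 0.6)
selection = select_range(points, 1, -0.5, 0.1, within=selection)
# ...then narrowed down using the third dimension.
narrowed = select_range(points, 2, 0.0, 0.4, within=selection)

print(export_csv(points, narrowed))
```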

What we’re going to show

Works very nicely with ~50000 data points and all features
During development, was able to work with 1M and on occasion 10M data points
Only in 2D, didn’t have all features enabled

How scalable?

Interacting with large (tens of thousands to millions of data points) multidimensional data sets is now a definite possibility

Conclusion

Thank you

Email: tkachenkov@rsc.org

Slides: http://www.slideshare.net/valerytkachenko16