Imputation of protein activity data using deep learning

20
Imputation of protein activity data using deep learning Gareth Conduit and Tom Whitehead

Transcript of Imputation of protein activity data using deep learning

Page 1: Imputation of protein activity data using deep learning

Imputation of protein activity datausing deep learning

Gareth Conduit and Tom Whitehead

Page 2: Imputation of protein activity data using deep learning

Neural network algorithm to

Merge simulations, physical laws, and experimental data

Reduce the need for expensive experimental development

Accelerate drugs and materials discovery

Generic with experimentally proven applications in materials discovery

Page 3: Imputation of protein activity data using deep learning

A black box

Drugdesign

Drugdesign

Page 4: Imputation of protein activity data using deep learning

Train on complete data

Protinactivit

Proteinactivity

Page 5: Imputation of protein activity data using deep learning

Digitize handwriting

Drugdesign

Drugdesign

Page 6: Imputation of protein activity data using deep learning

Train on fragmented data

Protinactivit

Proteinactivity

Page 7: Imputation of protein activity data using deep learning

Predict on fragmented data

Drugdesign

Drugdesign

Page 8: Imputation of protein activity data using deep learning

Enhancing drug discovery

ChEMBL dataset just 0.1% complete

Page 9: Imputation of protein activity data using deep learning

Data available for Adrenergic receptors

5 proteins with 1731 compounds, dataset 37% complete

5 proteins

138

5 co

mpo

und

s

Page 10: Imputation of protein activity data using deep learning

Impute the Adrenergic receptors dataset

Original dataset 37% complete, filled 65% of entries

Page 11: Imputation of protein activity data using deep learning

Additional descriptors for Adrenergic receptors

x3 x1 x10 x1

Page 12: Imputation of protein activity data using deep learning

Improved predictions

Include structural information to fill to 82%

Page 13: Imputation of protein activity data using deep learning

Predictions from just activities

Page 14: Imputation of protein activity data using deep learning

Predictions from activities and descriptors

Page 15: Imputation of protein activity data using deep learning

Predictions with just descriptors

Page 16: Imputation of protein activity data using deep learning

Comparison to random forest

Page 17: Imputation of protein activity data using deep learning

Future prospects

Enhance ChEMBL dataset from 0.1% to 20% complete

Page 18: Imputation of protein activity data using deep learning

Materials designed

Drug discovery

3D printed alloydesigned from10 data entries

Found errors inmaterials databases

Page 19: Imputation of protein activity data using deep learning

Even more materials designed

Battery designwith DFT andexperimental data

Designing lubricantswith DFT andexperimental data

Nickel andmolybdenumalloys

Page 20: Imputation of protein activity data using deep learning

Summary

Apply deep learning to high-value fragmented data

Merge experiments and simulations into holistic design tool

Experimentally proven applications in materials design,

founded start-up intellegens

Scientists establish all possible sources of information