Post on 26-Sep-2020

Coupling AI and network biology

Generate insights for disease understanding and target identification

Alexandr Ivliev, Director of Bioinformatics

Cheng Fang, Scientific Consultant

03.06.2020

Agenda

1. Introduction
2. Coupling AI and network biology
3. High-quality biological networks in microbiome for target ID
4. Key takeaways

Introduction

3© 2020 Clarivate

Genomic revolution of the early 2000s

From individual genes to understanding the entire genome

Artificial intelligence: a new ongoing revolution

Computer vision
• Self-driving cars
• Face recognition
• And more

Natural language processing
• Machine translation
• Speech analysis
• And more

Reinforcement learning
• Chess, Go, computer games
• Robotics
• And more

Revolutionizing industries

Artificial intelligence is a hot field

“Deep learning” first appeared in the Gartner Hype Cycle for Emerging Technologies in 2017

Deep learning is a big field

https://www.asimovinstitute.org/neural-network-zoo/

Computer vision and image processing

Esteva et al, Nat Med, 2019

Text mining, e.g. electronic health records

Esteva et al, Nat Med, 2019

Applications in genomics

Esteva et al, Nat Med, 2019

Networks are how biology works

Network by Martin Grandjean

• Disease mechanism understanding

• Target ID

Drug target

Disease genes

Can biological networks be coupled with deep neural networks to enable disease mechanism understanding and target ID?

Target

What approaches are you taking to understand disease mechanisms and identify novel drug targets?

a. Literature searches
b. OMICs data analysis
c. Small scale lab experiments
d. Classical machine learning
e. Deep learning
f. Other or inapplicable

Coupling AI with network biology to enable disease understanding and target ID

Problem: graphs are structurally very different from inputs in other AI solutions

Networks are quite different from texts and images

Text and sequences have linear 1D structure

“To be or not to be, that is the question”

CGT TTA GAA

Images have 2D grid structure

Networks are more complex

Images and texts work fine as input for neural networks

0.123 0.224 0.851 0.401 0.486 0.298 0.694 0.887 0.653 0.101 0.696

Prediction

Solution 1: Generating biological network node embeddings using random walks

What is “embedding”?

bagel

Sequences

• Embedding = dense vector that captures important information from the input

• Embeddings can be learned automatically as opposed to feature engineering

• “Similar” objects have close embeddings

• It’s easy to use embeddings as input into AI techniques
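The claim that “similar” objects have close embeddings can be made concrete with cosine similarity. The following is a minimal sketch only: the three-dimensional vectors are invented for illustration and are not word2vec output.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, invented for this sketch.
embeddings = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "bagel": [0.05, 0.10, 0.90],
}

sim_king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_king_bagel = cosine_similarity(embeddings["king"], embeddings["bagel"])

print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs bagel: {sim_king_bagel:.3f}")
```

With these made-up vectors, “king” and “queen” score much closer to each other than either does to “bagel”, which is the property a downstream model relies on.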

king, queen

Numeric space

word2vec

Generating node embeddings

[Figure: the nodes of a biological network (Node 1, Node 2, ..., Node 3,816) are mapped to points in a numeric space. word2vec produces embeddings for sequences; node2vec produces them for networks.]

Generating node embeddings using random walks

General idea

[Figure: example network with nodes 1, 2, 3, 4, 6, 7 and 8]

Examples of random walks starting from node 3 with four steps:

3 -> 4 -> 6 -> 7 -> 8
3 -> 2 -> 3 -> 1 -> 3
3 -> 4 -> 6 -> 3 -> 4
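A minimal sketch of how such uniform random walks could be sampled. The edge list is an assumption, chosen only so that the three example walks above are valid paths; the network in the original figure may contain other edges.

```python
import random

# Hypothetical undirected edges, consistent with the example walks only.
EDGES = [(1, 3), (2, 3), (3, 4), (3, 6), (4, 6), (6, 7), (7, 8)]

def build_adjacency(edges):
    """Map each node to the list of its neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency

def random_walk(adjacency, start, num_steps, rng):
    """Take num_steps uniform random steps, returning num_steps + 1 nodes."""
    walk = [start]
    for _ in range(num_steps):
        walk.append(rng.choice(adjacency[walk[-1]]))
    return walk

adjacency = build_adjacency(EDGES)
rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [random_walk(adjacency, start=3, num_steps=4, rng=rng) for _ in range(3)]

for walk in walks:
    print(" -> ".join(str(node) for node in walk))
```

Each walk visits five nodes (the start plus four steps), and a walk may revisit nodes, as in the second example above.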

Random walk sequences
3 4 6 7 8
3 2 3 1 3
3 4 6 3 4

Text embedding model

Node graph embeddings
Node 1: -0.01822536, 0.14636423, 0.023379749 …
Node 2: 0.10925472, 0.00750885, -0.019593006 …
…
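To show what a text embedding model sees when walks are treated as sentences, here is a sketch that extracts skip-gram style (center, context) pairs from the three walk sequences. The window size of 2 is an assumption, and the training of the embedding vectors themselves (for example with a word2vec implementation) is left out.

```python
from collections import Counter

walks = [
    [3, 4, 6, 7, 8],
    [3, 2, 3, 1, 3],
    [3, 4, 6, 3, 4],
]

def skipgram_pairs(sequence, window):
    """Yield (center, context) pairs for every position in the sequence."""
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, sequence[j]

WINDOW = 2  # assumed context window size

pair_counts = Counter(
    pair for walk in walks for pair in skipgram_pairs(walk, WINDOW)
)

# Nodes that co-occur often in walks are pushed towards nearby embeddings.
for (center, context), count in pair_counts.most_common(5):
    print(f"center={center} context={context} count={count}")
```

Nodes that frequently appear near each other in walks generate many shared pairs, which is what drives their embeddings together during training.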

Random walk variant example: Node2Vec

node2vec: Scalable Feature Learning for Networks. Grover, A., & Leskovec, J. (2016).

Breadth first / Depth first

It’s a scale of behavior controlled by two hyperparameters

Nodes in “local communities” are more similar to each other

Nodes with similar “structural roles” are more similar to each other
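The two hyperparameters in the node2vec paper are the return parameter p and the in-out parameter q. A sketch of the biased step follows, assuming an unweighted graph: the unnormalized weight of a candidate next node is 1/p if it is the previous node, 1 if it is also a neighbor of the previous node, and 1/q otherwise. The adjacency below is hypothetical.

```python
import random

def node2vec_step_weights(adjacency, previous, current, p, q):
    """Unnormalized node2vec weights for stepping from `current`."""
    weights = {}
    for candidate in adjacency[current]:
        if candidate == previous:
            weights[candidate] = 1.0 / p  # return to where we came from
        elif candidate in adjacency[previous]:
            weights[candidate] = 1.0      # stay close to the previous node
        else:
            weights[candidate] = 1.0 / q  # move further away
    return weights

def node2vec_walk(adjacency, start, num_steps, p, q, rng):
    """Second-order biased walk; the first step is uniform."""
    walk = [start]
    for _ in range(num_steps):
        current = walk[-1]
        neighbors = sorted(adjacency[current])
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))
            continue
        weights = node2vec_step_weights(adjacency, walk[-2], current, p, q)
        walk.append(rng.choices(neighbors, weights=[weights[n] for n in neighbors])[0])
    return walk

# Hypothetical unweighted graph, invented for this sketch.
adjacency = {
    1: {3}, 2: {3}, 3: {1, 2, 4, 6}, 4: {3, 6},
    6: {3, 4, 7}, 7: {6, 8}, 8: {7},
}

weights = node2vec_step_weights(adjacency, previous=4, current=6, p=0.5, q=2.0)
print(weights)

walk = node2vec_walk(adjacency, start=3, num_steps=4, p=0.5, q=2.0, rng=random.Random(0))
print(walk)
```

With q above 1 the walk tends to stay near the previous node (breadth-first-like); with q below 1 it tends to move outward (depth-first-like).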

Coupling biological networks with deep neural networks to enable disease understanding and target ID

Node embeddings from random walks

Novel predicted targets

Training set of known targets

DEGs

Challenges with simple random walks

• Nodes 1 and 8 are “similar” as they have the same attributes, but they will be far from each other in random walks

• Node 9 is unreachable from any other node, yet it’s “similar” to node 6

How do we capture those non-trivial similarities?

Attributes, e.g.:
• up-regulated = yes
• protein class = kinase
• known target = yes

Nodes, e.g.:
• protein 1
• protein 2
• protein 3

How to incorporate attributes into graph embeddings

Gat2Vec

Structural graph without attributes

Bipartite attributes graph

Generate random walks on each graph independently, and supply both sets of sequences into word embeddings learning

gat2vec: representation learning for attributed graphs. Sheikh, N., Kefato, Z., & Montresor, A. (2019). Computing, 101(3), 187–209.

• Nodes 1 and 8 are close in the bipartite graph

• Node 9 is connected to other nodes in the bipartite graph

Caveats:
- Attributes can only be discrete
- Cannot use complex attributes, like SMILES, amino-acid sequences, etc.

Random walk => node ids as “words”

From structural graph:
1 -> 3 -> 2 => “1 3 2”

From attributes graph:
1 -> a -> 8 -> b -> 2 => “1 8 2”
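A sketch of the attribute-graph half of this idea: walk on a bipartite node-attribute graph, then drop the attribute nodes so that only node ids remain as “words”. The attribute assignments are hypothetical, chosen so that the walk 1 -> a -> 8 -> b -> 2 is possible.

```python
import random

# Hypothetical attribute assignments (node id -> attribute names).
NODE_ATTRIBUTES = {
    1: ["a"],
    2: ["b"],
    8: ["a", "b"],
}

def build_bipartite(node_attributes):
    """Undirected bipartite graph linking each node to its attributes."""
    adjacency = {}
    for node, attributes in node_attributes.items():
        for attribute in attributes:
            adjacency.setdefault(node, []).append(attribute)
            adjacency.setdefault(attribute, []).append(node)
    return adjacency

def attribute_walk(adjacency, start, num_steps, rng):
    """Uniform random walk that alternates between nodes and attributes."""
    walk = [start]
    for _ in range(num_steps):
        walk.append(rng.choice(adjacency[walk[-1]]))
    return walk

def drop_attribute_nodes(walk, attribute_names):
    """Keep only the node ids, which become the 'words' of the sequence."""
    return [node for node in walk if node not in attribute_names]

adjacency = build_bipartite(NODE_ATTRIBUTES)
attribute_names = {a for attrs in NODE_ATTRIBUTES.values() for a in attrs}

slide_walk = [1, "a", 8, "b", 2]
print(drop_attribute_nodes(slide_walk, attribute_names))  # [1, 8, 2]

sampled = attribute_walk(adjacency, start=1, num_steps=4, rng=random.Random(0))
print(sampled, "=>", drop_attribute_nodes(sampled, attribute_names))
```

Because nodes 1 and 8 share attribute a, they end up adjacent in the resulting sequences even if they are far apart, or disconnected, in the structural graph.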

Example application

Target prioritization using gat2vec - GuiltyTargets

Inputs:
• Protein-protein interaction network (STRING, HIPPIE)
• Discrete differential gene expression: RNASeq from different cohorts (MSBB, MayoRNASeq, ROSMAP, etc.)
• Known targets for the disease (Open Targets, Therapeutic Targets Database)

Workflow:
1. Annotated protein-protein interaction network
2. Features obtained using Gat2Vec
3. Positive-unlabeled learning
4. Rank candidate targets

Best ROC AUC for different diseases ≈ 0.92-0.94

GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). BioRxiv, 521161.
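For readers unfamiliar with the metric, ROC AUC is the probability that a randomly chosen positive is scored above a randomly chosen negative. The sketch below computes it directly from that definition. The scores are invented, and treating unlabeled proteins as negatives for evaluation is an assumption of this sketch, not a description of the GuiltyTargets protocol.

```python
def roc_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical model scores for six proteins.
# Label 1 = known target; label 0 = unlabeled, treated here as a negative.
scores = [0.91, 0.80, 0.35, 0.30, 0.62, 0.10]
labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
print(f"ROC AUC = {auc:.3f}")
```

In this toy example one known target is scored below two unlabeled proteins, so 7 of the 9 positive-negative pairs are ordered correctly.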

Coupling biological networks with deep neural networks to enable disease understanding and target ID

Node embeddings from random walks in (1) structural graph; and (2) attribute graph

Novel predicted targets

Training set of known targets

Attributes: Kinases + GWAS hits + DEGs

Solution 2: Building artificial neural nets to structurally reflect a biological network of interest

Graph neural networks

[Figure: examples of inputs to a graph neural net: Molecule, Physical path, Text (“The little cat looks lovely.”)]

Graph neural networks for target ID – one approach

A node’s neighborhood defines a computational graph

[Figure: BIOLOGICAL NETWORK and the computational graph for one node]

Features on node A, features on node C, e.g.:
• protein class
• druggability
• genetic link
• differential expression

Any differentiable function that aggregates multiple vectors into one, e.g.:
• A + C
• A – 0.34 * C
• A * (A + 1.4 * C)

The beauty is: it’s all a differentiable computational graph that can be optimized using backpropagation.

“Deep Learning for Network Biology -- snap.stanford.edu/deepnetbio-ismb -- ISMB 2018”
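A minimal sketch of one round of neighborhood aggregation, using an element-wise sum as the aggregating function (the “A + C” style of example). The graph and feature values are hypothetical, and the learned weight matrices and non-linearities of a real graph neural network layer are deliberately left out.

```python
def aggregate_sum(vectors):
    """Element-wise sum: one differentiable way to merge several vectors."""
    return [sum(components) for components in zip(*vectors)]

def message_passing_layer(adjacency, features):
    """New vector per node = own features + sum of the neighbors' features."""
    updated = {}
    for node, neighbors in adjacency.items():
        neighbor_sum = aggregate_sum([features[n] for n in neighbors])
        updated[node] = [own + agg for own, agg in zip(features[node], neighbor_sum)]
    return updated

# Hypothetical three-node network and two-dimensional node features.
adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.5, 0.5]}

hidden = message_passing_layer(adjacency, features)
for node, vector in hidden.items():
    print(node, vector)
```

Stacking this step k times lets information from nodes up to k hops away reach each node, which is why the network structure determines the shape of the computation.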

Graph neural network for target ID

• Every node has its own unique computational graph defined by the biological network structure

• These computational graphs are neural networks that can be trained using standard AI techniques
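To illustrate why every node gets its own computational graph, the sketch below unrolls a node’s neighborhood into a tree of the inputs needed after a given number of layers. The edges among nodes A to F are hypothetical, not read from the figure.

```python
def computation_tree(adjacency, node, depth):
    """Nested (node, children) tuple of inputs needed to compute `node`."""
    if depth == 0:
        return (node, [])
    children = [computation_tree(adjacency, n, depth - 1) for n in adjacency[node]]
    return (node, children)

def render(tree, indent=0):
    """Indented text lines for a computation tree."""
    node, children = tree
    lines = [" " * indent + node]
    for child in children:
        lines.extend(render(child, indent + 2))
    return lines

# Hypothetical edges among six nodes.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E", "F"],
    "D": ["B"],
    "E": ["C"],
    "F": ["C"],
}

tree_a = computation_tree(adjacency, "A", depth=2)
tree_d = computation_tree(adjacency, "D", depth=2)

print("\n".join(render(tree_a)))
print()
print("\n".join(render(tree_d)))
```

The two printed trees differ in shape because A and D sit in different neighborhoods, yet the same trainable aggregation function can be shared across all of them.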

“Deep Learning for Network Biology -- snap.stanford.edu/deepnetbio-ismb -- ISMB 2018”

[Figure: BIOLOGICAL NETWORK with nodes A, B, C, D, E, F]

COMPUTATIONAL GRAPH = ARTIFICIAL NEURAL NETWORK FOR EACH NODE

Example: Decagon algorithm

Modeling polypharmacy side effects with graph convolutional networks

Modeling polypharmacy side effects with graph convolutional networks. Zitnik, M., Agrawal, M., & Leskovec, J. (2018).

These methods open the doors to coupling AI with biological networks

Targets

Indications

Mechanisms

Key challenges

Garbage in – garbage out
• The need for high-quality networks
• And large high-quality training sets

Knowledge bias
• It’s hard to predict the completely unknown from the known

Model interpretation
• Opening the “black box”

How optimistic are you that AI will transform pharma R&D in 5 years?

a. It'll revolutionize research
b. It'll yield incremental advances
c. It won't solve any of the major challenges
d. Other

Curating high-quality networks in microbiome for target ID

High-quality biological networks in microbiome for target ID

“The human microbiome and why the solution for all disease lies within our own gut” Nov 2017

Why should we care about the microbiome?

Its importance has long been recognized (first described 1,700 years ago!) and it is already used in medical practice as FMT (fecal microbiota transplant). Fecal transplantation is performed as a treatment for recurrent C. difficile colitis infection (CDI). C. difficile colitis, a complication of antibiotic therapy, may be associated with diarrhea, abdominal cramping and sometimes fever.

Adverse effects are poorly understood.

Why should we care about the microbiome: a new era

The microbiome is implicated in health and disease:
• IBD and Crohn’s disease
• Obesity and diabetes
• Immune function and malfunction
• Autoimmune diseases
• Cardiovascular diseases
• Neurological diseases
• Oncology

Cani, P. Nat Rev Gastroenterol Hepatol 14, 321–322 (2017).

Source: Cortellis Drug Discovery Intelligence

Active drug development
• 781 drugs and biologics under development
• 73 in clinical trials

Source: Cortellis Drug Discovery Intelligence

Active drug development
• The vast majority of the microbiome drugs in clinical trials have no specific mechanism

• Sodium oligo-mannurarate, extracted from algae, was developed at Shanghai Green Valley to treat mild to moderate AD.

• Sibofimloc is an inhibitor of the type 1 fimbrial adhesin from E. coli. It is in Phase I for IBD.

Mechanism of action matters in drug development

[Figure: drug development pipeline]

Stages: Drug Discovery; Preclinical, Translational, Precision Medicine; Clinical Trial, API Synthesis; Regulatory Review, Scale-up to MFG; Marketing, Manufacturing, Post-market Surveillance (Ph. IV)

Milestones: IND Submitted; NDA Submitted; APPROVAL

Compounds remaining: 5,000-10,000 compounds; ~250 compounds; <5 compounds

Clinical phases:
• PHASE I: 20 – 100 volunteers; Safety
• PHASE II: 100 – 500 patients; Safety/Dosing
• PHASE III: 1,000 – 5,000 patients; Efficacy, Adverse Events

Among 640 novel therapeutics of Phase 3 clinical trials (1998-2008), 344 (54%) failed in clinical development, 230 (36%) were approved by the US Food and Drug Administration (FDA), and 66 (10%) were approved in other countries but not by the FDA. Most products failed due to inadequate efficacy (n = 195; 57%), while 59 (17%) failed because of safety concerns and 74 (22%) failed due to commercial reasons.

Hwang et al. Dec. 2016 JAMA Internal Medicine
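The percentages quoted above can be checked with a few lines of arithmetic. This sketch only reuses the counts given in the text; the variable names are illustrative.

```python
total = 640
outcomes = {
    "failed in clinical development": 344,
    "approved by the FDA": 230,
    "approved elsewhere but not by the FDA": 66,
}
failure_reasons = {
    "inadequate efficacy": 195,
    "safety concerns": 59,
    "commercial reasons": 74,
}

def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

# Outcome shares are relative to all 640 therapeutics.
outcome_pct = {name: percent(n, total) for name, n in outcomes.items()}

# Failure-reason shares are relative to the 344 failures.
failed = outcomes["failed in clinical development"]
reason_pct = {name: percent(n, failed) for name, n in failure_reasons.items()}

# Failures whose reason is not broken out in the quoted figures.
not_itemized = failed - sum(failure_reasons.values())

print(outcome_pct)
print(reason_pct)
print("failures without a listed reason:", not_itemized)
```

The three outcome counts add up to 640, while the three listed failure reasons cover 328 of the 344 failures, so 16 failures fall outside the quoted categories.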

How do we leverage AI to understand MoA and identify new targets in microbiome?

Novel predicted targets

Understanding biology is critical for target ID

• A microbe-host interaction network could be used to:
– Uniquely identify potential microbial effectors that target distinct host nodes or interfere with endogenous host interactions
– Determine how mutations on either host or microbial proteins affect the interaction
– Delineate pathogenic mechanisms and thereby help maximize beneficial therapeutics

Types of microbe-host interactions

• Microbe-host protein-protein interactions (approx. 16,000 publications)
• MAMP (microbe-associated molecular pattern) – host protein-protein interactions
• Microbial metabolite – host protein interactions (approx. 10,000 publications)
• Microbe-microbe interactions (protein-protein or protein-metabolite)

Microbiome publications are growing fast

Source: Clarivate Analytics Web of Science, using title search terms (human microbiota, human microbiome, microbiome, human microbial, human microbes, or gut ecology).

Microbiome publications over time

A database to capture microbial-host interactions is needed for better understanding the biology

Ideal literature curation workflow

1. Define project: define curation template, inclusion/exclusion criteria and prioritization strategy
2. Construct search strings: find relevant articles for review
3. Review and prioritize abstracts: manual review and prioritization based on inclusion/exclusion criteria
4. Acquire data and articles: experience in biomedical literature monitoring
5. Annotate and curate articles: controlled vocabularies and public database IDs
6. QC and format for delivery: knowledge in development of biological databases

Manual curation ensures high quality

How is a database like this constructed?

A solution for interactome reconstruction, data management and integration

Literature curation and database construction

Curator
• Annotates
• Enriches data
• Quality control

Articles and data
• Metabolite-host interactions
• Microbe-microbe interactions
• And more

Administrator
• Design
• Development
• Maintenance

Public data sources

Proprietary data sources

Database of Microbiome-Host Interactions (DoMI)

User interface

User query

Summary statistics

Interaction networks
• Table of interactions
• Access to related articles
• And more

Example interactome reconstruction

[Figure: example interactome linking MICROBIOME and HOST. Data layers: MetaG, MetaB, RNA-Seq; Taxonomy, BGC, KO, COG. Network nodes: LPS, TLR4, TRAM, IRF3, IFNB, ACT, IKKE, FHA, CD11B, CD18, TBK1, CASP7, CASP3, Butyrate, GPR109A, IL10, IL6, NOS2, IL12, MYD88, TRAF6, NFKB, TlpA. Legend: Activation, Inhibition, Metabolite, Protein.]
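One way curated interactions of this kind could be stored and queried is sketched below. The record layout and field names are hypothetical and do not describe the actual database schema. The two records (LPS acting on TLR4, butyrate acting on GPR109A) are included only as illustrative entries, with effect labels assumed for the sketch rather than read from the figure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    source: str            # microbial molecule
    target: str            # host protein
    interaction_type: str  # controlled vocabulary term
    effect: str            # "activation" or "inhibition"

# Illustrative records only; field values are assumptions of this sketch.
INTERACTIONS = [
    Interaction("LPS", "TLR4", "MAMP-host", "activation"),
    Interaction("Butyrate", "GPR109A", "metabolite-host", "activation"),
]

def targets_of(interactions, source):
    """Host targets and effects recorded for one microbial molecule."""
    return [(i.target, i.effect) for i in interactions if i.source == source]

def to_edge_list(interactions):
    """Plain (source, target) edges, e.g. as input for random walks."""
    return [(i.source, i.target) for i in interactions]

print(targets_of(INTERACTIONS, "LPS"))
print(to_edge_list(INTERACTIONS))
```

Keeping the interaction type and effect as controlled terms makes it straightforward to export an edge list for the embedding approaches described earlier.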

The microbial-host interaction database will help leverage AI

[Figure: the host-microbiome network is mapped into a numeric space with node2vec, as word2vec does for sequences, leading to novel predicted targets]

Key takeaways

AI promises significant advances in the data-rich biomedical field

Biological networks are different from common AI inputs, but approaches have emerged to feed biological networks into AI techniques

Manual curation remains important for creating high-quality biological networks and training sets for AI

Time will tell how much transformation versus incremental progress AI will bring to pharma R&D

Q&A

© 2020 Clarivate. All rights reserved. Republication or redistribution of Clarivate content, including by framing or similar means, is prohibited without the prior written consent of Clarivate. Clarivate and its logo, as well as all other trademarks used herein are trademarks of their respective owners and used under license.

Interested in learning more about Clarivate’s drug discovery consulting services? Visit our website to learn more.

Alexandr Ivliev
alexander.ivliev@clarivate.com

Cheng Fang
cheng.fang@clarivate.com