Networks are useful for describing systems of interacting objects, where the nodes represent the...

Network link prediction by global silencing of indirect correlations. Baruch Barzel & Albert-László Barabási.


Page 1: Networks are useful for describing systems of interacting objects, where the nodes represent the objects and the edges represent the interactions between.

Network link prediction by global silencing of indirect correlations.

Baruch Barzel & Albert-László Barabási.

Page 2:

Index

- Introduction
- Problem Definition
- Current Solution
- Proposed Solution
- Application to a model system and its results
- Application to a real world system and its results
- Inadequacies
- Conclusion

Page 3:

Introduction

• Networks are useful for describing systems of interacting objects, where the nodes represent the objects and the edges represent the interactions between them.

• Similarly in a gene network each node corresponds to a gene, and a pair of nodes is connected with an edge if there is a significant co-expression relationship between them.

But unlike many networks - like the components inside your car engine or the wires inside a robot - biological systems are black boxes. We can observe the outcome of their interactions, but not the interactions themselves.

Barzel & Barabási developed a mathematical method for peering inside that box. Hence we move a step closer in a quest to understand, predict, and control human disease.

Page 4:

What is “Network Link Prediction”?

In the context of biology, link prediction refers to the problem of identifying functional links between genes from data that may be confounded by indirect effects.

• Suppose we are looking at 3 genes A, B, and C.
• Gene A inhibits the expression of gene B, and gene B in turn inhibits the expression of gene C.
• If the expression of A increases, it will decrease the expression of B, which in turn increases the expression of C.
• Therefore one might observe a correlation in the expression levels of genes A and C, even though there is no direct interaction between them.

[Figure: genes A, B, and C in a cascade, with A inhibiting B and B inhibiting C]
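The A, B, C example can be illustrated numerically. The sketch below is only an illustration: the linear model, the coupling of -0.8, the noise level, and the name `simulate_cascade` are assumptions that do not come from the slides.

```python
import numpy as np

def simulate_cascade(n_samples=2000, coupling=-0.8, noise=0.5, seed=0):
    """Simulate expression levels for a cascade A -| B -| C (hypothetical linear model)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_samples)
    b = coupling * a + noise * rng.normal(size=n_samples)  # A inhibits B
    c = coupling * b + noise * rng.normal(size=n_samples)  # B inhibits C; A never acts on C directly
    return a, b, c

a, b, c = simulate_cascade()
corr = np.corrcoef([a, b, c])  # 3 x 3 Pearson correlation matrix
corr_ab, corr_bc, corr_ac = corr[0, 1], corr[1, 2], corr[0, 2]
print(f"corr(A,B)={corr_ab:.2f}  corr(B,C)={corr_bc:.2f}  corr(A,C)={corr_ac:.2f}")
```

The two direct (inhibitory) links show up as negative correlations, while A and C come out positively correlated even though the model contains no A-C term.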

Let's see how gene networks are constructed with this problem in mind.

Gene Expression helps in identifying active and inactive genes in a cell

Page 5:

Construction of a Network from a Gene Correlation Matrix

We start with the gene expression values of m genes for n samples (conditions); the input data is an m×n matrix.

In the first step, a similarity score (co-expression measure) is calculated between each pair of rows in the expression matrix. The result is an m×m matrix. Each element of this matrix shows how similarly the expression levels of two genes change together.

Pearson Correlation, or Mutual Information, or Euclidean Distance, or Spearman Rank Correlation
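The first step can be sketched as follows, using Pearson correlation as the similarity score. The function name `coexpression_matrix` and the random toy data are placeholders chosen for illustration.

```python
import numpy as np

def coexpression_matrix(expression):
    """Return the m x m Pearson correlation matrix of an m x n expression matrix (rows = genes)."""
    expression = np.asarray(expression, dtype=float)
    return np.corrcoef(expression)  # np.corrcoef treats each row as one variable

rng = np.random.default_rng(0)
expression = rng.normal(size=(5, 20))  # toy data: m = 5 genes, n = 20 samples
similarity = coexpression_matrix(expression)
print(similarity.shape)
```

The result is symmetric with ones on the diagonal, since every gene is perfectly correlated with itself.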

Page 6:

Pearson correlation measures the linear correlation between two variables X and Y, giving a value between +1 and −1 inclusive, where +1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.

Mutual information measures how much knowing the expression levels of one gene reduces the uncertainty about the expression levels of another.

Euclidean distance measures the geometric distance between two vectors, and so considers both the direction and the magnitude of the vectors of gene expression values.

Common correlation measures
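To make the differences between these measures concrete, here is a small NumPy-only sketch. The helper names are illustrative, the Spearman ranking assumes no tied values, and the mutual information is a crude histogram estimate whose bin count is an arbitrary choice.

```python
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    # Pearson correlation of the ranks; argsort-of-argsort ranking assumes no ties
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

def euclidean(x, y):
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

def mutual_information(x, y, bins=8):
    # Plug-in histogram estimate, in nats
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x ** 3  # monotonic but nonlinear dependence on x

print(pearson(x, y), spearman(x, y), euclidean(x, y), mutual_information(x, y))
```

Because `y` is a monotonic function of `x`, the rank-based Spearman score is essentially 1, while the Pearson score is lower because the relationship is not linear.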

Page 7:

Construction of a Network from a Gene Correlation Matrix

Elements of the similarity matrix that are above a certain threshold are replaced by 1, and the remaining elements are replaced by 0.

Example threshold: ≥ 0.8

Such simple thresholding is known to predict spurious links and overlook the true links.
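A minimal sketch of thresholding and of how it can go wrong. The similarity values are made up for illustration: genes 0-1, 1-2 and 2-3 are taken to be the true links, and zeroing the diagonal is an added convention.

```python
import numpy as np

def threshold_network(similarity, threshold=0.8):
    """Replace entries at or above the threshold by 1 and all others by 0."""
    similarity = np.asarray(similarity, dtype=float)
    adjacency = (similarity >= threshold).astype(int)
    np.fill_diagonal(adjacency, 0)  # ignore self-similarity
    return adjacency

# Hypothetical similarity scores for 4 genes
similarity = np.array([
    [1.00, 0.85, 0.81, 0.10],
    [0.85, 1.00, 0.90, 0.20],
    [0.81, 0.90, 1.00, 0.60],
    [0.10, 0.20, 0.60, 1.00],
])
adjacency = threshold_network(similarity)
print(adjacency)
```

With these numbers the indirect pair 0-2 (score 0.81) is reported as a link, while the weaker true link 2-3 (score 0.60) is missed.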

Barzel & Barabási propose a new method to silence such indirect responses.

Page 8:

The Silencing Method

The global response matrix G can be measured directly from gene expression measurements.

It captures the change in node i’s activity in response to changes in node j’s.

Gij cannot distinguish between direct and indirect relationships: a path i → k → j can result in a measurable response observed between i and j, falsely suggesting the existence of a direct link between them.

Barzel & Barabási introduce the local response matrix S, in which the contribution of indirect effects is eliminated.

“∂” indicates that Sij is defined to capture only local effects, i.e. the response of i to changes in j when all surrounding nodes except i and j remain unchanged.
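The formulas for G and S did not survive in this transcript. Based on the surrounding description (a total response for G, a partial "∂" response for S), they can plausibly be written as below. The symbol x_i for the activity of node i is an assumption, and the exact form should be checked against the paper.

```latex
G_{ij} = \frac{\mathrm{d}x_i}{\mathrm{d}x_j},
\qquad
S_{ij} = \frac{\partial x_i}{\partial x_j}
```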

Barzel & Barabási derive a method for calculating the local response matrix from the experimentally accessible global response matrix. It is then shown that the resulting Sij matrix, in which the contribution of indirect paths is silenced, is more discriminative than the empirically obtained Gij matrix, enhancing our ability to extract direct links from experimentally collected correlation data.

Page 9:

Brief calculation for Silencing Method derivation

We have already seen the global and local response matrices, G and S respectively.

To extract Sij from the experimentally accessible Gij, we formally link the two definitions.
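The linking relation itself is missing from this transcript. A plausible reconstruction, following the chain rule, is that the global response of i to j is the sum over neighbors k of the direct response of i to k times the global response of k to j. The normalization G_ii = 1, S_ii = 0 is an assumption to be checked against the paper.

```latex
G_{ij} = \sum_{k \neq i} S_{ik}\, G_{kj} \quad (i \neq j),
\qquad G_{ii} = 1, \quad S_{ii} = 0
```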

We can solve this further to obtain EQ.X, which provides Sij from the experimentally accessible Gij by ‘silencing’ indirect responses and preserving direct response terms.
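A minimal numerical sketch of this step. It assumes EQ.X has the closed form S = (G − I + D((G − I)·G))·G⁻¹ given in the published method, where I is the identity matrix and D(·) keeps only the diagonal of its argument. The function name `silence` and the five-node cascade with link strength 0.5 are illustrative assumptions.

```python
import numpy as np

def silence(G):
    """Estimate the local response matrix S from a global response matrix G."""
    G = np.asarray(G, dtype=float)
    identity = np.eye(G.shape[0])
    off_diag = G - identity
    diag_term = np.diag(np.diag(off_diag @ G))  # D((G - I) G): keep the diagonal only
    # S = (G - I + D) G^{-1}, computed with a linear solve instead of an explicit inverse
    return np.linalg.solve(G.T, (off_diag + diag_term).T).T

# Toy cascade 0 -> 1 -> 2 -> 3 -> 4 with direct strength 0.5 (hypothetical values)
n, strength = 5, 0.5
S_true = np.diag(np.full(n - 1, strength), k=-1)  # S_true[i + 1, i] = 0.5
G = np.linalg.inv(np.eye(n) - S_true)             # global responses: 0.5, 0.25, 0.125, ...
S_est = silence(G)
print(np.round(S_est, 3))
```

In this idealized linear cascade, G contains a nonzero response for every downstream pair, while the silenced matrix keeps only the four direct links.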

Page 10:

Thresholding

The experimentally observed global response matrix, Gij, accounts for direct as well as indirect correlations, with no clear separation between them.

Thresholding predicts spurious links (thick dashed lines) and overlooks true links (thin solid lines).

Thus although the average Gij terms associated with direct links are higher than the average terms associated with indirect links, as captured by the discrimination ratio, ∆G, the difference is not sufficient to fully discriminate between direct and indirect links.

∆G = ⟨Gij⟩Dir / ⟨Gij⟩Indir

Page 11:

Instead of thresholding, we apply the silencing method. The flow from the source j to the target i is carried through the indirect effect Gkj (brown), coupled with the direct impact Sik of the target’s nearest neighbor k. By silencing the indirect contributions, EQ.X provides the local response matrix, Sij, whose nonzero elements correspond to direct links.

As indirect terms become much smaller in Sij, we obtain a greater discrimination ratio, ∆S.

∆S = ⟨Sij⟩Dir / ⟨Sij⟩Indir
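The discrimination ratio can be computed as sketched below. The sketch assumes that the averages run over absolute values, that "indirect" means every off-diagonal pair without a direct link, and that the true adjacency is known; the three-node matrices are made-up numbers, not results from the paper.

```python
import numpy as np

def discrimination_ratio(response, adjacency):
    """Mean response over direct links divided by mean response over all other pairs."""
    response = np.abs(np.asarray(response, dtype=float))
    adjacency = np.asarray(adjacency, dtype=bool)
    off_diagonal = ~np.eye(adjacency.shape[0], dtype=bool)
    direct = response[adjacency & off_diagonal].mean()
    indirect = response[~adjacency & off_diagonal].mean()
    return direct / indirect

# Chain 0 - 1 - 2: pairs (0,1) and (1,2) are direct, pair (0,2) is indirect
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
G = np.array([[1.0, 0.8, 0.4], [0.8, 1.0, 0.8], [0.4, 0.8, 1.0]])     # hypothetical
S = np.array([[0.0, 0.8, 0.05], [0.8, 0.0, 0.8], [0.05, 0.8, 0.0]])  # hypothetical

delta_G = discrimination_ratio(G, adjacency)
delta_S = discrimination_ratio(S, adjacency)
print(delta_G, delta_S)
```

Here the direct terms are identical in both matrices; the ratio grows only because the indirect term shrinks from 0.4 to 0.05.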

Page 12:

Results of Silencing in model systems

The authors used a scale-free network consisting of N = 5000 nodes and L = 20000 links to test the power of EQ.X. Gij was obtained by perturbing the activity of each node, and Sij was then calculated using EQ.X.

Gij and Sij associated with interacting and non-interacting node pairs. Sij silences the correlations associated with indirect interactions, resulting in a clear separation between direct and indirect interactions, a phenomenon absent from Gij.

Indeed, the receiver operating characteristic (ROC) curve derived from Gij has an area of AUROC = 0.91, reflecting inherent limitations in separating direct from indirect interactions based on Gij only.

In contrast, for Sij we obtain AUROC = 0.997 (blue), where the true-positive rate reaches 100% with a false-positive rate
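For reference, an AUROC can be computed directly from link scores and true labels. The sketch below uses the pairwise-ranking definition (the probability that a true link outscores a non-link); the function name and the toy scores are illustrative, and the all-pairs comparison is only practical for small inputs.

```python
import numpy as np

def auroc(scores, labels):
    """Probability that a random true link scores above a random non-link (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    positives, negatives = scores[labels], scores[~labels]
    wins = (positives[:, None] > negatives[None, :]).sum()
    ties = (positives[:, None] == negatives[None, :]).sum()
    return (wins + 0.5 * ties) / (positives.size * negatives.size)

labels = [1, 1, 0, 0]
perfect = auroc([0.9, 0.8, 0.3, 0.1], labels)  # both true links outrank both non-links
mixed = auroc([0.9, 0.3, 0.8, 0.1], labels)    # one non-link outranks a true link
print(perfect, mixed)
```

An AUROC of 1 means the scores separate true links from non-links perfectly; 0.5 corresponds to random ordering.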

Page 13:

The discrimination ratio ∆S of Sij is much higher than the discrimination ratio ∆G of Gij. This indicates that Sij is a much better predictor of direct versus indirect interactions.

This silencing effect can be quantified by the ratio k of the two discrimination ratios, k = ∆S / ∆G.

In the model system it was found that k = 15, i.e. S has 15 times more power than G to discriminate direct from indirect interactions.

The longer the distance dij between two nodes, the larger the silencing. Consider a linear cascade in which changes in any node result in a finite response Gij by all other nodes.

EQ. X silences all indirect responses, while leaving the response of direct links effectively unchanged, offering a discriminative measure that enables a perfect reconstruction of the original network.

Page 14:

Results of Silencing in real-world systems

To test the predictive power of EQ.X on real data, an E. coli data set was used. The input data comprise the expression levels of 4,511 genes measured under different experimental conditions, giving rise to an 805 × 4,511 expression matrix.

Three separate global response matrices Gij were generated, based on:
- Pearson correlations
- Spearman rank correlations
- mutual information

From each of the three Gij matrices, Sij was obtained using EQ.X, and its performance was compared with Gij and validated against the gold standard used in the DREAM5 challenge.

[Figure: improvement of Sij over Gij for the three matrices: 56%, 67%, and 6%]

Page 15:

Inefficiency in handling Hidden Nodes

A network with 8 nodes, of which 2 are hidden. The resulting sub-network has 6 nodes (light blue), 5 of which constitute a connected component and one of which is isolated.

If the silencer equation is applied to the sub-network, it will successfully silence the indirect correlations associated with the unhidden paths of the connected component. However, the correlations between the isolated node and the rest of the network, which cannot be associated with an existing indirect path, will not be silenced.

Thus, as long as the isolated node pairs (connected via hidden paths) are a minority, Sij maintains its advantage; but if the majority of nodes become isolated, Sij becomes comparable to Gij and there is no silencing effect.

Consider a simple example of a linear cascade i → k → j, in which k is a hidden node and all we are offered is the experimental response between i and j, Gij. Clearly, under these circumstances, EQ.X will not be able to classify the i → j link as indirect. Indeed, it is mathematically impossible to classify this link as direct or indirect, as there is no information in the observed response matrix from which the existence of node k could be inferred.
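The hidden-node cascade can be checked numerically. The sketch below assumes the closed form S = (G − I + D((G − I)·G))·G⁻¹ for EQ.X, with D(·) keeping only the diagonal, and a three-node linear cascade with link strength 0.5. The node labels, the strength, and the function name `silence` are illustrative assumptions.

```python
import numpy as np

def silence(G):
    """Estimate the local response matrix S from a global response matrix G."""
    G = np.asarray(G, dtype=float)
    off_diag = G - np.eye(G.shape[0])
    diag_term = np.diag(np.diag(off_diag @ G))  # D((G - I) G)
    return np.linalg.solve(G.T, (off_diag + diag_term).T).T

# Cascade: node 0 -> node 1 -> node 2, where node 1 will be hidden
S_full = np.array([[0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.0],
                   [0.0, 0.5, 0.0]])
G_full = np.linalg.inv(np.eye(3) - S_full)  # response of node 2 to node 0 is 0.25

observed = [0, 2]                            # only the two end nodes are measured
G_obs = G_full[np.ix_(observed, observed)]

S_all_nodes = silence(G_full)                # all three nodes visible
S_obs = silence(G_obs)                       # middle node hidden
print(S_all_nodes[2, 0], S_obs[1, 0])
```

With all three nodes measured, the response between the end nodes is silenced to zero. With the middle node hidden, the silenced value equals the observed response of 0.25, so the pair looks like a direct link.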

Page 16:

This research could be pivotal in tackling a range of problems that involve understanding complex network systems. This work spans from studying the global spread of disease to analyzing social media data as a way to better understand fields ranging from political science to disaster preparedness.

Conclusion

Hence the Silencer Equation helps translate the ever-growing amount of data on global correlations, which contains both direct and indirect interactions, into valuable local information with only direct interactions.

Page 17:

Thank You.