Learning Links in MeSH Co-occurrence Network
Preliminary Results
Andrej Kastrin¹ and Dimitar Hristovski²∗
¹ Faculty of Information Studies, Novo mesto, Slovenia
² Institute of Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
∗ Presenting author
The First International Workshop on Large-Scale Graph Storage and Management, GraphSM 2014
April 20-24, 2014, Chamonix, France
Literature-Based Discovery
• Find implicit relations between entities.
• Propose implicit relations as potential scientific hypotheses.
• Swanson’s XYZ model:
  • Relations XY and YZ are known
  • Implicit relation XZ is a (putative) new discovery
Diagram: X and Z are each linked to Y; the direct X-Z link is the implicit relation.
Swanson’s Example
• Blood viscosity was found to co-occur with Raynaud’s disease.
• Fish oil reduces blood viscosity.
• Fish oil was proposed as a new treatment for Raynaud’s disease.
Diagram: X = Fish oil, Y = High blood viscosity, Z = Raynaud’s disease
Literature-Based Discovery as a Link Prediction Problem
• We can model the biomedical literature as a network of biomedical concepts.
• Link prediction refers to the prediction of future links between concepts that are not directly connected in the current snapshot of a network.
Diagram: X and Z are both connected to Y; the X-Z link is the one to be predicted.
Medical Subject Headings
• Comprehensive controlled vocabulary for indexing in the life sciences.
• The 2013 version of MeSH contains 26 853 descriptors.
• Every article in MEDLINE/PubMed is indexed with about 10-15 descriptors.
• Some descriptors are marked with an asterisk (*), indicating the article’s major topic.
MeSH Terms in an Article
PMID- 20091016
TI - Chi-square-based scoring function for...
AB - OBJECTIVES: Text categorization has been used...
MH - Access to Information
MH - Algorithms
MH - Artificial Intelligence
MH - Bayes Theorem
MH - *Chi-Square Distribution
MH - Data Collection
MH - Data Interpretation, Statistical
MH - *Data Mining
MH - Humans
MH - *MEDLINE
MH - Medical Informatics
MH - *Natural Language Processing
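As a rough illustration of how such a record could feed a co-occurrence network, the sketch below pulls the MH lines out of a shortened record and emits every unordered pair of descriptors as one co-occurrence. The function names `mesh_terms` and `cooccurrence_pairs` are hypothetical, and stripping the major-topic asterisk is an assumption rather than a step described in the slides.

```python
from itertools import combinations

# Shortened copy of the record above (illustration only).
RECORD = """\
PMID- 20091016
MH - Algorithms
MH - Bayes Theorem
MH - *Data Mining
MH - *MEDLINE
"""

def mesh_terms(record):
    """Return the MeSH descriptors listed in one MEDLINE-style record."""
    terms = []
    for line in record.splitlines():
        # Split at the first hyphen only, so hyphenated terms stay intact.
        tag, sep, value = line.partition("-")
        if sep and tag.strip() == "MH":
            # Assumption: drop the major-topic asterisk so starred and
            # unstarred occurrences map to the same network node.
            terms.append(value.strip().lstrip("*"))
    return terms

def cooccurrence_pairs(terms):
    """Every unordered descriptor pair in one article is one co-occurrence."""
    return list(combinations(sorted(set(terms)), 2))

terms = mesh_terms(RECORD)
pairs = cooccurrence_pairs(terms)
```

Summing such pairs over all articles in a time window would give the raw edge counts of a co-occurrence network; an article with n descriptors contributes n(n-1)/2 pairs.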
Methods
Link Prediction Framework
• We have a train network G[t1, t2], which contains the interactions among nodes that take place in the time interval [t1, t2].
• We have a test network G[t3, t4], which contains the interactions among nodes that take place in the time interval [t3, t4].
• Learning task: provide a list of edges that are present in the test network but absent from the train network.
Figure: train network and test network, each drawn on nodes A-H.
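The learning target in this framework can be sketched as a set difference between the two edge sets. The edges below are invented for illustration (the edges of the slide's figure are not recoverable from the transcript), and the helper name `norm` is hypothetical.

```python
def norm(edges):
    """Store undirected edges as frozensets so that (u, v) equals (v, u)."""
    return {frozenset(e) for e in edges}

# Toy edge lists; not the networks shown on the slide.
train = norm([("A", "B"), ("B", "C"), ("C", "D")])
test = norm([("B", "A"), ("A", "C"), ("B", "D"), ("C", "D")])

# Edges to be predicted: present in the test network, absent from train.
new_edges = test - train
```

Here `new_edges` holds the pairs A-C and B-D; the pair A-B is excluded because it already exists in the train network, regardless of the order in which its endpoints are written.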
Link Prediction Setup
• Prediction and evaluation were performed on a core subnetwork.
• The core subnetwork consists of nodes with at least 3 neighbors.
Figure: train network and test network, each drawn on nodes A-H.
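One way to read the core-subnetwork rule is as a single-pass degree filter, sketched below. The slides do not say whether the threshold is applied to the train network, the test network, or both, nor whether it is applied iteratively, so this is only one possible interpretation; `core_nodes` and the toy edge list are hypothetical.

```python
from collections import defaultdict

def core_nodes(edges, min_neighbors=3):
    """Nodes with at least `min_neighbors` distinct neighbors (one pass)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {n for n, nb in neighbors.items() if len(nb) >= min_neighbors}

# Toy network for illustration only.
edges = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("B", "D"), ("D", "E")]

core = core_nodes(edges)
# Keep only edges whose two endpoints both belong to the core.
core_edges = [(u, v) for u, v in edges if u in core and v in core]
```

In this toy network A, B, and D each have three neighbors and form the core, while C (two neighbors) and E (one neighbor) are dropped together with their edges.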
Data Collection
• We constructed two networks:
  • Train network [2003-2007]
  • Test network [2008-2012]
• Networks were post-processed to remove non-informative edges.
• We applied a χ² test of independence to each co-occurrence pair to obtain a statistic that indicates whether the pair occurs together more often than expected by chance.
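The χ² filtering step could look roughly like the sketch below, which computes the Pearson statistic of the 2×2 contingency table for one descriptor pair. The 5% critical value of 3.841 (1 degree of freedom), the absence of a continuity correction, and the check that the observed count exceeds the expected count are all assumptions; the slides do not state the threshold or the exact variant used. The function names are hypothetical.

```python
def chi_square_2x2(n_ab, n_a, n_b, n_total):
    """Pearson chi-square statistic (1 d.f., no continuity correction)."""
    a = n_ab                          # articles indexed with both terms
    b = n_a - n_ab                    # term A only
    c = n_b - n_ab                    # term B only
    d = n_total - n_a - n_b + n_ab    # neither term
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                    # degenerate table: no evidence
    return n_total * (a * d - b * c) ** 2 / denom

def keep_edge(n_ab, n_a, n_b, n_total, critical=3.841):
    """Keep a pair only if it co-occurs MORE often than expected by chance."""
    expected = n_a * n_b / n_total
    return (n_ab > expected
            and chi_square_2x2(n_ab, n_a, n_b, n_total) > critical)

# Toy counts: 1000 articles, term A in 100, term B in 200, both in 30.
stat = chi_square_2x2(30, 100, 200, 1000)
```

With these toy counts the expected co-occurrence under independence is 100 × 200 / 1000 = 20, the observed count is 30, and the statistic is about 6.94, so the edge would be kept. The direction check matters because the χ² statistic alone is also large for pairs that co-occur less often than expected.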
Similarity Measures
• For each node pair (u, v) we calculate a similarity score s(u, v).
• The score s(u, v) indicates the likelihood of link formation between nodes u and v.
• We used two similarity measures:
  • Jaccard coefficient
$$s_{uv} = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$$
where Γ(u) is the set of neighbors of u
  • Adamic-Adar coefficient
$$s_{uv} = \sum_{z \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |\Gamma(z)|}$$
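Both measures can be computed directly from neighbor sets, as in the minimal sketch below. The adjacency dictionary `nbrs` is a toy example, and the natural logarithm is an assumption, since the slides do not state the base.

```python
import math

def jaccard(nbrs, u, v):
    """Shared neighbors divided by all neighbors of u or v."""
    union = nbrs[u] | nbrs[v]
    return len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0

def adamic_adar(nbrs, u, v):
    """Sum over common neighbors z of 1 / log |Γ(z)| (natural log assumed).

    For u != v, every common neighbor z is adjacent to both u and v,
    so |Γ(z)| >= 2 and the logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(nbrs[z])) for z in nbrs[u] & nbrs[v])

# Toy undirected network: node -> set of neighbors.
nbrs = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"B", "D"},
}

j_ab = jaccard(nbrs, "A", "B")       # 2 shared / 3 total neighbors
aa_ab = adamic_adar(nbrs, "A", "B")  # 1/log(2) + 1/log(3)
```

For the unlinked pair A-B, the common neighbors are C (degree 2) and D (degree 3). Adamic-Adar weights the low-degree neighbor C more heavily than D, whereas Jaccard treats both equally.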
Performance Assessment
• A major challenge is the huge number of possible node pairs.
• We use a bootstrap resampling approach:
  • We draw a random sample of 1000 nodes and create the corresponding train and test networks.
  • We compute the link prediction score s(u, v) for each node pair that is not associated with any interaction before time t3.
  • We assign the class label “positive” to this node pair if the link occurs in the test network and “negative” otherwise.
  • We repeat this procedure 100 times.
• Using the class labels and similarity scores, we construct a ROC curve.
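One repetition of this procedure can be sketched as follows: label every node pair that has no train edge, then summarize the ranking quality as the area under the ROC curve. The pairwise AUC computation below is a simple quadratic-time stand-in chosen for clarity, not necessarily what the authors used; `label_candidates`, `auc`, and the toy data are hypothetical.

```python
from itertools import combinations

def label_candidates(nodes, train_edges, test_edges):
    """Label each pair without a train edge: True if it appears in test."""
    candidates = [frozenset(p) for p in combinations(sorted(nodes), 2)]
    return {p: p in test_edges for p in candidates if p not in train_edges}

def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. Equivalent to the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data for a single repetition.
train_edges = {frozenset(("A", "B")), frozenset(("B", "C"))}
test_edges = {frozenset(("A", "B")), frozenset(("A", "C")),
              frozenset(("B", "D"))}
labels = label_candidates("ABCD", train_edges, test_edges)

toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In a full run, the sampled node set would change in each of the 100 repetitions and the resulting curves (or AUC values) would be averaged.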
Results
Topological Characteristics of the MeSH Networks

| Parameter | Train | Test |
|---|---|---|
| Nodes | 24 225 | 25 570 |
| Edges | 4 897 380 | 5 615 965 |
| Edges (reduced) | 3 328 288 | 3 810 535 |
| Density | 0.01 | 0.01 |
| Mean degree | 274.78 | 298.05 |
| Average path length | 2.23 | 2.20 |
| Clustering coefficient | 0.27 | 0.26 |
| Small-worldness index | 21.57 | 20.70 |
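As a quick consistency check, the reported mean degree and density can be reproduced from the node counts and the reduced edge counts using the standard formulas for an undirected simple graph, mean degree = 2E/N and density = 2E/(N(N-1)). That the reduced edge counts are the ones used is an inference from the numbers, not something the slides state.

```python
def mean_degree(n_nodes, n_edges):
    """Average number of neighbors per node in an undirected graph."""
    return 2 * n_edges / n_nodes

def density(n_nodes, n_edges):
    """Fraction of all possible undirected node pairs that are linked."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

# Node counts and reduced edge counts taken from the table.
train_k = mean_degree(24225, 3328288)
test_k = mean_degree(25570, 3810535)
train_d = density(24225, 3328288)
test_d = density(25570, 3810535)
```

Rounded to two decimals, these give 274.78 and 298.05 for the mean degree and 0.01 for both densities, matching the table.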
Prediction Performance
Figure: ROC curves (average true positive rate vs. false positive rate) for the two similarity measures.
• Jaccard: AUC = 0.78
• Adamic-Adar: AUC = 0.82
AUC (area under the ROC curve): 0.90 – 1.00 = excellent, 0.80 – 0.90 = good, 0.70 – 0.80 = fair, 0.60 – 0.70 = poor, 0.50 – 0.60 = fail
Future Work
• Explore the role of node and edge attributes in prediction performance.
• Extend the study to semantic relations instead of co-occurrences.
• Assess prediction performance on a large-scale network.
• Develop a web application for real-time computation.