Page 1: Hyperlink Classification via Structured Graph Embedding (bigdata.cs.skku.edu/down/SIGIR2019_slides.pdf)

Hyperlink Classification via Structured Graph Embedding

Geon Lee¹, Seonggoo Kang², and Joyce Jiyoung Whang*¹

¹Sungkyunkwan University (SKKU), ²Naver Corporation, *corresponding author

The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Contents

1. Outline

2. Datasets and Challenges

3. Knowledge Graph Embedding

4. Hyperlink Classification Model

5. Result

6. Conclusion and Future Work

Outline

Three types of hyperlinks:

Navigation Link

Links related to navigation within a site, such as site path or site recommendation.

Suggestion Link

Links recommended directly by the user using the site.

Action Link

Links that require other user actions.

Outline

Navigation Link

Outline

Suggestion Link

Outline

Action Link

Datasets and Challenges

Hyperlinks are collected by conducting a biased random walk from the seed page (by NAVER).

[Figure: seed page]

Datasets and Challenges

Datasets

Number of hyperlinks per type (Navigation, Suggestion, Action):

graph_437:   268 (61%), 112 (26%), 57 (13%)
graph_1442:  1284 (89%), 93 (6%), 65 (5%)
graph_10000: 9892 (99%), 85 (1%), 23 (0%)

                   graph_437  graph_1442  graph_10000
No. of Hyperlinks  437        1442        10000
No. of Pages       404        332         2202
No. of Domains     18         100         120
No. of Hosts       -          22          25

Knowledge Graph Embedding

Can the knowledge graph embedding method be applied to hyperlink embedding?

[Figure: a knowledge graph compared with a web graph whose hyperlinks are labeled NAVI, SUGG, or ACTION]

Knowledge graph image from https://medium.com/comet-ml/using-fasttext-and-comet-ml-to-classify-relationships-in-knowledge-graphs-e73d27b40d67

Knowledge Graph Embedding

Can the knowledge graph embedding method be applied to hyperlink embedding?

(Bob, is_interested_in, The_Mona_Lisa)

(Bob, _is_a_friend_of, Alice)

(Barack Obama, _place_of_birth, Hawaii)

(Albert Einstein, _follows_diet, Veganism)

(San Francisco, _contains, Telegraph Hill)

(www.naver.com, NAVI, www.news.naver.com)

(www.google.com/larry_page, SUGG, www.wikipedia.org/Larry_Page)

(www.wikipedia.org/Larry_Page, ACTION, www.wikipedia.org/Larry_Page/edit)
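
Hyperlink triplets like the ones above can be indexed in the same way as knowledge graph triplets before any embedding is trained. The following is a minimal sketch of that step; the helper name `index_triplets` and the integer-ID scheme are illustrative assumptions, not something the slides specify.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]

# Hyperlink triplets copied from the slide: (source page, link type, target page).
TRIPLETS: List[Triplet] = [
    ("www.naver.com", "NAVI", "www.news.naver.com"),
    ("www.google.com/larry_page", "SUGG", "www.wikipedia.org/Larry_Page"),
    ("www.wikipedia.org/Larry_Page", "ACTION", "www.wikipedia.org/Larry_Page/edit"),
]


def index_triplets(triplets: List[Triplet]):
    """Map pages and link types to integer IDs (hypothetical helper).

    IDs are assigned in order of first appearance.
    """
    page_ids: Dict[str, int] = {}
    rel_ids: Dict[str, int] = {}
    indexed = []
    for head, rel, tail in triplets:
        h = page_ids.setdefault(head, len(page_ids))
        r = rel_ids.setdefault(rel, len(rel_ids))
        t = page_ids.setdefault(tail, len(page_ids))
        indexed.append((h, r, t))
    return indexed, page_ids, rel_ids


indexed, page_ids, rel_ids = index_triplets(TRIPLETS)
print(indexed)  # [(0, 0, 1), (2, 1, 3), (3, 2, 4)]
```

Pages play the role of entities and the three link types play the role of relations, so the indexed triplets can be fed to any translation-based embedding model.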

Knowledge Graph Embedding

TransE : Translating Embeddings for Modeling Multi-relational Data (NIPS 2013)

TransH : Knowledge Graph Embedding by Translating on Hyperplanes (AAAI 2014)

TransR : Learning Entity and Relation Embeddings for Knowledge Graph Completion (AAAI 2015)

Image from "Knowledge graph embedding: A survey of approaches and applications," TKDE 2017.

Knowledge Graph Embedding

TransE, TransH, and TransR methods minimize the following loss function:

$$L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'} \left[ f(h,r,t) + \gamma - f(h',r,t') \right]_+$$

where $[x]_+ \equiv \max(0, x)$ and $\gamma$ is the margin.

$S$ is a set of golden triplets and $S'$ is a set of corrupted triplets.

The method for calculating $f(h, r, t)$ depends on the model.

TransE: $f(h, r, t) = \|h + r - t\|_2^2$

TransH: $f(h, r, t) = \|h_\perp + r - t_\perp\|_2^2$

TransR: $f(h, r, t) = \|h_r + r - t_r\|_2^2$

$e_\perp$ and $e_r$ denote entity $e$ projected onto the relation-specific hyperplane and the relation-specific space, respectively.
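
The three scoring functions and the margin loss can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the hyperplane normal `w` and the projection matrix `M` follow the usual formulation in the TransH and TransR papers and do not appear on the slide, and `margin_loss` pairs each golden triplet with a single corrupted triplet instead of summing over all pairs.

```python
import numpy as np


def score_transe(h, r, t):
    """TransE: squared L2 norm of h + r - t."""
    return float(np.sum((h + r - t) ** 2))


def project_to_hyperplane(e, w):
    """Project e onto the hyperplane whose normal is w (w is normalised here)."""
    w = w / np.linalg.norm(w)
    return e - np.dot(w, e) * w


def score_transh(h, r, t, w):
    """TransH: translate between the hyperplane projections of h and t."""
    h_p = project_to_hyperplane(h, w)
    t_p = project_to_hyperplane(t, w)
    return float(np.sum((h_p + r - t_p) ** 2))


def score_transr(h, r, t, M):
    """TransR: map h and t into the relation space with M, then translate."""
    return float(np.sum((M @ h + r - M @ t) ** 2))


def margin_loss(pos_scores, neg_scores, gamma=1.0):
    """Sum of [f(golden) + gamma - f(corrupted)]_+ over paired triplets."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return float(np.sum(np.maximum(0.0, pos + gamma - neg)))
```

A golden triplet contributes nothing to the loss once its score is at least `gamma` lower than the score of its corrupted counterpart, which is what pushes valid triplets toward low scores.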

Hyperlink Classification Model

Given two web pages $P_u$ and $P_v$ and a set of hyperlink types $R$, estimate the type of the hyperlink between $P_u$ and $P_v$ as follows:

$$r^* = \arg\min_{r \in R} f(P_u, r, P_v)$$

where $f(h, r, t)$ is the loss of triplet $(h, r, t)$ and $r^*$ is the predicted relation.
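
The prediction rule amounts to scoring the page pair under every link type and keeping the type with the lowest loss. The sketch below assumes a TransE-style score and uses made-up two-dimensional embeddings purely for illustration; `predict_link_type` is a hypothetical name.

```python
import numpy as np


def transe_score(h, r, t):
    """TransE-style loss of a triplet: squared L2 norm of h + r - t."""
    return float(np.sum((h + r - t) ** 2))


def predict_link_type(page_u, page_v, relation_vectors, score=transe_score):
    """Return the link type with the lowest loss for the pair (page_u, page_v)."""
    return min(
        relation_vectors,
        key=lambda name: score(page_u, relation_vectors[name], page_v),
    )


# Toy embeddings, invented for this example only.
RELATIONS = {
    "NAVI": np.array([1.0, 0.0]),
    "SUGG": np.array([0.0, 1.0]),
    "ACTION": np.array([-1.0, -1.0]),
}
p_u = np.array([0.0, 0.0])
p_v = np.array([0.1, 0.9])

print(predict_link_type(p_u, p_v, RELATIONS))  # SUGG
```

Because the set of link types is small, evaluating every candidate relation is cheap, so no approximate search is needed.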

Hyperlink Classification Model

Negative Sampling

TransE / TransH / TransR → only changes entities when making negative samples

$(h, r, t) \to (h', r, t)$ or $(h, r, t')$

Introduce parameter $\alpha$ ($0 \le \alpha \le 1$).

• Replace the entity with probability $\alpha$: $(h, r, t) \to (h', r, t)$ or $(h, r, t')$

• Replace the relation with probability $1 - \alpha$: $(h, r, t) \to (h, r', t)$
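
The sampling rule above can be sketched as a single function. The name `corrupt` is hypothetical, and choosing between head and tail with equal probability is an assumption, since the slide does not say how that choice is made.

```python
import random


def corrupt(triplet, pages, relations, alpha, rng=random):
    """Return one negative sample for a golden triplet (h, r, t).

    With probability alpha an entity is replaced; otherwise the relation is.
    Head and tail are picked with equal probability (an assumption).
    """
    h, r, t = triplet
    if rng.random() < alpha:
        if rng.random() < 0.5:
            h = rng.choice([p for p in pages if p != h])
        else:
            t = rng.choice([p for p in pages if p != t])
    else:
        r = rng.choice([x for x in relations if x != r])
    return (h, r, t)


pages = ["p1", "p2", "p3"]
relations = ["NAVI", "SUGG", "ACTION"]
golden = ("p1", "NAVI", "p2")

# alpha = 0 always corrupts the relation; alpha = 1 always corrupts an entity.
print(corrupt(golden, pages, relations, alpha=0.0))
print(corrupt(golden, pages, relations, alpha=1.0))
```

Setting `alpha=1.0` recovers the entity-only corruption used by the original TransE, TransH, and TransR, so the parameter interpolates between the two strategies.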

Hyperlink Classification Model

Negative Sampling

False Negative

When we corrupt an entity, the resulting triplet may not be a truly corrupted one but merely one that is unobserved in the training set.

Golden triplet: $(p_1, n, p_2)$

Corrupt entity: $(p_1, n, p_3)$ may exist in the validation or test set → risky!

Corrupt relation: $(p_1, r, p_2)$ is guaranteed not to hold → safe!

Result

The average F1 scores (%) of our model with different $\alpha$ and the original TransE, TransH, and TransR.

Result

F1 Scores(%) of each relation and the average F1 scores.

Result

The average F1, precision, and recall on the three web graphs.

• Random-predict indicates the performance of random prediction while preserving the number of hyperlinks in each class.

• We use the result of TransH with $\alpha = 0.3$, $\alpha = 0.5$, and $\alpha = 0.3$ for web_437, web_1442, and web_10000, respectively.
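
One way to realise the Random-predict baseline is to permute the true labels, which keeps the per-class counts fixed by construction. The sketch below assumes that "average F1" means the unweighted mean of the per-class F1 scores; both function names are hypothetical.

```python
import numpy as np


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores (assumed meaning of 'average F1')."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def random_predict(y_true, seed=0):
    """Random prediction that preserves the number of hyperlinks in each class."""
    rng = np.random.default_rng(seed)
    return rng.permutation(np.asarray(y_true))


# Toy labels, invented for this example only.
classes = ["NAVI", "SUGG", "ACTION"]
y_true = ["NAVI"] * 6 + ["SUGG"] * 3 + ["ACTION"]
y_rand = random_predict(y_true)
print(macro_f1(y_true, y_rand, classes))
```

On a heavily imbalanced graph this baseline scores well on the majority class and poorly on the rare ones, which is why a per-class average is more informative than plain accuracy.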

Result

Performance on the original web graphs and randomly shuffled graphs where the relation labels are randomly assigned while preserving the number of each relation type.

The real-world web graphs have characteristic structures, which enable knowledge graph embedding techniques to work reasonably well on predicting the relation labels.
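
A shuffled graph of this kind can be built by permuting the relation labels across the existing hyperlinks, leaving the page pairs untouched. The sketch below is one possible way to do it; `shuffle_relation_labels` is a hypothetical helper and the toy graph is invented for illustration.

```python
import random
from collections import Counter


def shuffle_relation_labels(triplets, seed=0):
    """Reassign relation labels at random while keeping each label's count."""
    labels = [r for _, r, _ in triplets]
    random.Random(seed).shuffle(labels)
    return [(h, r, t) for (h, _, t), r in zip(triplets, labels)]


graph = [
    ("p1", "NAVI", "p2"),
    ("p1", "NAVI", "p3"),
    ("p2", "SUGG", "p4"),
    ("p3", "ACTION", "p5"),
]
shuffled = shuffle_relation_labels(graph)

# The page pairs and the number of links per type are unchanged.
assert [(h, t) for h, _, t in shuffled] == [(h, t) for h, _, t in graph]
assert Counter(r for _, r, _ in shuffled) == Counter(r for _, r, _ in graph)
```

Training the same model on the shuffled graph removes any relationship between graph structure and link type, so a drop in F1 relative to the original graph indicates that the structure carries real signal.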

Conclusion and Future Work

Conclusion

• The knowledge graph embedding techniques can be efficiently used for hyperlink embedding.

• Performance improvement can be observed by modifying the negative sampling method.

Future Work

• Utilize information from web pages or hyperlinks (such as anchor text) in hyperlink embeddings.

Thank You
