Growing Wikipedia Across Languages via
RecommendationJure Leskovec
StanfordEllery Wulczyn
WikimediaBob West
EPFLLeila Zia
Wikimedia
Wikipedia is the biggest encyclopedia in history and the 5th most visited website worldwide.
Given that you speak language X, how much can you learn about the world?
All Wikipedia languages 2M geo-located articles
3
Lausanne
Markus Krötzsch, TU Dresden
German Wikipedia 356K geo-located articles 132M native speakers
4Markus Krötzsch, TU Dresden
Spanish Wikipedia 261k Geo Tagged Items 389M Native speakers:
Spanish Wikipedia 261K geo-located articles 389M native speakers 5Markus Krötzsch, TU Dresden
https://ddll.inf.tu-dresden.de/web/Wikidata/Maps-06-2015/en
Arabic Wikipedia 87K geo-located articles 467M native speakers 6Markus Krötzsch, TU Dresden
Problem: Access to Knowledge
Depending on the language you speak, you can only learn about a small subset
of topics
How do we make access to all knowledge universal?
7
Problem: Access to Knowledge
Depending on the language you speak, you can only learn about a small subset
of topics
How do we democratize access to knowledge?
8
How do we grow language-specific Wikipedias faster?
Goal: Recommend important missing articles to human editorsSystem overview:
Wikipedia: Filling knowledge gaps
9
Find Rank Recommend
Bottomline: 3x growth rate with no loss in article quality
Find Rank Recommend
10
How do we identify missing articles in language X?
Task: Find Missing Articles
● Wikidata: Knowledge base for Wikipedia
● Every article in every language is mapped into a node in a concept graph
11
en:Apple sl:Jabolko
Q32
nl:Appel
Task: Find Missing Articles
Q32
English Wikipedia French Wikipedia
Apple Switzerland SuisseAfrique
Q45
12
ConceptsQ46
Wikipedia articles
Task: Find Missing Articles
Q32 Q46
English Wikipedia French Wikipedia
Apple
Q45
13
Concepts
Wikipedia articles
PommeSwitzerland SuisseAfrique
14
We want editors to create important missing articles. How can we rank articles by importance?
Find Rank Recommend
Ranking Missing Articles
Goal: Given a missing article, predict popularity rank in the target language (once it’s created)
Problem: The article doesn’t exist yet (so we don’t have its features for prediction)
15
Ranking Missing Articles
Approach:- Predict popularity of existing French articles- using features extracted from the
corresponding non-French articlesExample features: pageviewstopicslinks
16
Classification algorithm: Random forests
17
Identify interested human editors who can work on a given important missing article
Find Rank Recommend
Recommending Missing Articles
Q22
Q32
Q34
Q12
Top missing articles
Editors
E1
E2
Affinity(E1, Q34) 18
Bipartite matching between editors and missing items
Computing Affinities: Concept Embedding
Q99
Q67
19
Q67
Q34
Computing Affinities
Q99
Q67
Q34
20
Q67E1
Computing Affinities
Q99
Q67
Q34
21
Q67E1
ϴ
cos(ϴ) = Affinity(E1, Q34)
22
Identify missing concepts using Wikidata knowledge base
Predict language- specific popularity
Match missing articles to editors considering their interest
Method Summary
Find Rank Recommend
Experiments
Large-scale experiment targeting French Wikipedia EditorsSent 12K editors 5 recommendations each by email
Research questions: RQ1: Do recommendations boost article creation rate?RQ2: How much does personalization help?RQ3: How do recommendations affect article quality?
23
Large-Scale In Vivo Experiment
24
Articles EditorsCondition
Personalized recommendation
Random recommendations
No recommendations
Large-Scale In Vivo Experiment
24
Articles EditorsCondition
Personalized recommendation
Random recommendations
No recommendations
RQ1: Boost in Creation rate?
25
Yes! 3x boost in creation rate
Personalized No recs.
Creation rate 1.05% 0.32%
*** p < 0.00001
RQ2: Does personalizing help?
26
Yes! 2x boost in activity
Personalized Not personalized
# Editors 6K 6K
Activation rate 2.3% 1.1%
*** p < 0.00001
RQ3: Impact on article quality?
27
No loss in quality!
Deployment at Wikipedia
28
https://recommend.wmflabs.org
Agriculture in Farsi
29
Climate Change in Malagasy
30
ML in Simple English
31
Conclusions
32
Massive knowledge gaps in Wikipedia
System for recommending important missing articles
3x boost in article creation rate with no loss in article quality
Deployed at Wikipedia!
Thank you!
Top Related