SDOW (ISWC2011)
DIGITAL Institute for Information and Communication Technologies
Pragmatic metadata matters: How data about the usage of data affects semantic user models
Claudia Wagner, Markus Strohmaier, Yulan He
Sunday, October 23, 2011
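Example: Semantic Metadata
[Figure: RDF graph in which a post (rdf:type sioc:Post) has a sioc:content value and a sioc:has_creator link to a user account (rdf:type sioc:UserAccount) with a sioc:name; the account is linked via sioc:account_of to a foaf:Person.]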
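To make the graph concrete, here is a minimal sketch that stores it as plain (subject, predicate, object) tuples in Python rather than with an RDF library. The ex: identifiers and the literal values are hypothetical; only the sioc:, foaf: and rdf: terms come from the slide.

```python
# Hypothetical resource IDs (ex:...) and literal values; the sioc:, foaf: and
# rdf: terms are the ones shown on the slide.
TRIPLES = [
    ("ex:post1", "rdf:type", "sioc:Post"),
    ("ex:post1", "sioc:content", "Just got a new iMac."),
    ("ex:post1", "sioc:has_creator", "ex:account1"),
    ("ex:account1", "rdf:type", "sioc:UserAccount"),
    ("ex:account1", "sioc:name", "user1"),
    ("ex:account1", "sioc:account_of", "ex:person1"),
    ("ex:person1", "rdf:type", "foaf:Person"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def persons_of_post(triples, post):
    """Follow sioc:has_creator, then sioc:account_of, from a post to its author(s)."""
    persons = []
    for account in objects(triples, post, "sioc:has_creator"):
        persons.extend(objects(triples, account, "sioc:account_of"))
    return persons


print(persons_of_post(TRIPLES, "ex:post1"))  # ['ex:person1']
```

Note that the graph says nothing about what the post is about or what the person is interested in; that gap is what the rest of the talk addresses.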
Example: Pragmatic Metadata
Aim
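[Figure: the SIOC/FOAF graph from the semantic metadata example (a sioc:Post with sioc:content and sioc:has_creator, a sioc:UserAccount with sioc:name and sioc:account_of, a foaf:Person), extended with an unknown sioc:topic of the post and an unknown foaf:interest of the person, both marked "?".]
Can pragmatic metadata support the generation of semantic metadata, and if so, how?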
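One way the missing links could be filled in, assuming a topic model has already produced a topic distribution per post and per user: keep the most probable topics and emit them as sioc:topic and foaf:interest triples. The topic IDs, the resource IDs and the 0.2 cut-off are hypothetical choices for illustration, not part of the presented method.

```python
def topic_annotations(resource, predicate, topic_dist, threshold=0.2):
    """Turn a topic distribution into (resource, predicate, topic) triples.

    topic_dist maps hypothetical topic IDs to probabilities; only topics whose
    probability reaches the (assumed) threshold become annotations, most
    probable first.
    """
    ranked = sorted(topic_dist.items(), key=lambda kv: kv[1], reverse=True)
    return [(resource, predicate, topic) for topic, prob in ranked if prob >= threshold]


post_topics = {"ex:topic1": 0.55, "ex:topic2": 0.30, "ex:topic3": 0.15}
user_topics = {"ex:topic1": 0.10, "ex:topic2": 0.70, "ex:topic3": 0.20}

print(topic_annotations("ex:post1", "sioc:topic", post_topics))
# [('ex:post1', 'sioc:topic', 'ex:topic1'), ('ex:post1', 'sioc:topic', 'ex:topic2')]
print(topic_annotations("ex:person1", "foaf:interest", user_topics))
# [('ex:person1', 'foaf:interest', 'ex:topic2'), ('ex:person1', 'foaf:interest', 'ex:topic3')]
```

The open question on the slide is then whether pragmatic metadata makes these distributions, and hence the derived triples, more accurate.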
Experimental Setup
§ Methodology
  § Topic modeling algorithms learn topics (probability distributions over words) and annotate users and posts with topics
  § Different types of pragmatic metadata are incorporated into the topic models
  § The models are compared via their predictive performance
§ Dataset
  § Boards.ie: forums, posts and users
  § Users' authoring and replying behavior
  § Training dataset: first and last week of February 2006
  § Test dataset: 3 future posts of each user
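A minimal sketch of such a temporal split, assuming hypothetical (user, timestamp, text) post records. The window boundaries are parameters because the slide only names February 2006, and evaluating only users who have training posts is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def split_posts(posts, train_start, train_end, n_test=3):
    """Split (user, timestamp, text) records into training posts written in
    [train_start, train_end) and, for every user seen in training, that
    user's first n_test posts written after the training window."""
    train = [p for p in posts if train_start <= p[1] < train_end]
    train_users = {user for user, _ts, _text in train}
    future = defaultdict(list)
    for post in sorted(posts, key=lambda p: p[1]):
        user, ts, _text = post
        # Assumption: only users with training posts are evaluated.
        if ts >= train_end and user in train_users and len(future[user]) < n_test:
            future[user].append(post)
    return train, dict(future)


posts = [
    ("u1", datetime(2006, 2, 3), "new imac arrived"),
    ("u1", datetime(2006, 3, 1), "imac screen question"),
    ("u1", datetime(2006, 3, 2), "mac vs pc"),
    ("u1", datetime(2006, 3, 5), "selling my pc"),
    ("u1", datetime(2006, 3, 9), "a fourth later post"),
    ("u2", datetime(2006, 2, 10), "match report"),
    ("u3", datetime(2006, 3, 4), "no training posts"),
]
train, test = split_posts(posts, datetime(2006, 2, 1), datetime(2006, 3, 1))
print(len(train), {u: len(ps) for u, ps in test.items()})  # 2 {'u1': 3}
```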
Evaluation
§ Compare the models by testing their predictive performance on held-out posts.
§ Assumption: a better user topic model is less perplexed by future posts authored by a user and needs fewer training samples.
Perplexity of a user's future posts W_u, where the sum runs over all N_u words w_i in the user's future posts and log p(w_i | M) is the log likelihood of a word given the learned model M:
$$\mathrm{perplexity}(W_u) = \exp\left(-\frac{\sum_{i=1}^{N_u} \log p(w_i \mid M)}{N_u}\right)$$
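The perplexity formula can be sketched in Python as follows. For a topic model, p(w | M) for one user is taken here as the usual mixture of topic-word probabilities weighted by that user's topic proportions; the topics and proportions in the example are hypothetical, and a smoothed model (no zero probabilities) is assumed.

```python
import math


def word_prob_from_topics(theta, phi):
    """p(w | M) = sum_t theta[t] * phi[t][w] for one user's topic mixture theta."""
    return lambda w: sum(theta_t * phi_t.get(w, 0.0) for theta_t, phi_t in zip(theta, phi))


def perplexity(future_posts, word_prob):
    """exp(-(sum of log p(w | M) over all N held-out words) / N).

    Assumes word_prob(w) > 0 for every held-out word (i.e. a smoothed model).
    """
    log_lik = 0.0
    n_words = 0
    for post in future_posts:
        for w in post:
            log_lik += math.log(word_prob(w))
            n_words += 1
    return math.exp(-log_lik / n_words)


phi = [{"mac": 0.5, "imac": 0.5}, {"pc": 0.5, "computer": 0.5}]  # hypothetical topics
theta = [0.5, 0.5]                                                # hypothetical user mixture
print(perplexity([["mac", "imac"], ["pc", "computer"]], word_prob_from_topics(theta, phi)))  # ≈ 4.0
```

A model that concentrates its probability mass on the words the user actually writes yields a lower perplexity, which is the sense in which a "better" user model is less perplexed.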
Methodology: LDA
§ How to learn topics and annotate users with topics?
§ Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
[Figure: a text is represented as a mixture of topics T1, T2 and T3; example topic T1: mac 0.3, iMac 0.13, PC 0.03, computer 0.04, ...]
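A compact sketch of how LDA learns topics (per-topic word distributions, like T1 above) and topic annotations (per-document topic proportions). Blei et al. (2003) use variational inference; this sketch swaps in collapsed Gibbs sampling, a common alternative, and the hyperparameters and toy documents are assumptions. Treating all posts of one user as a single document yields a topic annotation for that user.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists (a post, or all posts of one user merged).
    Returns (theta, phi): per-document topic proportions and per-topic
    word distributions (dicts mapping word -> probability).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    n_dt = [[0] * n_topics for _ in docs]                       # doc-topic counts
    n_tw = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]   # topic-word counts
    n_t = [0] * n_topics                                        # tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(n_topics)
            z[d].append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Unnormalized full conditional p(z_i = k | all other assignments).
                weights = [(n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + n_vocab * beta)
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1

    theta = [[(n_dt[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [{w: (n_tw[k][w] + beta) / (n_t[k] + n_vocab * beta) for w in vocab}
           for k in range(n_topics)]
    return theta, phi


docs = [["mac", "imac", "mac", "computer"], ["pc", "computer", "pc"], ["imac", "mac", "imac"]]
theta, phi = lda_gibbs(docs, n_topics=2)
for k, topic in enumerate(phi):
    print(f"T{k + 1}:", sorted(topic, key=topic.get, reverse=True)[:3])
```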
Methodology: DMR
§ How to incorporate metadata into topic models?
§ Dirichlet Multinomial Regression (DMR) topic models (Mimno et al., 2008)
  § Observe a feature vector x_d per document
  § Draw a "fresh" α for each document, which depends on the observed features x_d and the per-topic feature weights λ_t:
$$\alpha_{dt} = \exp(\lambda_t^\top x_d)$$
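The formula α_dt = exp(λ_t · x_d) can be illustrated in a few lines of Python. In a real DMR model the weights λ_t are learned during training; here they are fixed, hypothetical values for two topics over the features [intercept, author=u1, author=u2], chosen only to show how document features shift the Dirichlet prior toward particular topics.

```python
import math


def dmr_alpha(x_d, lambdas):
    """Document-specific Dirichlet prior of a DMR topic model:
    alpha_dt = exp(lambda_t . x_d) for every topic t.

    x_d: observed feature vector of document d (first entry is a constant 1,
         an intercept/default feature).
    lambdas: one feature-weight vector lambda_t per topic.
    """
    return [math.exp(sum(l * x for l, x in zip(lambda_t, x_d))) for lambda_t in lambdas]


# Hypothetical weights for 2 topics over features [intercept, author=u1, author=u2].
lambdas = [[-1.0, 1.5, 0.0],
           [-1.0, 0.0, 1.5]]
alpha_u1 = dmr_alpha([1, 1, 0], lambdas)  # a document authored by u1
alpha_u2 = dmr_alpha([1, 0, 1], lambdas)  # a document authored by u2
print([round(a, 3) for a in alpha_u1])  # [1.649, 0.368]
print([round(a, 3) for a in alpha_u2])  # [0.368, 1.649]
```

Documents that share features receive similar priors, so their topic proportions are pulled in the same direction.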
Methodology
ID   Alg   Doc    Metadata
M1   LDA   Post   -
M2   LDA   User   -
M3   DMR   Post   author
M4   DMR   User   author
M5   DMR   Post   reply-user
M6   DMR   User   reply-user
M7   DMR   Post   related-user
M8   DMR   User   related-user
[Figure: two users (User 1 and User 2), the posts they authored (Posts 1 to 6) and the replies between them, split into past posts and a future post (Post 7).]
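A sketch of how the two document types and two of the metadata types could be assembled from raw posts. The post records are hypothetical, and reading "reply-user" as the users who replied to a post (or, at the user level, to any of the user's posts) is an assumption based on the results slides; "related-user" is not defined on the slides and is therefore omitted. The binary indicator encoding is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical records: (post_id, author, tokens, id of the post it replies to or None).
POSTS = [
    ("p1", "u1", ["mac", "imac"], None),
    ("p2", "u2", ["imac", "screen"], "p1"),
    ("p3", "u3", ["pc", "mac"], "p1"),
    ("p4", "u1", ["mac", "pc"], "p3"),
]


def repliers_by_post(posts):
    """Map each post id to the set of users who replied to that post."""
    repliers = defaultdict(set)
    for _pid, author, _tokens, parent in posts:
        if parent is not None:
            repliers[parent].add(author)
    return repliers


def post_documents(posts):
    """Doc = Post (as in M1, M3, M5): one document per post plus its metadata."""
    repliers = repliers_by_post(posts)
    return {pid: {"tokens": list(tokens), "author": {author}, "reply-user": set(repliers[pid])}
            for pid, author, tokens, _parent in posts}


def user_documents(posts):
    """Doc = User (as in M2, M4, M6): all posts of a user merged into one document."""
    repliers = repliers_by_post(posts)
    docs = {}
    for pid, author, tokens, _parent in posts:
        doc = docs.setdefault(author, {"tokens": [], "author": {author}, "reply-user": set()})
        doc["tokens"].extend(tokens)
        doc["reply-user"] |= repliers[pid]
    return docs


def feature_vector(users, all_users):
    """DMR input: a constant intercept plus one binary indicator per known user."""
    return [1] + [1 if u in users else 0 for u in all_users]


all_users = sorted({author for _pid, author, _tokens, _parent in POSTS})
udocs = user_documents(POSTS)
print(sorted(udocs["u1"]["reply-user"]))                     # ['u2', 'u3']
print(feature_vector(udocs["u1"]["reply-user"], all_users))  # [1, 0, 1, 1]
```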
[Figure: Different user activities performed on content. Series: baseline LDA (M1 and M2); post training scheme (M3, M5 and M7); models which take user replies into account (M6 and M8).]
Results
§ The topics of users who reply to a user are also likely for this user
  § Therefore, if 2 users get replies from the same users, then they are more likely to talk about the same topics
§ Topic models which incorporate pragmatic metadata per user can indeed improve the models
§ Topic models which incorporate pragmatic metadata per post often over-fit the data
  § The model assumptions are too strict!
§ Idea: incorporate behavioral user similarities
  § Intuition: users who are similar are more likely to talk about the same topics
  § How to measure behavioral similarity?
    § forum usage
    § communication behavior
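The slides do not give a similarity formula; the models instead feed each user's top-10 forums or communication partners into DMR as features. As one simple, swapped-in way to quantify the overlap of such lists, the sketch below uses Jaccard similarity over top-k sets; the forum IDs are hypothetical.

```python
from collections import Counter


def top_k(items, k=10):
    """Set of the k most frequent items, e.g. the forums a user posted to
    or the users they exchanged replies with."""
    return {item for item, _count in Counter(items).most_common(k)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical top-10 forum lists of two users; they share f5, f6 and f10.
user_1 = {f"f{i}" for i in range(1, 11)}
user_2 = {"f15", "f20", "f31", "f12", "f5", "f6", "f17", "f18", "f19", "f10"}
print(round(jaccard(user_1, user_2), 3))  # 0.176
```

The same functions apply to communication behavior by passing the list of users someone replied to instead of forum IDs.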
Methodology
ID    Alg   Doc    Metadata
M9    DMR   Post   top 10 forums
M10   DMR   User   top 10 forums
M11   DMR   Post   top 10 communication partners
M12   DMR   User   top 10 communication partners
[Figure: User 1 authored Posts 1 to 3 and User 2 authored Posts 4 to 6; the users are described by top-10 forum lists (f1, f2, f3, f4, f5, f6, f7, f8, f9, f10 and f15, f20, f31, f12, f5, f6, f17, f18, f19, f10), which share f5, f6 and f10. Post 7 is a future post; the others are past posts.]
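A sketch of how top-k lists could be turned into DMR feature vectors for the user-level (M10, M12) and post-level (M9, M11) variants. The encoding (an intercept plus binary indicators over the union of all users' top-k items) and the toy histories are assumptions; k is set to 2 only to keep the example short. Users who share a forum share an active feature, so DMR can give their documents similar topic priors.

```python
from collections import Counter


def top_k_features(history, k=10):
    """Per-user DMR features: an intercept plus one binary indicator for each
    item (forum or communication partner) in the union of all users' top-k lists."""
    tops = {user: {item for item, _ in Counter(items).most_common(k)}
            for user, items in history.items()}
    vocab = sorted(set().union(*tops.values()))
    vectors = {user: [1] + [int(item in top) for item in vocab] for user, top in tops.items()}
    return vocab, vectors


def post_level_features(posts, user_vectors):
    """Post-level variant (as in M9, M11): every post inherits its author's vector."""
    return {post_id: user_vectors[author] for post_id, author in posts}


# Hypothetical forum-usage histories (one entry per post) and k=2 for brevity.
history = {"u1": ["f1", "f1", "f2"], "u2": ["f2", "f3", "f3"]}
vocab, vectors = top_k_features(history, k=2)
print(vocab, vectors)  # ['f1', 'f2', 'f3'] {'u1': [1, 1, 1, 0], 'u2': [1, 0, 1, 1]}
print(post_level_features([("p1", "u1"), ("p2", "u2")], vectors)["p2"])  # [1, 0, 1, 1]
```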
[Figure: Series: baseline LDA (M1 and M2); post training scheme (M3, M9 and M11); user training scheme (M4, M10 and M12).]
Model M12 incorporates user similarities based on their communication behavior.
Results
§ Topic models seem to benefit from taking behavioral user similarities into account
§ Users who behave similarly (regarding their forum usage and communication behavior) are likely to talk about the same topics
§ Common communication partners seem to be more predictive of common topics than common forums
Conclusions
§ Pragmatic metadata may help to learn better semantic user models
§ But pragmatic metadata observed at the post level often over-fits the data
§ Pragmatic metadata at the user level seems to improve the predictive performance of topic models
  § If the posts of 2 users are “used” in a similar way, then the users are more likely to talk about the same topics
  § If 2 users behave similarly (tend to post to the same forums or tend to talk to the same users), they are more likely to talk about the same topics
§ Common communication partners seem to be more predictive of common topics than common forums
Limitations and Future Work
§ Perplexity and the semantic interpretability of topics do not necessarily correlate (Chang et al., 2009)
  § Separate evaluation of the semantic coherence of topics
§ Analyze different types of behavior- and usage-related metadata and explore to what extent they may reveal information about the semantics of data
  § behavior on social streams such as Twitter
  § tagging behavior
  § navigation behavior
References
§ Blei, D. M., Ng, A. and Jordan, M. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003), pp. 993-1022.
§ Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C. and Blei, D. Reading Tea Leaves: How Humans Interpret Topic Models. In Neural Information Processing Systems (NIPS) (2009).
§ Mimno, D. M. and McCallum, A. Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression. In Proceedings of UAI (2008), pp. 411-418.