Pula, 5 June 2007
![Page 1: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/1.jpg)
Complex networks in tagging systems
Andrea Capocci
Dipartimento di Informatica e Sistemistica
Università di Roma "Sapienza"
![Page 2: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/2.jpg)
Tag networks
www.citeulike.org
Users save scientific publications and tag them with tags (keywords).
Other examples:
Flickr.com (photos)
del.icio.us (bookmarks)
Connotea.org, BibSonomy (papers)
![Page 3: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/3.jpg)
TAGS
![Page 4: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/4.jpg)
Tagging systems as tripartite networks
Tag assignment
A tagging system is a set of tag assignments. A tag assignment is a triplet (user, resource, tag).
CiteULike
550k tag assignments
48k distinct tags
180k distinct papers
6k distinct users
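As a minimal sketch of the triplet representation, a tagging system can be held as a list of (user, resource, tag) records, and the kind of counts quoted for CiteULike can be computed from it. The class name `TagAssignment`, the function `summarize`, and the toy data are illustrative assumptions, not part of the talk.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One tag assignment: a (user, resource, tag) triplet."""
    user: str
    resource: str
    tag: str


# Hypothetical toy data, not taken from CiteULike.
assignments = [
    TagAssignment("alice", "paper-1", "networks"),
    TagAssignment("alice", "paper-1", "scale-free"),
    TagAssignment("bob", "paper-1", "networks"),
    TagAssignment("bob", "paper-2", "tagging"),
]


def summarize(assignments):
    """Count assignments and the distinct tags, resources and users."""
    return {
        "tag_assignments": len(assignments),
        "distinct_tags": len({a.tag for a in assignments}),
        "distinct_resources": len({a.resource for a in assignments}),
        "distinct_users": len({a.user for a in assignments}),
    }


summary = summarize(assignments)
```

The three distinct sets (users, resources, tags) are the three node types of the tripartite network; each triplet links one node of each type.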
![Page 5: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/5.jpg)
![Page 6: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/6.jpg)
![Page 7: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/7.jpg)
Text analysis of tagging
The stream of tags can be interpreted as a text continuously written by collaborative users.
Zipf's law, preferential attachment and Yule processes in tag streams?
del.icio.us: see Cattuto et al.
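To make the "stream of tags as a text" reading concrete, the sketch below builds the rank-frequency table on which a Zipf-type law would be checked. The function name `rank_frequency` and the sample stream are assumptions made for illustration; the talk does not give any code.

```python
from collections import Counter


def rank_frequency(tag_stream):
    """Return (rank, tag, count) rows, most frequent tag first.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(tag_stream)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tag, n) for rank, (tag, n) in enumerate(ranked, start=1)]


# Hypothetical tag stream, in the order the tags were written.
stream = ["web", "design", "web", "css", "web", "design"]
table = rank_frequency(stream)
```

Plotting count against rank on log-log axes is the usual way to look for a power-law decay in such a table.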
![Page 8: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/8.jpg)
Sub-linear vocabulary growth
[Plot: number of distinct tags versus internal time]
![Page 9: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/9.jpg)
del.icio.us: ~ x^0.8
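Vocabulary growth can be measured directly from a tag stream. The sketch below assumes that "internal time" means the running count of tag assignments, and estimates the growth exponent with an ordinary least-squares fit in log-log space; both function names are hypothetical and the fit is only a rough estimator.

```python
import math


def vocabulary_growth(tag_stream):
    """Number of distinct tags seen after each tag assignment."""
    seen = set()
    growth = []
    for tag in tag_stream:
        seen.add(tag)
        growth.append(len(seen))
    return growth


def growth_exponent(growth):
    """Least-squares slope of log(distinct tags) against log(internal time).

    Assumes at least two points. A slope below 1 means sub-linear growth.
    """
    xs = [math.log(t) for t in range(1, len(growth) + 1)]
    ys = [math.log(n) for n in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


growth = vocabulary_growth(["a", "b", "a", "c", "b", "d"])
```

A stream in which every tag is new gives an exponent of 1; reuse of existing tags pushes it below 1.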
![Page 10: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/10.jpg)
Tag frequency distribution
![Page 11: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/11.jpg)
Preferential attachment
![Page 12: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/12.jpg)
Few tags per resource
![Page 13: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/13.jpg)
Where is semantics?
Such properties can be modeled by Yule-Simon processes with memory (see Cattuto et al.).
But such analysis does not capture the semantics of tags, e.g. hierarchical relations.
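For orientation, here is a sketch of the plain Yule-Simon process, without the memory kernel used by Cattuto et al., so it is a simplified stand-in and not their model. The function name, the parameter `p_new`, and the integer tag labels are assumptions.

```python
import random


def yule_simon_stream(length, p_new, seed=0):
    """Generate a tag stream with the plain Yule-Simon rule.

    With probability p_new a brand-new tag is appended; otherwise a past
    entry of the stream is copied uniformly at random, so a tag is reused
    with probability proportional to its current frequency.
    Assumes length >= 1. Tags are labelled 0, 1, 2, ... in order of birth.
    """
    rng = random.Random(seed)
    stream = [0]
    next_tag = 1
    while len(stream) < length:
        if rng.random() < p_new:
            stream.append(next_tag)
            next_tag += 1
        else:
            stream.append(rng.choice(stream))
    return stream


sample = yule_simon_stream(1000, p_new=0.05, seed=42)
```

Copying a uniformly chosen past entry is what implements preferential attachment here: frequent tags occupy more positions in the stream and are therefore copied more often.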
![Page 14: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/14.jpg)
Why does semantics matter?
Detection of tag categories.
Understanding users' strategies, to improve the system and propose new services.
Spam detection.
![Page 18: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/18.jpg)
Tag co-occurrence network
Tags are nodes.
If two tags are assigned to the same resource, one puts an edge between the two tags.
Edges are weighted: each co-assignment of two tags increases the edge weight by one.
Node strength (the sum of the weights of a node's edges) is used instead of degree.
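The construction above can be sketched as follows. The input is assumed to be a list of tag sets, one per co-assignment event; whether an event is a resource or a single user's post is not specified in the slides, so that choice is left to the caller. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(tag_sets):
    """Weighted edges: each pair of tags assigned together adds 1."""
    weights = defaultdict(int)
    for tags in tag_sets:
        # sorted() gives every undirected edge a single canonical key.
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def strength_and_degree(weights):
    """Strength is the sum of edge weights; degree is the edge count."""
    strength = defaultdict(int)
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        for node in (a, b):
            strength[node] += w
            degree[node] += 1
    return dict(strength), dict(degree)


# Hypothetical co-assignment events.
posts = [
    ["networks", "scale-free"],
    ["networks", "scale-free", "www"],
    ["networks", "tagging"],
]
weights = cooccurrence_network(posts)
strength, degree = strength_and_degree(weights)
```

In this toy example "networks" has degree 3 but strength 4, because its edge to "scale-free" was reinforced twice; that difference is what strength captures and degree does not.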
![Page 19: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/19.jpg)
Distribution of strength
![Page 20: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/20.jpg)
Distribution of strength
?
![Page 21: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/21.jpg)
Nontrivial clustering & spam detection
Clustering coefficient C(k): the average density of triangles around nodes with degree k.
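A minimal sketch of C(k) on an unweighted graph follows. The graph is assumed to be an adjacency dictionary mapping each node to the set of its neighbours, and nodes with fewer than two neighbours are given a coefficient of 0 by convention; both are choices made here, not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(adj):
    """Fraction of neighbour pairs that are themselves linked, per node."""
    coeff = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff[node] = 0.0  # convention: no neighbour pairs to close
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeff[node] = 2.0 * links / (k * (k - 1))
    return coeff


def clustering_by_degree(adj):
    """C(k): mean local clustering over all nodes of degree k."""
    coeff = local_clustering(adj)
    buckets = defaultdict(list)
    for node, nbrs in adj.items():
        buckets[len(nbrs)].append(coeff[node])
    return {k: sum(vals) / len(vals) for k, vals in buckets.items()}


# Toy graph: a triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
c_of_k = clustering_by_degree(adj)
```

Here node a has three neighbour pairs of which only b-c is linked, so its coefficient is 1/3, while b and c each sit in a closed triangle and score 1.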
![Page 22: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/22.jpg)
Nontrivial clustering & spam detection
![Page 23: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/23.jpg)
Nontrivial clustering & spam detection
k = 502
![Page 24: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/24.jpg)
Looking for a k = 502 page...
![Page 25: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/25.jpg)
![Page 26: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/26.jpg)
SPAM
![Page 27: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/27.jpg)
Nontrivial clustering & spam detection
spam, k = 502
![Page 28: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/28.jpg)
Co-occurrence networks and semantics
Co-occurrence networks are scale-free.
The significance of this statistical property is ambiguous.
Clustering encodes semantics (?)
Clustering can be used to detect spam.
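One plausible reading of the k = 502 anomaly, which the slides do not spell out, is that a single resource carrying a very large number of tags turns those tags into a near-clique: every such tag has the same large degree and a clustering coefficient close to 1. The sketch below flags tags of that kind. It is an illustration under that assumption, not the procedure used in the talk, and the thresholds and function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(tag_sets):
    """Unweighted co-occurrence graph as node -> set of neighbours."""
    adj = {}
    for tags in tag_sets:
        for a, b in combinations(set(tags), 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def clustering(adj, node):
    """Local clustering coefficient of one node (0 if degree < 2)."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def suspicious_tags(tag_sets, min_degree, min_clustering=0.99):
    """Tags with both a large degree and an almost fully linked neighbourhood."""
    adj = build_adjacency(tag_sets)
    return sorted(
        tag for tag in adj
        if len(adj[tag]) >= min_degree
        and clustering(adj, tag) >= min_clustering
    )


# Hypothetical data: a few ordinary posts plus one resource with six spam tags.
ordinary = [
    ["networks", "physics"],
    ["networks", "biology"],
    ["networks", "web"],
    ["physics", "biology"],
]
spam = [[f"spam{i}" for i in range(6)]]
flagged = suspicious_tags(ordinary + spam, min_degree=5)
```

In real data the two thresholds would have to be tuned against the typical decay of C(k), since legitimate high-degree tags normally have low clustering.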
![Page 33: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/33.jpg)
Users' strategies
Do users tag resources according to a conceptual hierarchy of tags?
![Page 34: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/34.jpg)
Semantics and hierarchy
For example
"Emergence of scaling in random networks" by A.-L. Barabási and R. Albert
![Page 36: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/36.jpg)
Semantics and hierarchy
For example
"Emergence of scaling in random networks" by A.-L. Barabási and R. Albert
Tags: scale-free networks, networks
HIERARCHICAL
![Page 37: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/37.jpg)
Semantics and hierarchy
For example
"Emergence of scaling in random networks" by A.-L. Barabási and R. Albert
Tags: scale-free networks, WWW
NON HIERARCHICAL
![Page 38: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/38.jpg)
Model based on hierarchy
Conjectures
1. Tags have an underlying hierarchy.
2. With high probability, users add tags hierarchically.
Can we reproduce the co-occurrence network structure based on tag hierarchy?
![Page 39: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/39.jpg)
Model based on hierarchy
The underlying hierarchy is a random tree.
At each time step, we add a new resource, with two tags.
New tags are introduced with probability P_nt.
With probability P_sb, the second tag is a "generalization" of the first tag; otherwise it is chosen randomly.
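The model rules can be sketched as a short simulation. Several details are not given on the slide and are assumptions here: a new tag attaches to a uniformly random existing tag (one way to grow a random tree), an existing first tag is drawn uniformly, a "generalization" means the direct parent in the tree, and a root first tag falls back to a random second tag. The function name `simulate` and the parameters `p_nt` and `p_sb` mirror P_nt and P_sb.

```python
import random
from collections import defaultdict


def simulate(steps, p_nt, p_sb, seed=0):
    """Grow a tag hierarchy and a co-occurrence network, one resource per step.

    Returns (parent, weights): parent maps each tag to its parent in the
    tree (None for the root), weights maps tag pairs to co-occurrence counts.
    """
    rng = random.Random(seed)
    parent = {0: None, 1: 0}  # assumed seed hierarchy: a root and one child
    weights = defaultdict(int)
    for _ in range(steps):
        if rng.random() < p_nt:
            # New tag, attached below a uniformly random existing tag.
            first = len(parent)
            parent[first] = rng.randrange(first)
        else:
            first = rng.randrange(len(parent))
        if rng.random() < p_sb and parent[first] is not None:
            second = parent[first]  # assumed: generalization = direct parent
        else:
            # Uniform choice among all tags other than the first one.
            second = rng.randrange(len(parent) - 1)
            if second >= first:
                second += 1
        weights[(min(first, second), max(first, second))] += 1
    return parent, dict(weights)


parent, weights = simulate(500, p_nt=0.1, p_sb=0.8, seed=1)
```

The strength distribution and C(k) of the resulting network can then be compared with the empirical ones for different values of the two probabilities.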
![Page 43: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/43.jpg)
Results: strength distribution
![Page 44: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/44.jpg)
Results: clustering
![Page 45: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/45.jpg)
Conclusions
Tagging systems display nontrivial statistical properties, such as Zipf's law.
Co-occurrence networks are a way of discovering semantic relationships between tags (?)
Clustering in co-occurrence networks encodes semantics (?) and detects spam.
Simple models based on hierarchy partially explain such properties.
![Page 46: Pula 5 Giugno 2007](https://reader035.fdocuments.in/reader035/viewer/2022081414/54c6b7974a7959b72d8b45be/html5/thumbnails/46.jpg)
Thank you, and thanks to...
Guido Caldarelli
The TAGORA group (Cattuto et al.)