Post on 12-Jan-2016
1
MEASUREMENT AND ANALYSIS OF ONLINE SOCIAL NETWORKS
Professor : Dr Sheykh Esmaili
Presenters: Pourya Aliabadi
Boshra ArdallaniParia Rakhshani
2
INTRODUCTION The Internet has spawned different types of
information sharing systems, including the Web.
MySpace (over 190 million users) Orkut (over 62 million) LinkedIn (over 11 million) LiveJournal (over 5.5 million) Unlike the Web, which is largely organized
around content, online social networks are organized around users.
3
INTRODUCTION(CONT.)
Users join a network, publish their profile and create links to any other users with whom they associate.
The resulting social network provides a basis for maintaining social relationships, for finding users with similar interests, and for locating content.
Understanding of the graph structure of online social networks is necessary to evaluate current systems, to design future online social network based systems.
4
INTRODUCTION(CONT.)
Recent work has proposed the use of social networks to mitigate email spam, to improve Internet search, and to defend against Sybil attacks.
We obtained our data by crawling publicly accessible information on these sites
5
INTRODUCTION(CONT.)
This differs from content graphs like the graph formed by Web hyperlinks, where the popular pages (authorities) and the pages with many references (hubs) are distinct.
We find that online social networks contain a large, strongly connected core of high-degree nodes, surrounded by many small clusters of low-degree nodes.
Flow of information in these networks.
6
ONLINE SOCIAL NETWORKING SITES
Online social networking sites. are usually run by individual corporations.
Users. must register with a site, possibly under a pseudonym. Some sites allow browsing of public data without explicit.
Links. The social network is composed of user accounts and links between users. Some sites (e.g. Flickr, LiveJournal) allow users to link to any other user, without consent from the link target.
7
ONLINE SOCIAL NETWORKING SITES(CONT.)
Groups. Most sites enable users to create and join special interest groups.
Users can post messages to groups and upload shared content to the group. Certain groups are moderated. admission to such a group and postings to a group are controlled by a user designated as the group’s moderator.
Other groups are unrestricted, allowing any member to join and post messages or content.
8
IS THE SOCIAL NETWORK USED IN LOCATING CONTENT?
Only Orkut is a “pure” social networking site, in the sense that the primary purpose of the site is finding and connecting to new users.
Flickr, YouTube, and LiveJournal are used for sharing photographs, videos, and blogs, respectively.
9
WHY STUDY SOCIAL NETWORKS? Are already at the heart of some very popular
Web sites. Play an important role in future personal and
commercial online interaction. Help us understand the impact of online
social networks on the future Internet. We speculate how our data might be of
interest to researchers in other disciplines.
10
SHARED INTEREST AND TRUST
Adjacent users in a social network tend to trust each other.
A number of research systems have been proposed to exploit this trust.
Adjacent users in a social network also tend to have common interests.
Users browse neighboring regions of their social network because they are likely to find content that is of interest to them.
11
IMPACT ON FUTURE INTERNET
Impact on future Internet Impact on other disciplines Sociologists can examine our data to test
existing theories Studying the structure of online social
networks may help improve the understanding of online campaigning and viral marketing.
Political campaigns have realized the importance of blogs in elections.
12
HOW TO GET DATASETS? Sites reluctant to give out data
Cannot enumerate user list Performed crawls of user graph
Crawled using cluster of 58 machines Used APIs where available Otherwise, used HTML screen scraping
13
CHALLENGES IN CRAWLING LARGE GRAPHS Need to crawl quickly
Underlying social networks changing rapidly
Need to crawl completely Social networks aren’t necessarily connected,
some users have no links, or small clusters Need to estimate the crawl coverage
14
HOW TO VERIFY SAMPLES
Obtain a random user sample
Conduct a crawl using these random users as seeds
See if these random nodes connect to the original WCC (weakly connected component)
15
DATASET FROM FLICKR
Used API to conduct the crawl
Obtained random users by guessing usernames (########@N00) to evaluate coverage
Covered 27% of user population, but remaining users have very few links
16
DATASET FROM LIVEJOURNAL
Used API to conduct the crawl
Obtained random users using special URL http://www.livejournal.com/random.bml
Crawl covered 95% of user population
17
DATASET FROM ORKUT
Used HTML screen-scraping to conduct the crawl
18
DATASET FROM YOUTUBE
Used API to conduct the crawl
Could not obtain random users Usernames user-specified strings
Unable to estimate fraction of users covered
19
HIGH-LEVEL DATA CHARACTERISTICS
Metrics vary by orders of magnitude However, networks share many key
properties
Flickr LiveJournal Orkut YouTube
Number of Users 1.8M 5.3M 3.1M 1.2M
Avg. friends per User 12.2 17.0 106 4.3
Number of User Groups 0.1M 7.5M 8.7M 30K
Avg. Group per User 4.6 21.2 106 0.25
19
20
ANALYSIS OF NETWORK STRUCTURE
Characterize the structural properties of the four network and compare them Link symetry Power-law node degrees Correlation of indegree and outdegree Path lengths and diameter Link degree correlations Densely connected core Tightly clustered fringe
20
21
HOW ARE THE LINKS DISTRIBUTED?
Distribution of indegree and outdegree is similar● Underlying cause is link symmetry
22
LINK SYMMETRY
YouTube Flickr LiveJournal Orkut
symmetry 79.1% 62.0% 73.5% 100.0%
Possibly contributed by informing users of new incoming links
Unlike other complex networks, such as the Web● Sites like cnn.com receive much links more than they
give
makes it harder to identify reputable sources
22
23
POWER-LAW NODE DEGREES
All social networks show properties consistent with power-law networks.●The majority of the nodes have small degree,
and a few nodes have significantly higher degree
CORRELATION OF INDEGREE AND OUTDEGREE
outdegree vs. indegree in web outdegree vs. indegree in social networks
24
PW CNN
OSN
25
PATH LENGTHS AND DIAMETER
all four networks have short path length from 4.25 – 5.88
network Avg.path len diameter
web 16.12 905
flickr 5.67 27
livejournal 5.88 20
orkat 4.25 9
youtube 5.10 21
25
LINK DEGREE CORRELATIONS
Examine which users tend to connect to each other Focus on:
Joint degree distribution How often nodes of different degrees connect to each
other Scale free behavior
A value calculated directly from the joint degree distribution of graph
Assortativity A measure of the likelihood for nodes to connect to
other nodes with similar degrees
26
27
JOINT DEGREE DISTRIBUTION AND SCALE-FREE BEHAVIOUR
28
DENSELY CONNECTED CORE comprising of between 1% and 10% of the highest degree
nodes removing 10% of core nodes results in breaking up graph
into millions of very small SCCs graphs below show results as nodes are removed starting
with highest-degree nodes (left) and path length as graph is constructed beginning with highest-degree nodes(right)
Sub logarithmic growth28
29
CLUSTERING COEFFICIENT
Clustering coefficient C is a metric of cliquishness
Online social networks are tightly clustered 10,000 times more clustered than random graphs 5-50 times more clustered than random power-law
graphs
Number of links between friends
Number of links that could existC
30
TIGHTLY CLUSTERED FRINGE
Low-degree users show high degree of clustering Social network graphs show stronger clustering
31
GROUPS
Network Groups Usage Avg. Size Avg. C
Flickr 103M 21% 82 0.47
LiveJournal 7M 61% 15 0.81
Orkut 9M 13% 37 0.52
YouTube 30M 8% 10 0.34
31
32
WHAT DOES THE STRUCTURE LOOK LIKE
o the networks contain a densely connected core of high-degree nodes;
o and that this core links small groups of strongly clustered, low-degree nodes at the fringes of the network.
octopus
33
CONCLUSIONS
Structure of OSNs is significantly different from the Web Higher degree symmetry in OSNs Much higher levels of local clustering in OSNs Privacy controls make graph crawling very difficult Pure social networks different from content sharing networks
34
Thanks