Online Social Networks Thomas Karagiannis Microsoft Research.

Date posted: 21-Dec-2015

Online Social Networks

Thomas Karagiannis, Microsoft Research

How many people in the room have a profile in an Online Social Network (OSN)?

Real life…

...and the networking community

Network geometry and design, Inference of network properties, Multihoming and overlays, Wireless, Secure networks, Troubleshooting, Congestion control, Router design, DNS

Multicast and Anycast, Control mechanisms, WWW, Performance analysis, Routing, TCP, Tracing and Measurement, Header Processing

Routing, Security, Data Center Networking, Management, Wireless, Router Primitives, Incentives, Measurement, P2P

Social networking services

• Social communities
  – Bebo, MySpace, Facebook, etc.
• Content sharing
  – YouTube, Flickr, MSN Soapbox, etc.
• Corporate
  – LinkedIn, Plaxo, etc.
• Portals
  – MSN, Yahoo 360, etc.
• Recommendation engines
  – Last.fm, StumbleUpon, Digg, Me.dium, etc.
• Bookmarking/Tagging
  – Del.icio.us, CiteULike, Furl, etc.
• Discussion groups
  – Blogs, forums, chat, messaging, Live QnA, etc.
• Mobile social networks
  – Vipera, Nokia “MOSH”, etc.
• Virtual worlds
  – Second Life

Social Network Sites: History [Boyd et al., 2007]

• SixDegrees.com: the first recognizable OSN
  – Profiles and lists of friends
  – Combined existing features!
  – Failed: nothing to do after accepting friend requests
• OSN wave after 2001
• Friendster:
  – Technical and social difficulties with scale!
  – “Fakesters” diluted the community
• MySpace:
  – Capitalized on Friendster’s problems
  – Bands and fans
  – Allowed personalization of profiles
• Facebook:
  – Growth: Harvard-only => University-only => high schools & professionals => everyone
  – Introduced applications (provided APIs)

Social networking services

Source: Bebo, Social Media – ‘getting your message across’

Shift in online communities

• OSNs are organized around people
  – “Egocentric” networks
• Web: world composed of groups
• OSNs: world composed of networks

What do social networks enable?

Leveraging the “community” in traditional applications

• Content/information sharing
• Search
• Information management
• Recommendations
• Advertisements

Research topics of interest

• Identification of communities and their evolution in time
• Measurement and analysis of online communities
• Social media analysis: blogs and friendship networks
• Recommendation / collaborative filtering systems
• Rating, review, reputation, and trust systems
• Expertise / interest tracking
• Information sharing and forwarding
• Search strategies in social networks
• Viral marketing strategies
• Implications on network and distributed systems design
• System design for social networks
• Mobile social networks
• Privacy and anonymity

This lecture

• Social networks
  – Sociological studies & basic concepts
  – Small worlds, weak ties, degrees, centralities
• Analysis and measurements of OSNs
  – Structure and properties
  – Impact of OSNs on traditional applications and user activity
  – Information dissemination, viral marketing, privacy, tagging

Networks..

• …an interconnected system
• …a series of points or nodes interconnected by communication paths
• …a collection of computers connected to each other

..and networks

• …relations, social structure among a set of actors (i.e., individuals)

• …nodes (which are generally individuals or organizations) that are tied by one or more specific types of interdependency, such as values, visions, ideas, financial exchange, friendship

Sociological studies

• How are groups of people connected?
  – To what degree does every member of a given group know every other member?
  – Six degrees of separation and the small world phenomenon
• How many people do you know?
  – Ego networks
• Communities and interactions
  – Zachary’s karate club
• The strength of weak ties
  – Bridges and structural holes

Six Degrees of Separation

• Arbitrary “starting persons” were selected to forward a letter to a first-name acquaintance, with the final goal of reaching an “arbitrary” target person
  – Target: Stockbroker in Boston, MA
  – Starters:
    • Random sample (n=100) of Boston residents
    • Random sample (n=96) from all Nebraska residents
    • Sample (n=100) of share-owning Nebraska residents

[Milgram 1967]

• How are groups of people connected?

Six Degrees of Separation

• 64 / 296 reached the target
• Forwarding by exploiting the target’s address: 6.1
• Forwarding by exploiting the target’s job: 4.6
• Chains overlap as they converge on the target
  – Only 26 individuals in the last hop
  – 16 copies delivered from one person alone

Six hops on average to reach the target!

• Incomplete chains
• Chances of forwarding increase with the number of intermediaries

How many people do you know?

Acquaintances ~ 5,000

Immediate contacts ~ 100-200

Regular contacts ~ 20 per week

Confidants ~ 3

ego

Ego networks consist of a focal node ("ego") and the nodes to which ego is directly connected ("alters"), plus the ties

[Ithiel de Sola Pool 1978] [Freeman and Thomson 1989] [Heran 1988]
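
The ego-network definition above can be sketched in a few lines of Python. This is only an illustration: the adjacency-dict representation, the `ego_network` helper, and the sample names are assumptions, and the sketch assumes that "the ties" include ties among the alters as well as ties between ego and each alter.

```python
def ego_network(adj, ego):
    """Return (nodes, ties) of the ego network around `ego`.

    adj maps each person to the set of people they are directly tied to
    (undirected graph; hypothetical representation for illustration).
    """
    alters = set(adj[ego])
    nodes = alters | {ego}
    ties = set()
    for u in nodes:
        for v in adj[u]:
            if v in nodes:
                # frozenset makes the tie undirected: {u, v} == {v, u}
                ties.add(frozenset((u, v)))
    return nodes, ties


# Hypothetical friendship graph.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}
nodes, ties = ego_network(friends, "ann")
```

Here ann's ego network contains ann, bob, and cat; dan is tied to bob but not to ann, so the bob-dan tie is left out.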

Communities and interactions

• “Friendship” network between karate club students

• During the study, a dispute arose and the club split in two

• Split was the minimum cut!

[Zachary 1977]

Bridges and the strength of weak ties

• Social relationships are of varying “strength”
  – Duration, emotional intensity, intimacy, exchange of services (backscratching)
• Strength of ties reveals different social processes
  – Strong ties tend to form cliques

[Granovetter 1973]

Bridges and the strength of weak ties

• Weak ties “bridge strongly” connected components
• Weak ties enable the sharing of information
• Weak ties are related to “structural holes” [Burt 1992]
  – Separation between non-redundant contacts
  – Efficiency of ego’s network (i.e., social capital) inversely proportional to the redundancy in the network

[Granovetter 1973]

Bridge

Centralities

• Centrally positioned nodes are “privileged”
  – Hubs where power concentrates
• Different viewpoints: [Freeman 1979]
  – Degree centrality
  – Closeness centrality
  – Betweenness centrality

Degree centrality

• Centrality according to the number of connections
  – Degree: Number of direct links
• For vertex u:
• For a graph G(V,E):
  – C = 1: a node dominates
  – C = 0: all nodes have equal centrality
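
The slide's formulas are not preserved in this transcript, so the following is a minimal sketch under an assumption: it uses the commonly cited Freeman-style definitions, where a vertex's degree centrality is its degree divided by n - 1 and the graph-level index sums the gaps to the maximum degree and divides by (n - 1)(n - 2). These reproduce the two extremes listed above (C = 1 for a star, C = 0 when all degrees are equal). Function names and the sample graphs are illustrative.

```python
def degree_centrality(adj):
    """Normalized degree of every vertex: deg(u) / (n - 1)."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}


def degree_centralization(adj):
    """Graph-level index in [0, 1] (assumed Freeman-style normalization).

    Sums how far each vertex's degree falls below the maximum degree and
    divides by (n - 1)(n - 2), the value reached by a star graph.
    """
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))


# Star: vertex 0 dominates. Ring: every vertex has the same degree.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ring = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}

star_c = degree_centralization(star)
ring_c = degree_centralization(ring)
```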

[Figure: example graph with nodes numbered 1 to 10]

• Degree centrality only measures the number of connections
  – Nodes 1, 2, 3, and 4 are equivalent
• Closeness centrality refers to the closeness of a node to all other network members
  – Node 1 is fewer hops away from peripheral nodes

Closeness centrality

[Figure: example graph with nodes numbered 1 to 10]

• Closeness is the mean geodesic distance (i.e., shortest path) of u to all other vertices

• For vertex u:

– As closeness increases, an individual’s access to information, power, prestige, etc. increases. [Leavitt 1951, Coleman 1973, Burt 1982]

• For a graph G(V,E):
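
As a rough illustration of the definition above, the mean geodesic distance of a vertex can be computed with a breadth-first search. The sketch assumes an unweighted, undirected, connected graph stored as an adjacency dict; the function names are made up for this example. Note that Freeman's closeness centrality is usually stated as the reciprocal of this mean, so a smaller mean distance corresponds to a more central vertex.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic_distance(adj, u):
    """Average shortest-path length from u to all other vertices."""
    dist = bfs_distances(adj, u)
    others = [d for v, d in dist.items() if v != u]
    return sum(others) / len(others)


# Path graph 0 - 1 - 2 - 3 - 4: the middle vertex is closest to everyone.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
middle = mean_geodesic_distance(path, 2)
end = mean_geodesic_distance(path, 0)
```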

Closeness centrality

Betweenness centrality

• Betweenness measures the individual’s intermediary value to all members of a network
  – Reflects the number of geodesics through a node
  – Stricter measure of centrality
• Number of geodesics through i:
• For vertex u:
• For a graph G(V,E):
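
The slide's formulas are missing from the transcript. A minimal sketch of the usual unnormalized definition, assuming an unweighted, undirected graph: for every pair (s, t), each other vertex v is credited with the fraction of s-t geodesics that pass through it. The helper names are illustrative, and this direct pairwise version is chosen for readability rather than speed.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, source):
    """BFS returning (dist, sigma): hop counts and number of geodesics."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every geodesic reaching u extends to v along this edge.
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Unnormalized betweenness of every vertex."""
    info = {s: shortest_path_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different components
        dist_t, sigma_t = info[t]
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            # v lies on an s-t geodesic iff the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return score


line = {0: {1}, 1: {0, 2}, 2: {1}}
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
line_scores = betweenness(line)
square_scores = betweenness(square)
```

In the three-node line, the middle vertex carries the only geodesic between the two ends. In the square, opposite corners are joined by two geodesics, so each intermediate vertex receives half a credit.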

[Figure: example graph with nodes numbered 1 to 10]

The meaning of centralities

• Degree centrality:
  – Capacity to develop communication within a network
• Closeness and betweenness centrality:
  – Capacity to control communication in a network
  – Closeness less accurate
• Strong closeness or betweenness:
  – Minority of actors control communications
• Centralities do not account for the volume of communication
  – Flow betweenness

This lecture

• Social networks
  – Sociological studies & basic concepts
  – Small worlds, weak ties, degrees, centralities
• Analysis and measurements of OSNs
  – Structure and properties
  – Impact of OSNs on traditional applications and user activity
  – Information dissemination, viral marketing, privacy, tagging

Measurement of Online Social Networks

• Crawls of several online social networks
  – Flickr: photo sharing
  – LiveJournal: blogging site
  – Orkut: social networking site
  – YouTube: video sharing

[Mislove et al, IMC-2007]

Measurement of Online Social Networks: Degree Distributions

[Mislove et al, IMC 2007]

Measurement of Online Social Networks: Degree Distributions

[Leskovec et al, WWW 2008]

180M nodes 1.3 B edges

Corporate email social networks and degree distributions

• Email exchanges form a social graph
  – Corporate email graphs of particular interest
  – Problem: What constitutes an edge?
• Studies:
  – HP Labs: 430 individuals, 6 emails as a threshold, 3 months [Adamic et al, Social Networks-2005]
  – Microsoft: 150K employees, varying thresholds, 3 months [Karagiannis et al, MSR-TR 2008]

Corporate email social networks and degree distributions

Distribution appears exponential!

Structure of the graph directly affects its searchability

Biasing towards high-degree nodes may not be as efficient in enterprise email graphs

[Adamic et al, 2005]

Link symmetry

Measurement of Online Social Networks

[Mislove et al, IMC 2007]

Why?

Small world and six degrees revisited

• Eccentricity: the maximum shortest path for a vertex
• Radius: minimum eccentricity of any vertex
• Diameter: maximum eccentricity of any vertex
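
The three definitions above translate directly into code. The sketch below assumes an unweighted, undirected, connected graph stored as an adjacency dict; the function names and the sample path graph are illustrative.

```python
from collections import deque


def eccentricity(adj, source):
    """Longest shortest-path distance from `source` to any other vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def radius_and_diameter(adj):
    """Radius is the smallest eccentricity, diameter the largest."""
    ecc = {u: eccentricity(adj, u) for u in adj}
    return min(ecc.values()), max(ecc.values())


# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
radius, diameter = radius_and_diameter(path)
```

For the five-node path, the end vertices have eccentricity 4 and the middle vertex has eccentricity 2, so the radius is 2 and the diameter is 4.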

Strength of ties

• Impact of strong ties
  – What happens to the social graph when strong/weak ties are removed?
  – What is a strong tie?
• Examine the size of the largest connected component when certain nodes are removed
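
A minimal sketch of the experiment described above: measure the largest connected component before and after removing a chosen set of nodes. The adjacency-dict representation, the `largest_component_size` helper, and the toy graph are assumptions for illustration; how the nodes to remove are selected is left open here.

```python
def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component, ignoring `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited or counted
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Two triangles, {0, 1, 2} and {4, 5, 6}, joined only through node 3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
full = largest_component_size(graph)
without_bridge = largest_component_size(graph, removed={3})
without_corner = largest_component_size(graph, removed={0})
```

Removing the bridging node 3 splits the graph into two components of size 3, while removing a node inside a triangle only shrinks the giant component by one.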

Strength of ties

Different viewpoints:

Strength of ties

• Giant component shrinks gradually
  – Overlapping communities
  – Bridges unlikely
• Shortest path does increase
  – Weak ties = shortcuts

AOL IM Friend Lists

• Strength defined as participation in triangles
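
One way to read "participation in triangles" is to score each tie by the number of triangles it belongs to, which equals the number of neighbors its two endpoints share. The sketch below takes that reading as an assumption; the exact measure used in the cited study may differ, and the helper name and sample graph are illustrative.

```python
def tie_strength(adj):
    """Map each undirected edge (u, v), u < v, to its triangle count."""
    strength = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                # Each common neighbor closes one triangle over this edge.
                strength[(u, v)] = len(adj[u] & adj[v])
    return strength


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
strength = tie_strength(graph)
```

Under this reading, the three edges of the triangle each have strength 1, and the pendant edge (2, 3) has strength 0, marking it as a weak tie.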

BuddyZoo

[Shi et al, Physica-2007]

Sociability and number of friends

Guestbook activity network
• 2 years’ worth of data
• How do activity graphs compare with friendship graphs?
• How does friendship affect sociability?

[Chun et al, IMC-2008]

Sociability and number of friends

Capacity cap [Chun et al, IMC-2008]

Node strength (sociability) increases with the number of friends up to a limit

Is 200 a capacity cap?
• Authors argue that the limit could be connected to Dunbar’s number
• Dunbar (1998): Limit of manageable relationships is 150

Node strength: Sum of messages across all direct edges
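
Node strength as defined above can be computed by summing message counts over each node's edges. This is a small sketch assuming undirected edges with one message count each; the input format, names, and numbers are hypothetical.

```python
from collections import defaultdict


def node_strength(message_counts):
    """Sum the message counts over all edges incident to each node.

    message_counts maps an undirected edge (u, v) to the number of
    messages exchanged across it (assumed input format).
    """
    strength = defaultdict(int)
    for (u, v), count in message_counts.items():
        strength[u] += count
        strength[v] += count
    return dict(strength)


# Hypothetical guestbook activity.
messages = {("ann", "bob"): 5, ("ann", "cat"): 2, ("bob", "cat"): 1}
strength = node_strength(messages)
```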

Online marketplaces and social networks

Hypothesis: Transactions with friends will have higher satisfaction

• Overstock Auctions
  – Similar to eBay
  – Incorporates social components
    • Friends, ratings, message boards
• Two networks
  – Personal: connecting friends
  – Business: based on transactions

[Swamynathan et al, WOSN-2008]

Online marketplaces and social networks

Business network has lower degree

50% of users have fewer than 10 friends or transaction partners

82% of users have less than 1% overlap between the two networks

Online marketplaces and social networks

• 17K transactions studied
• Only 22% are between partners connected in the social network
• High success rate: ~80% for paths up to six hops

Satisfaction does not hold at long distances in the partner network

Expected (?)

Viral marketing and social networks

[Lerman et al,WOSN-2008]

Hypothesis: Social interactions may be exploited to promote content

• User-submitted news stories
• Digg promotes stories to the front page
• Allows social networking:
  – Friends vs. fans
    • B is A’s friend if A is watching B
    • B is A’s fan if B is watching A

Viral marketing and social networks

• Patterns of vote diffusion?
• Predict story popularity?
• In-network votes: from fans of previous voters
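
The friend/fan relation and the notion of an in-network vote can be sketched as follows. The `watching` mapping, the function names, and the sample users are hypothetical; the sketch also assumes that the submitter counts as the first voter, which the slides do not state explicitly.

```python
def friends_of(watching, a):
    """B is A's friend if A is watching B."""
    return set(watching.get(a, ()))


def fans_of(watching, a):
    """B is A's fan if B is watching A."""
    return {b for b, watched in watching.items() if a in watched}


def count_in_network_votes(watching, voters):
    """Count votes cast by a fan of at least one earlier voter.

    voters is the time-ordered list of user ids voting on one story
    (assumption: the submitter is listed first).
    """
    in_network = 0
    earlier = set()
    for voter in voters:
        # The voter is a fan of an earlier voter iff the voter watches them.
        if friends_of(watching, voter) & earlier:
            in_network += 1
        earlier.add(voter)
    return in_network


# Hypothetical data: bob watches alice, carol watches bob, dave watches no one.
watching = {"bob": {"alice"}, "carol": {"bob"}, "dave": set()}
voters = ["alice", "bob", "dave", "carol"]
in_network = count_in_network_votes(watching, voters)
```

Here bob and carol vote after someone they watch, so two of the four votes are in-network; dave's vote is independent.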

Viral marketing and social networks

Data by scraping Digg:
• 900 newly submitted stories (2006)
• 200 front page stories
• Time-ordered votes, user ids, etc.

Large number of early in-network votes is negatively correlated with the eventual popularity of the story

Intuition: If a story is truly interesting, it will be discovered by “independent” individuals

Viral marketing and social networks

Cascades in social networks

[Cha et al, WOSN-2008]

How do photo bookmarks spread through social links?

• Crawled Flickr
  – 2.5M users, 33M friend links, 100 days
  – 34M bookmarks (11M distinct photos)
• Methodology: Did a particular bookmark spread through social links?
  – No: if a user bookmarks a photo and none of his friends have previously bookmarked the photo
  – Yes: if a user bookmarks a photo after one of his friends bookmarked the photo
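
The Yes/No rule above can be sketched as a single pass over time-ordered bookmarks. The tuple layout, the `classify_bookmarks` helper, and the sample data are assumptions for illustration; bookmarks with identical timestamps are ordered arbitrarily by this sketch.

```python
def classify_bookmarks(bookmarks, friends):
    """Label each bookmark by whether a friend bookmarked the photo earlier.

    bookmarks: iterable of (time, user, photo) tuples.
    friends:   dict mapping each user to the set of their friends.
    Returns a list of (user, photo, via_social_link) in time order.
    """
    earlier = {}  # photo -> users who have bookmarked it so far
    result = []
    for _time, user, photo in sorted(bookmarks):
        previous = earlier.setdefault(photo, set())
        via_social = bool(friends.get(user, set()) & previous)
        result.append((user, photo, via_social))
        previous.add(user)
    return result


# Hypothetical data: ann and bob are friends, eve has no friends.
friends = {"ann": {"bob"}, "bob": {"ann"}, "eve": set()}
bookmarks = [
    (1, "ann", "p1"),
    (2, "bob", "p1"),
    (3, "eve", "p1"),
    (4, "bob", "p2"),
]
labels = classify_bookmarks(bookmarks, friends)
```

Only bob's bookmark of p1 counts as spread through a social link, because his friend ann bookmarked the same photo first.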

Cascades in social networks

• Hypothesis: Photos propagate like diseases through human contacts

• Model:
  – k: node degree, σ0: adoption rate
• Known R0: HIV (2-5), Measles (12-18)
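
The model equation itself is not preserved in this transcript. As a hedged illustration only, the sketch below uses a standard estimate from epidemiology on networks with heterogeneous degrees, R0 = σ0 · ⟨k²⟩ / ⟨k⟩, where ⟨k⟩ and ⟨k²⟩ are the mean degree and mean squared degree. Whether this is exactly the form shown on the slide is an assumption, and the function name and numbers are made up.

```python
def basic_reproduction_number(degrees, sigma0):
    """Estimate R0 = sigma0 * <k^2> / <k> from a degree sequence.

    degrees: list of node degrees k.
    sigma0:  adoption (transmission) rate per contact.
    This formula is an assumed stand-in for the slide's missing equation.
    """
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return sigma0 * mean_k2 / mean_k


# Hypothetical degree sequence: <k> = 2, <k^2> = 5.5.
r0 = basic_reproduction_number([1, 1, 2, 4], sigma0=0.5)
```

Under this estimate, a few high-degree nodes raise ⟨k²⟩ and therefore R0, even when the average degree stays modest.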

Cascades in social networks

Cascades in social networks

• Finding: Model can describe photo propagation
• Potential use: Predicting popularity

Privacy in social networks

[Krishnamurthy et al, WOSN-2008]

• Users are encouraged to share personal information
  – Most users unaware
  – External applications require users to grant access to personal info

Privacy in social networks

Finding: Strong negative correlation between network size and viewable profile and friend lists

Users more sensitive about their profiles

Privacy in social networks

Finding: Information leaks to third parties, as on the Web

Privacy in social networks

[Guha et al, WOSN-2008]

• How do you ensure the “social network” experience and keep your data private?
  – NOYB (None of your business)
• Ensuring trust
  – Do you trust your OSN provider?
  – If yes, who else can see your data?
• Main idea:
  – Profiles are composed of multiple fields
  – If separated, these fields do not mean much

Privacy in social networks

Privacy in social networks

Privacy in social networks

Not a long term strategy!

Ranking and suggested candidate items

• Collaborative information tagging [Vojnovic et al, 2008]
• How to suggest tags?
  – Goal: Learn true ranking popularity
  – Tags could be used for information retrieval
• Problem:
  – Users tend to imitate!

Ranking and suggested candidate items

Summary

• Degrees in OSNs
  – Power law distributions
  – Exponential distributions in corporate email graphs
• Small world phenomenon
  – Present in OSNs (short paths/diameters)
  – Average shortest path close to 6
• Weak ties
  – Networks robust to removal of weak ties
• Findings:
  – Capacity cap of 200
  – Significant symmetry of links
  – Marketplaces: Social links not exploited, but their usage appears promising
  – Digg: “In-network” votes negatively correlate with story popularity
  – Flickr: Photo bookmarks propagate similarly to diseases
  – Privacy: Concerns correlate with network size
  – Tagging: Users imitate, biasing rankings

Thank you!