Bridging the Gap Between Physical Location and Online Social Networks, at Ubicomp 2010
1
Bridging the Gap Between Physical Location and Online Social Networks
Justin Cranshaw Eran Toch
Jason Hong Aniket Kittur
Norman Sadeh
Carnegie Mellon University, School of Computer Science
2
On Facebook, we maintain a set of social connections we typically call Facebook friends.
3
[Figure: network diagram of users A, B, C, D, and E]
On Facebook, we maintain a set of social connections we typically call Facebook friends.
4
[Figure: network diagram of users A, B, C, D, and E]
There may be some people we know in real life with whom we are not Facebook friends.
5
[Figure: network diagram of users A, B, C, D, and E]
Similarly, we may have Facebook friends that we do not know in real life.
6-10
[Figures: paired network diagrams over the same users A, B, C, D, and E]
11
The purpose of this work is to explore the area between online social networks and the real-world mobility patterns of their users.
12
13
Outline:
Goal: Define a set of observable properties of physical places that convey information about the people who visit the location and the social interactions that occur there.
Evaluation: We will evaluate these properties on a prediction task. We will attempt to discern Facebook friendships from non-friendships based on the co-location network of the users.
Results: We’ll show that using these location-based features significantly improves the performance of a classifier.
14
Related Work:
Several results affiliated with Sandy Pentland’s group:
[Eagle & Pentland, 2009]
[Eagle, Pentland, and Lazer, 2009]
Several results from Microsoft Research:
[Zheng et al., UbiComp, 2008]
[Zheng et al., GIS, 2008]
[Kostakos & Venkatanathan, 2010]
Our main point of difference in this work is our focus on contextual properties of the location histories.
15
Co-location
Suppose A and B are co-located. How might we deduce whether they are actually friends?
1. We can infer based on how they socialize and interact.
2. We can infer based on how many other times they’ve been co-located in the past.
3. We can infer based on the context (where they are and what they’re doing).
[Figure: A and B were co-located]
17
Co-location
Suppose A and B are co-located. How might we deduce whether they are actually friends?
[Figure: A and B were co-located. They were observed together on 100 occasions, on the same bus.]
If we infer based only on 2 (the number of past co-locations), we might guess that they are friends, when it’s very likely they are not.
18
Co-location
Suppose A and B are co-located. How might we deduce whether they are actually friends?
[Figure: A and B were co-located. They were observed together on 4 occasions: 3 times at A’s house and 1 time at B’s house.]
If we infer based only on 2 (the number of past co-locations), we might guess that they are not friends, when in fact it’s much more likely that they are.
19
Co-location
Suppose A and B are co-located. How might we deduce whether they are actually friends?
This example motivates two hypotheses: that the number of co-locations of two people is a poor indicator of the relationship between them, and that context about the location can help in prediction.
[Figure: A and B were co-located]
20
How can we derive context on a large scale, only from location data?
21
How can we derive context on a large scale, only from location data?
One option: Location Diversity
22
Location Diversity
For a given location we define:
Frequency: total number of observations at the location
User Count: total number of users observed at the location
Entropy: the entropy of the distribution of observations across the distinct users seen at the location
Location diversity helps us identify the locations where chance co-locations are most likely. Locations with high diversity have more chance encounters.
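The three measures above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes Shannon entropy in bits over each user's share of the observations, and the function name and input format (one user id per observation) are hypothetical.

```python
import math
from collections import Counter

def location_diversity(user_ids):
    """Return (frequency, user_count, entropy_bits) for one location.

    user_ids: one entry per observation at the location (hypothetical
    input format chosen for this sketch).
    """
    counts = Counter(user_ids)
    frequency = sum(counts.values())   # total number of observations
    user_count = len(counts)           # number of distinct users
    entropy = 0.0
    for n in counts.values():
        p = n / frequency              # this user's share of observations
        entropy -= p * math.log2(p)    # log base 2 is an assumption
    return frequency, user_count, entropy

# Four users seen equally often: the most diverse case for four users.
print(location_diversity(["A", "B", "C", "D"] * 3))  # (12, 4, 2.0)
```

A location visited by a single user always gets entropy 0, no matter how many observations it has, which is why frequency alone does not capture diversity.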
23
Location Diversity
Frequency: LOW
User count: LOW
Entropy: LOW
[Figure: a location cell with corners (40.45,-80.0), (40.45,-79.9), (40.46,-80.0), and (40.46,-79.9), containing a few observations of user A with timestamps 9/14 9:00AM, 9/18 10:00AM, and 9/18 10:05AM. Legend: A = observation of user A, B = observation of user B, C = observation of user C.]
Observation = (user id, latitude, longitude, time)
We look at all observations of users over time at a given location.
24
Location Diversity
Frequency: HIGH
User count: LOW
Entropy: LOW
[Figure: the same location cell, now containing many observations, all of user A. Legend: A = observation of user A, B = observation of user B, C = observation of user C.]
We look at all observations of users over time at a given location.
25
Location Diversity
Frequency: HIGH
User count: HIGH
Entropy: LOW
[Figure: the same location cell, containing many observations of user A and single observations of users B and C. Legend: A = observation of user A, B = observation of user B, C = observation of user C.]
Here, co-locations are more likely to mean friendship.
We look at all observations of users over time at a given location.
26
Location Diversity
Frequency: HIGH
User count: HIGH
Entropy: HIGH
[Figure: the same location cell, containing observations spread evenly across users A, B, and C. Legend: A = observation of user A, B = observation of user B, C = observation of user C.]
Here, co-locations are more likely to be due to chance.
We look at all observations of users over time at a given location.
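As a worked illustration of why entropy separates these cases, assume Shannon entropy in bits (the slides do not state the log base) and take 12 observations at a location. The counts below are illustrative, loosely modeled on the diagrams, and are not data from the study.

```latex
H(L) = -\sum_{u} p_u \log_2 p_u,
\qquad
p_u = \frac{\text{observations of user } u \text{ at } L}{\text{all observations at } L}

\text{A: 12 (one user only):}\quad H = -1 \cdot \log_2 1 = 0

\text{A: 10, B: 1, C: 1:}\quad
H = -\tfrac{10}{12}\log_2\tfrac{10}{12} - 2\cdot\tfrac{1}{12}\log_2\tfrac{1}{12} \approx 0.82

\text{A: 4, B: 4, C: 4:}\quad
H = -3\cdot\tfrac{1}{3}\log_2\tfrac{1}{3} = \log_2 3 \approx 1.58
```

The second and third cases have the same frequency and the same user count, yet the entropy nearly doubles when the visits are spread evenly, which matches the LOW and HIGH labels on the slides.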
27
Connection to Biological Diversity: Ecologists have been using entropy to study location for over 50 years.
Uses: habitat determination, health of an ecosystem, land use determinations for conservation
28
How does location diversity relate to predicting (Facebook) friendships from co-location?
29
[Figure: observation histories of users A through E at two locations where A and B were co-located. Panels: Location 1 History, Location 2 History. An edge indicates a co-location.]
Case 1 (HIGH entropy): It’s difficult to conclude that A and B are friends.
Case 2 (LOW entropy): It’s more likely that A and B are actually friends.
Recall these diagrams show all historical observations at the location over time. An edge indicates the users were there at the same time.
30
The history of A and B’s co-location
[Figure: observation histories of users A through E. Panels: Location 1 History, Location 2 History, Location 3 History. An edge indicates a co-location.]
Here it is difficult to conclude that A and B are friends.
31
The history of A and B’s co-location
[Figure: observation histories dominated by users A and B, with some observations of user D. Panels: Location 1 History, Location 2 History, Location 3 History. An edge indicates a co-location.]
Here it is much more likely that A and B are friends.
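The reasoning in these diagrams can be sketched as: find the places where two users were present at the same time, then ask how diverse those places are. The sketch below makes several assumptions not stated on the slides: a 0.01-degree grid for locations, 10-minute time buckets as the test for "at the same time", and the mean entropy of the shared cells as the summary. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def cell_of(lat, lon, size=0.01):
    """Map a coordinate to a grid cell (the cell size is an assumption)."""
    return (math.floor(lat / size), math.floor(lon / size))

def colocation_cells(observations, window_s=600):
    """Return {(user_a, user_b): set of cells} for users observed in the
    same cell during the same time bucket (a simplification)."""
    present = defaultdict(set)  # (cell, time bucket) -> users seen there
    for user, lat, lon, t in observations:
        present[(cell_of(lat, lon), int(t // window_s))].add(user)
    pairs = defaultdict(set)
    for (cell, _bucket), users in present.items():
        for a, b in combinations(sorted(users), 2):
            pairs[(a, b)].add(cell)
    return pairs

def cell_entropy(observations):
    """Shannon entropy (bits) of per-user observation counts in each cell."""
    counts = defaultdict(Counter)
    for user, lat, lon, _t in observations:
        counts[cell_of(lat, lon)][user] += 1
    result = {}
    for cell, per_user in counts.items():
        total = sum(per_user.values())
        h = 0.0
        for n in per_user.values():
            h -= (n / total) * math.log2(n / total)
        result[cell] = h
    return result

def mean_colocation_entropy(observations, a, b):
    """Mean entropy of the cells where a and b were co-located, or None."""
    cells = colocation_cells(observations).get(tuple(sorted((a, b))), set())
    if not cells:
        return None
    entropy = cell_entropy(observations)
    return sum(entropy[c] for c in cells) / len(cells)
```

Each observation is a tuple (user id, latitude, longitude, time in seconds), following the definition on the earlier slide. A low value suggests the pair meets in places few other people visit; a high value suggests busy places where meetings may be chance.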
32
Location Entropy
Pittsburgh, PA
33
Location Entropy
Pittsburgh, PA
[Map of Pittsburgh with labeled regions: Shopping and Dining (two regions), Universities, Bars and Pubs, Residential (two regions). Entropy callouts: HIGH Entropy (four regions), LOW Entropy (two regions).]
34
The history of unique people that visit a location over time tells us a great deal of information about that location.
This in turn provides insight into the individuals that visit the location, and the social interactions that occur there.
35
We used this general principle to define other potentially useful features of co-location data.
36
Feature Categories
Intensity and Duration: The size and spatial and temporal range of the set of co-locations.
Location Diversity: Location diversity measures of the locations where the users were co-located.
Specificity: Whether the locations where the users were co-located are “shared” with the community or “specific” to them.
Structural Properties: Relevant structural properties of the co-location graph that are indicative of friendship.
37
Feature Categories
Intensity and Duration features use shallow properties of the co-location history: how many times, how many places, what time of day, etc.
38
Feature Categories
Location Diversity, Specificity, and Structural Properties features predominantly use properties derived from the history of location observations, such as the location entropy.
39
The Data
489 users with at least 1 month of tracking data from Locaccino
Area: Restricted to users in the Pittsburgh metro area
Recruitment: some from formal user studies, some were invited friends of participants, others randomly joined
System use is possibly across non-overlapping time intervals
About 90% of the users were laptop users
In all, over 4 million location observations
40
Comparing the networks
                                     Num Edges
Social Network                       1007
Co-location Network                  3636
Intersection (co-located friends)    360
Our goal is to differentiate meaningful edges in the co-location network from chance co-locations.
Co-location among users is pervasive, yet co-location among friends is comparatively rare.
We would like to predict whether two users are friends from their co-location history alone.
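To make "pervasive yet comparatively rare" concrete: from the edge counts above, about 9.9% of co-location edges (360 of 3636) are also friendships, and about 35.7% of friendships (360 of 1007) show any co-location. A minimal sketch of the same comparison on raw edge lists follows; the function name and the edge-list input format are assumptions made for illustration.

```python
def edge_overlap(friend_edges, colocation_edges):
    """Compare two undirected edge lists given as (user, user) pairs."""
    # frozenset makes ("A", "B") and ("B", "A") the same edge
    friends = {frozenset(e) for e in friend_edges}
    colocated = {frozenset(e) for e in colocation_edges}
    both = friends & colocated
    return {
        "friend_edges": len(friends),
        "colocation_edges": len(colocated),
        "intersection": len(both),
        "colocations_that_are_friends": len(both) / len(colocated) if colocated else 0.0,
        "friendships_with_colocation": len(both) / len(friends) if friends else 0.0,
    }

# Ratios implied by the counts reported on the slide
print(round(360 / 3636, 3))  # 0.099
print(round(360 / 1007, 3))  # 0.357
```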
41
Evaluation
Classifiers: trained 3 AdaBoost classifiers (with decision stumps).
• One used only Intensity and Duration features
• One used Diversity, Structural, and Specificity features
• One used all features
Baseline: we classify solely based on the number of times the users were co-located.
Goal: Compare Intensity and Duration features to Diversity, Structural, and Specificity features.
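For readers unfamiliar with the classifier named above, here is a minimal sketch of textbook discrete AdaBoost over decision stumps (single-feature threshold rules). The slides give no implementation details, so this is not the authors' code, and the two features and six training rows are made up to echo the bus and house examples from earlier slides.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _err, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost(X, y, rounds=10):
    """Labels must be +1 (friends) or -1 (non-friends)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this stump
        model.append((alpha, stump))
        # Increase the weight of misclassified rows, then renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in model)
    return 1 if score >= 0 else -1

# Made-up rows: [number of co-locations, mean entropy of shared locations]
X = [[100, 2.5], [4, 0.3], [30, 0.6], [2, 2.4], [60, 2.9], [8, 0.2]]
y = [-1, 1, 1, -1, -1, 1]
model = adaboost(X, y)
```

In this toy data no threshold on the co-location count separates the classes, while a single threshold on entropy does, which mirrors the comparison the evaluation is designed to make.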
42
Using features such as location entropy significantly improves performance over shallow features such as the number of co-locations.
43
Using features such as location entropy significantly improves performance over shallow features such as the number of co-locations.
[Figure: classifier performance comparison. Series labels: Full model, Intensity features, without Intensity, Number of co-locations.]
44
This highlights the variability in online social network ties with respect to behavior.
Overall classifier performance was good for testing our hypotheses, but was not great for classification purposes.
Accuracy is high, but precision/recall trade-offs are poor due to unbalanced class proportions (many more non-friends than friends).
If the end goal is classification, perhaps more specialized approaches might be best.
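The gap between accuracy and precision/recall can be seen with simple arithmetic. The sketch below assumes, purely for illustration, that the candidate pairs are the 3636 co-located pairs, of which 360 are friends; the slides do not state the exact evaluation set. A classifier that labels every pair "not friends" then looks accurate while finding no friendships at all.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Always predict "not friends": 360 friends missed, 3276 non-friends correct
acc, prec, rec = binary_metrics(tp=0, fp=0, fn=360, tn=3276)
print(f"{acc:.3f} {prec:.3f} {rec:.3f}")  # 0.901 0.000 0.000
```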
45
Additional Findings
We also looked at the relationship between an individual’s location history and the number of Facebook friends that individual has.
We found a convincing positive relationship between the entropy of the places a user goes to and the number of friends the user has.
46
Correlation of mobility features with number of friends
The location diversity variables and the mobility regularity variables show very strong correlations.
Users who have irregular routines and users who visit diverse locations have more connections in the Locaccino social network.
47
Limitations
Many users, spread over different time periods.
Most of the users were laptop users, which offers a coarse approximation of mobility.
Population is homogeneous.
48
Future Work
Non-binary ties:
Numeric ties -- tie strength from co-location
Categorical ties -- relationship types
More data from smartphones
More specialized learning models
49
I’d be happy to take your questions!
Thank you for your time and attention.
Justin Cranshaw, [email protected]
Illustration by David Pearson, in William Safire, On Language, New York Times Magazine, June 26, 2009.
50
51
Extra Slides:
52
User Mobility
Look at the history of locations of each user.
We define a set of features of the location history of each user that is predictive of the number of friends they have in the Locaccino network.
53
User Mobility Features
Intensity and Duration: These features describe the size and spatial and temporal range of the set of observations of the user.
Location Diversity: These features describe the diversity of observations collected at the locations the user visits.
Regularity: These features describe temporal regularity of the location observations of the user. Do their observations follow a regular routine or are they random?
54
Structural Comparisons
                            Social Network   Co-location Network   Intersection (co-located friends)
Num Vertices                489              489                   489
Num Non-Isolate Vertices    366              245                   127
Num Edges                   1007             3636                  360
Num Connected Components    44               91                    99
Largest Component Size      299              293                   84
Density                     0.013            0.063                 0.005
Connectedness               0.59             0.56                  0.06
Transitivity                0.41             0.48                  0.42
55
Why do we want to do this?
The relationship between online social networks and physical location is understudied.
Partitioning the social graph is a hard and important problem.
Could have implications in creating better (context-based) social network privacy controls.