2ndLecture_RealityMining
7/27/2019 2ndLecture_RealityMining
Reality Mining: sensing complex social systems
Nathan Eagle, Alex Pentland
Personal and Ubiquitous Computing, 2006
-
Aim
How data collected from mobile phones can be used to uncover regular rules and structures in the behavior of both individuals and organizations
-
Mobile Phones as Wearable Sensors
Social scientists use surveys to learn about human behavior
Usual survey techniques suffer from:
  bias
  sparsity of data
  lack of continuity between discrete questionnaires
  absence of dense, continuous data
Phones can be used to collect data on human behavior
-
Bluetooth
Bluetooth is a short-range RF network
  10-30 meters in practice
Device discovery is a standard feature of Bluetooth devices
  A discovered device reports its Bluetooth MAC address (BTID), device name, and device type
  BTID is unique
Bluetooth scanning is energy-consuming
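The discovery record described on this slide can be sketched as a small data structure. This is only an illustration: the class name `Sighting`, its field names, and the sample values are assumptions, not the format used in the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sighting:
    """One device seen in one Bluetooth scan (hypothetical record layout)."""
    timestamp: int     # seconds since logging started (assumed unit)
    btid: str          # Bluetooth MAC address, unique per device
    name: str          # user-defined device name, not guaranteed unique
    device_type: str   # e.g. "phone" or "desktop"

def unique_devices(sightings):
    """Identify devices by BTID, because names can repeat or change."""
    return {s.btid for s in sightings}

# Invented sample values, for illustration only.
scans = [
    Sighting(0, "00:11:22:33:44:55", "Office PC", "desktop"),
    Sighting(300, "00:11:22:33:44:55", "Office PC (renamed)", "desktop"),
    Sighting(300, "66:77:88:99:AA:BB", "Phone", "phone"),
]
devices = unique_devices(scans)
```

Keying on the BTID rather than the name is the design choice the slide motivates: only the BTID is unique.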
-
Dataset & Privacy
Prior consent and human subject approval
Dataset: 100 Nokia 6600 users
  75 Lab users
  20 incoming master's students
  5 incoming freshmen
~450k hours of information about users' location, communication, and usage behavior
http://reality.media.mit.edu
-
User modeling
Easily identifiable routines in every person's life
Simple model of behavior
  Home, work, elsewhere
Data collected from
  Bluetooth, cell towers, temporal information from the phone
Incorporate information from static Bluetooth devices
  e.g., Bluetooth on a desktop computer
-
User modeling
Accurate location from cell towers
  Complicated because a phone can receive signals from far-away towers
  Accuracy improves if the user spends enough time in one place
  The distribution of time spent with a set of towers adds accuracy
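One way to picture how a distribution over towers can locate a user is a simple maximum-likelihood comparison. The sketch below is an assumption about how such a comparison could be coded, not the method used in the study; the function names, tower IDs, and training logs are all invented for illustration.

```python
import math
from collections import Counter

def tower_distribution(observations):
    """Turn a list of logged tower IDs into an empirical probability distribution."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {tower: n / total for tower, n in counts.items()}

def log_likelihood(observations, distribution, floor=1e-6):
    """Log-probability of the observations under one location's distribution.
    Towers never seen at that location get a small floor probability."""
    return sum(math.log(distribution.get(t, floor)) for t in observations)

def most_likely_location(observations, profiles):
    """Pick the location whose tower distribution best explains the observations."""
    return max(profiles, key=lambda loc: log_likelihood(observations, profiles[loc]))

# Invented training logs: towers seen while the user was known to be at each place.
profiles = {
    "work": tower_distribution(["T1", "T1", "T2", "T1", "T3"]),
    "home": tower_distribution(["T7", "T8", "T7", "T7"]),
}
guess = most_likely_location(["T1", "T2", "T1"], profiles)
```

The longer the user stays in one place, the more observations enter the sum, which is why accuracy improves with dwell time.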
-
Cell tower probability density functions
The probability of being associated with one of the 25 visible cell towers is plotted for five users who work on the third floor corner of the same office building.
Each tower is listed on the x-axis, and the probability of the phone logging it while the user is in his office is shown on the y-axis. (Range was assured to 10 m by the presence of a static Bluetooth device.)
Each user sees a different distribution of cell towers depending on the location of his office, with the exception of Users 4 and 5, who are officemates and have the same distribution despite being in the office at different times.
-
Observations
Users within a 10 m radius see different sets of towers
6% of the time, users were without signal
21% to 29% of the time, users were in range of Bluetooth devices or other mobile phones
Could Bluetooth be used for localization inside buildings during such times?
  GPS does not work indoors
-
Encountered devices for a subject during the month of January
The subject is regularly proximate to other Bluetooth devices only between 9:00 and 17:00, while at work, and never at any other times.
This predictable behavior will be defined as low entropy.
The subject's desktop computer is logged most frequently throughout the day, with the exception of the hour between 14:00 and 15:00.
During this time window, Subject 9 is most often proximate to Subject 4.
-
Models for location & activity
Human life is imbued with routine across time scales
  From minute-to-minute routines to yearly patterns
There is inherent randomness among the routines
An information entropy metric is used to quantify how predictable a subject's behavior is
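The entropy metric can be illustrated with Shannon entropy over a sequence of location states. This is a minimal sketch, assuming one state label per hour; the function name and the two sample schedules are invented for illustration and are not data from the study.

```python
import math
from collections import Counter

def shannon_entropy(states):
    """Shannon entropy, in bits, of the empirical distribution of the states."""
    counts = Counter(states)
    total = len(states)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Invented 24-hour schedules, one label per hour.
regular = ["home"] * 12 + ["work"] * 12
irregular = ["home"] * 6 + ["work"] * 6 + ["elsewhere"] * 6 + ["no signal"] * 6

h_regular = shannon_entropy(regular)
h_irregular = shannon_entropy(irregular)
```

A subject who splits time between only two states scores lower than one spread evenly over four, which matches the intuition that lower entropy means a more predictable routine.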
-
A low-entropy subject's daily distribution of home/work transitions.
The most likely location of the subject: Work, Home, Elsewhere, and No Signal. While the subject's state sporadically jumps to No Signal, the other states occur with very regular frequency.
This is confirmed by the Bluetooth encounters plotted below, representing the structured working schedule of the low-entropy subject.
-
A low-entropy subject's daily distribution of encountered Bluetooth devices.
-
Entropy across demographics
Entropy, H(x), was calculated from the {work, home, no signal, elsewhere} set of behaviors for 100 samples of a 7-day period.
The Media Lab freshmen have the least predictable schedules, which makes sense because they come to the lab on a much less regular basis.
The staff and faculty have the least entropic schedules, typically adhering to a consistent work routine.
-
User modeling
The role of time is very clear in predicting user behavior
A hidden Markov model (HMM) is trained with expectation-maximization (EM) on 1 month of data
95% accuracy achieved
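To make the HMM idea concrete, the sketch below decodes the most likely sequence of hidden locations from tower observations with the Viterbi algorithm. It is only an illustration: the probabilities are hand-set assumptions (in the study they would be learned with EM), the observation labels are invented, and the dependence on time of day is left out.

```python
import math

STATES = ["home", "work", "elsewhere"]

# Hypothetical hand-set parameters, not values from the study.
START = {"home": 0.6, "work": 0.3, "elsewhere": 0.1}
TRANS = {
    "home": {"home": 0.8, "work": 0.1, "elsewhere": 0.1},
    "work": {"home": 0.1, "work": 0.8, "elsewhere": 0.1},
    "elsewhere": {"home": 0.3, "work": 0.3, "elsewhere": 0.4},
}
EMIT = {
    "home": {"home_tower": 0.8, "work_tower": 0.05, "other_tower": 0.15},
    "work": {"home_tower": 0.05, "work_tower": 0.8, "other_tower": 0.15},
    "elsewhere": {"home_tower": 0.1, "work_tower": 0.1, "other_tower": 0.8},
}

def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {
        s: (math.log(START[s]) + math.log(EMIT[s][observations[0]]), [s])
        for s in STATES
    }
    for obs in observations[1:]:
        new_best = {}
        for s in STATES:
            # Choose the predecessor that maximizes the path score into s.
            prev = max(STATES, key=lambda p: best[p][0] + math.log(TRANS[p][s]))
            score = best[prev][0] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            new_best[s] = (score, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

path = viterbi(["home_tower", "home_tower", "other_tower", "work_tower", "work_tower"])
```

With these assumed parameters the decoder treats the single `other_tower` reading as a commute between home and work rather than as noise.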
-
Mobile Usage Pattern
35% of subjects use the clock application regularly
  Yet it takes 10 keystrokes to open the application
  Used more at home
Not much use of sophisticated features
Snake is used as much as the elaborate media player
-
Average application usage in three locations (other, work, and home) for 100 subjects.
The x-axis displays the fraction of time each application is used, as a function of total application usage.
For example, the usage at home of the clock application comprises almost 3% of the total times the phone is used.
The phone application itself comprises more than 80% of the total usage and was not included in this figure.
-
Data characterization and validation
Data stored on a flash memory card
  Flash memory cards have a finite number of read-write cycles
  Frequent updates led to corruption of memory cards
  10 cards were lost
Later, incremental updates were kept in RAM and only the final logs were written to the card
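The fix described on this slide, accumulating updates in RAM and writing to the card in larger batches, can be sketched as a buffered logger. The class name `BufferedLog`, the `flush_every` parameter, and the record format are assumptions for illustration; the logging software in the study is not shown in the slides.

```python
import io

class BufferedLog:
    """Accumulate log records in RAM and write them to storage in batches."""

    def __init__(self, sink, flush_every=100):
        self.sink = sink                # any file-like object with write()
        self.flush_every = flush_every  # records to collect before one write
        self.buffer = []
        self.writes = 0                 # number of writes issued to storage

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        self.sink.write("\n".join(self.buffer) + "\n")
        self.buffer.clear()
        self.writes += 1

card = io.StringIO()  # stands in for a file on the memory card
log = BufferedLog(card, flush_every=3)
for i in range(7):
    log.append(f"scan {i}")
log.flush()  # write whatever is left at shutdown
```

Seven records reach storage in three writes instead of seven, which is the kind of reduction in write cycles the slide is after.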
-
Bluetooth errors
Several technical issues in verifying the accuracy of collected data
  10 m range with the ability to penetrate walls
  Periodic scans miss short proximity events
  A device may not be discovered (1% to 3%)
  Application crashes (once every three days)
Redundancy could be leveraged
Most of the time, the above problems were identified as noise
Logs help in finding anomalies
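One way redundancy could be leveraged is to treat proximity as symmetric: if either phone in a pair logged the other during a time slot, the pair counts as proximate, so a single missed discovery does not lose the event. The sketch below is an assumption about how that merge might look; the function name, tuple layout, and sample sightings are invented.

```python
def merge_sightings(sightings):
    """Merge one-sided scan results into symmetric proximity events.

    sightings: iterable of (time_slot, scanner_id, seen_id) tuples.
    Returns a set of (time_slot, frozenset({a, b})) events.
    """
    return {(slot, frozenset((scanner, seen))) for slot, scanner, seen in sightings}

# Invented example: in slot 1, phone A failed to discover B, but B still saw A.
sightings = [
    (0, "A", "B"),
    (0, "B", "A"),
    (1, "B", "A"),
]
events = merge_sightings(sightings)
```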
-
Human-induced errors
Two main errors
Phone being off
  Battery exhausted
  Explicit turn-off
    1/5 of users do it regularly: classrooms, night, movies
  Log is time-stamped before the turn-off
Separated from user
  Phone is on but not carried by the user
  More severe problem
-
Human-induced errors
Forgetting the phone
  30% claim they never forget it
  40% claim once every month
  30% claim once every week
A forgotten-phone classifier
  Identifying a forgotten phone is challenging
    Subject could be sick
    Subject could have casually moved beyond 10 m of the phone
  Not enough unique features
-
Missing data
Major causes
  Data corruption
  Powered-off devices
Logs account for 85.3% of the time
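A coverage figure like the one on this slide can be computed as the fraction of the study period covered by the union of logged intervals. This is a sketch under assumptions: the function name, the interval representation, and the sample numbers are invented, and the slides do not say how the 85.3% was actually derived.

```python
def coverage_fraction(intervals, study_start, study_end):
    """Fraction of [study_start, study_end) covered by the union of intervals.

    intervals: iterable of (start, end) pairs in the same time unit.
    Overlapping intervals are counted only once.
    """
    covered = 0
    cursor = study_start  # everything before the cursor is already counted
    for start, end in sorted(intervals):
        start = max(start, cursor)
        end = min(end, study_end)
        if end > start:
            covered += end - start
            cursor = end
    return covered / (study_end - study_start)

# Invented example: 50 hours of study, three logged stretches with one overlap.
fraction = coverage_fraction([(0, 10), (5, 20), (30, 40)], 0, 50)
```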
-
Surveys
Subjects were also surveyed about their social network
For senior students
  High correlation between logged BTID data and dyadic self-report proximity data
For incoming students
  No significant correlation
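The comparison between survey answers and logged proximity can be illustrated with a Pearson correlation over pairs of subjects. The slides do not state which correlation measure was used, so treating it as Pearson is an assumption, and the numbers below are toy values invented for illustration, not study data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values: one entry per pair of subjects (a dyad).
self_reported_hours = [1.0, 2.0, 3.0, 4.0]   # hours together, from the survey
logged_hours = [2.0, 4.0, 6.0, 8.0]          # hours together, from BTID logs

r = pearson(self_reported_hours, logged_hours)
```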
-
Community structure
Human landmarks
  Who the user will meet can be guessed
Relationship inference
  The nature of an association can be inferred
A Gaussian mixture model (GMM) was used for clustering
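As a rough picture of GMM clustering, the sketch below fits a two-component, one-dimensional Gaussian mixture with EM. The slides do not say which features or how many components were used, so the single feature (hours of proximity per pair), the two clusters, and all function names are assumptions; the values are invented.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, iterations=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM.
    Returns (weights, means, variances)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    means = [min(data), max(data)]       # start the components far apart
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1.0        # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, min_var)
    return weights, means, variances

def assign(x, weights, means, variances):
    """Index of the component most responsible for x."""
    return max(range(2), key=lambda k: weights[k] * gaussian_pdf(x, means[k], variances[k]))

# Invented proximity hours for eight pairs of subjects.
hours = [1.0, 2.0, 1.5, 2.5, 20.0, 22.0, 21.0, 19.0]
weights, means, variances = fit_gmm_1d(hours)
```

Here `assign(2.0, weights, means, variances)` falls in the low-proximity component and `assign(21.0, ...)` in the high-proximity one; a real analysis would likely use several features per pair.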
-
Proximity Frequency
-
Proximity networks
Different from the organizational structure
  Structured around the faculty director
  Hub-and-spoke with changing roles
Proximity network data is extremely dynamic and sparse.
Deadlines bring more reliance on the support of the group
  Exploring the dynamics of a group in response to both external and internal stimuli
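A hub-and-spoke proximity network can be illustrated by building a weighted graph from pairwise encounters and looking for the node with the most distinct neighbors. This is a sketch only: the function names, the use of neighbor count as the hub criterion, and the encounter list are assumptions made for illustration.

```python
from collections import defaultdict

def build_proximity_graph(encounters):
    """Build an undirected weighted graph from (person_a, person_b) encounters.
    graph[a][b] is the number of times a and b were logged together."""
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in encounters:
        graph[a][b] += 1
        graph[b][a] += 1
    return graph

def hub(graph):
    """Return the node with the most distinct neighbors."""
    return max(graph, key=lambda node: len(graph[node]))

# Invented encounters for one research group.
encounters = [
    ("director", "student1"),
    ("director", "student2"),
    ("director", "student3"),
    ("student1", "student2"),
    ("director", "student1"),
]
graph = build_proximity_graph(encounters)
center = hub(graph)
```

Because the data is dynamic, rebuilding the graph per week or per deadline period would show how the hub and its spokes change over time.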
-
Proximity networks
People's free time and schedules shift dramatically to meet deadlines and project goals
  Spending much of the night in the lab just before the event
How the aggregate work cycles expand in reaction to global deadlines
  Visit of sponsors
-
Conclusions
First paper to log data at such magnitude and depth
Provides ethnographic studies, individual user modeling, and group user modeling