2ndLecture_RealityMining

  • 7/27/2019 2ndLecture_RealityMining

    1/28

    Reality Mining: sensing complex social systems

    Nathan Eagle, Alex Pentland

    Personal and Ubiquitous Computing, 2006

    Aim

    How data collected from mobile phones can be used to uncover regular rules and structure in the behavior of both individuals and organizations

    Mobile Phones as Wearable Sensors

    Social scientists use surveys to learn about human behavior

    Usual survey techniques suffer from:

      bias

      sparsity of data

      lack of continuity between discrete questionnaires

      absence of dense, continuous data

    Phones can be used to collect data on human behavior

    Bluetooth

    Bluetooth is a short-range RF network (10-30 meters in practice)

    Device discovery is a standard feature of Bluetooth devices

      A scan returns each device's Bluetooth MAC address (BTID), device name, and device type

      The BTID is unique to each device

    Bluetooth scans are energy-consuming
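
The fields returned by device discovery suggest a simple record per sighting. The sketch below is only an illustration of how such a scan log might be represented and queried; the `Sighting` class, its field names, and the helper functions are hypothetical and are not taken from the study's logging software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One device seen in one discovery scan (field names are illustrative)."""
    timestamp: int    # seconds since the start of logging
    btid: str         # Bluetooth MAC address of the discovered device
    name: str         # device name reported during discovery
    device_type: str  # device type reported during discovery


def devices_seen(sightings, start, end):
    """Return the set of BTIDs discovered in the half-open interval [start, end)."""
    return {s.btid for s in sightings if start <= s.timestamp < end}


def sighting_counts(sightings):
    """Count how many scans each BTID appeared in."""
    counts = {}
    for s in sightings:
        counts[s.btid] = counts.get(s.btid, 0) + 1
    return counts
```

Because the BTID is unique, it serves as the key; the device name is kept only as a human-readable label.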

    Dataset & Privacy

    Prior consent and human subject approval

    Dataset: 100 Nokia 6600 users

      75 Lab users

      20 incoming master's students

      5 incoming freshmen

    ~450k hours of information about users' location, communication, and usage behavior

    http://reality.media.mit.edu

    User modeling

    Easily identifiable routines in every person's life

    Simple model of behavior: home, work, elsewhere

    Data collected from Bluetooth, cell towers, and temporal information from the phone

    Incorporates information from static BT devices, e.g. Bluetooth on a desktop

    User modeling

    Accurate location from cell towers

      Complicated, as a phone can receive signals from far-away towers

      Accuracy improves if the user spends enough time in one place

      The distribution of time spent with a set of towers adds accuracy
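
One way to use a per-location distribution over towers is sketched below: estimate how often each tower is logged at a known location, then pick the location whose distribution best explains a new run of observations. This is a minimal illustration, not the paper's method; the function names, the tower IDs, and the floor probability for unseen towers are all assumptions.

```python
import math
from collections import Counter


def tower_distribution(tower_log):
    """Estimate P(tower | location) from tower IDs logged at one location."""
    counts = Counter(tower_log)
    total = sum(counts.values())
    return {tower: n / total for tower, n in counts.items()}


def log_likelihood(observed, distribution, floor=1e-6):
    """Log-likelihood of observed tower IDs under one location's distribution.

    Towers never seen at that location get a small floor probability
    (an assumption made here to avoid log(0)).
    """
    return sum(math.log(distribution.get(t, floor)) for t in observed)


def most_likely_location(observed, distributions):
    """Return the location name whose tower distribution best fits the observations."""
    return max(distributions, key=lambda loc: log_likelihood(observed, distributions[loc]))
```

The longer the user stays in one place, the more observations feed `tower_distribution`, which is why accuracy improves with time spent.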

    Cell tower probability density functions

    The probability of being associated with one of the 25 visible cell towers is plotted for five users who work on the third-floor corner of the same office building.

    Each tower is listed on the x-axis, and the probability of the phone logging it while the user is in his office is shown on the y-axis. (Range was assured to 10 m by the presence of a static Bluetooth device.)

    Each user sees a different distribution of cell towers depending on the location of his office, with the exception of Users 4 and 5, who are officemates and have the same distribution despite being in the office at different times.

    Observations

    Users within a 10 m radius of each other see different sets of towers

    6% of the time, users were without signal

    21% to 29% of the time, users were in range of Bluetooth devices or other mobile phones

    Could Bluetooth be used for localization inside buildings during such times? GPS does not work indoors

    Encountered devices for a subject during the month of January

    The subject is regularly proximate to other Bluetooth devices only between 9:00 and 17:00, while at work, and never at any other time.

    This predictable behavior will be defined as low entropy.

    The subject's desktop computer is logged most frequently throughout the day, with the exception of the hour between 14:00 and 15:00.

    During this time window, Subject 9 is most often proximate to Subject 4.

    Models for location & activity

    Human life is imbued with routine across time scales, from minute-to-minute routines to yearly patterns

    There is inherent randomness among the routines

    An information entropy metric is used to quantify how predictable a subject's behavior is

    A low-entropy subject's daily distribution of home/work transitions.

    The most likely location of the subject: Work, Home, Elsewhere, and No Signal. While the subject's state sporadically jumps to No Signal, the other states occur with very regular frequency.

    This is confirmed by the Bluetooth encounters plotted below, which represent the structured working schedule of the low-entropy subject.

    A low-entropy subject's daily distribution of encountered Bluetooth devices.

    Entropy across demographics

    Entropy, H(x), was calculated from the {work, home, no signal, elsewhere} set of behaviors for 100 samples of a 7-day period.

    The Media Lab freshmen have the least predictable schedules, which makes sense because they come to the lab on a much less regular basis.

    The staff and faculty have the least entropic schedules, typically adhering to a consistent work routine.
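
The sampling procedure described above could be sketched as follows. This assumes one state label per hour and randomly placed 7-day windows; both are assumptions made for illustration, since the slide does not say how the samples were drawn. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_weekly_entropy(hourly_states, n_samples=100, seed=0):
    """Average entropy over randomly chosen 7-day windows of hourly state labels."""
    window = 7 * 24  # hours in one week
    if len(hourly_states) < window:
        raise ValueError("need at least one week of hourly labels")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    last_start = len(hourly_states) - window
    total = 0.0
    for _ in range(n_samples):
        start = rng.randint(0, last_start)
        total += shannon_entropy(hourly_states[start:start + window])
    return total / n_samples
```

With the four states {work, home, no signal, elsewhere}, the result lies between 0 bits (always in one state) and 2 bits (time split evenly across all four).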

    User modeling

    The role of time is very clear in predicting user behavior

    Uses a hidden Markov model (HMM) with expectation-maximization (EM), trained on 1 month of data

    95% accuracy achieved
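
To make the HMM idea concrete, the sketch below decodes the most likely sequence of location states from a sequence of observed cell towers using the Viterbi algorithm. It covers inference only: the EM training step is omitted, and the start, transition, and emission probabilities, as well as the tower names, are hypothetical values chosen for illustration rather than parameters from the study.

```python
import math

STATES = ["home", "work", "elsewhere"]

# Hypothetical parameters; in practice these would be learned from data with EM.
START = {"home": 0.6, "work": 0.3, "elsewhere": 0.1}
TRANS = {
    "home": {"home": 0.8, "work": 0.15, "elsewhere": 0.05},
    "work": {"home": 0.1, "work": 0.8, "elsewhere": 0.1},
    "elsewhere": {"home": 0.3, "work": 0.3, "elsewhere": 0.4},
}
EMIT = {
    "home": {"tower_A": 0.7, "tower_B": 0.1, "tower_C": 0.2},
    "work": {"tower_A": 0.1, "tower_B": 0.8, "tower_C": 0.1},
    "elsewhere": {"tower_A": 0.2, "tower_B": 0.2, "tower_C": 0.6},
}


def viterbi(observations):
    """Return the most likely hidden state sequence for a non-empty observation list."""
    # best[s] is the log-probability of the best path that ends in state s.
    best = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            new_best[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the most likely final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Conditioning the parameters on time of day, which the slide highlights as important, would mean keeping separate tables per time slot; that refinement is left out here.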

    Mobile Usage Pattern

    35% of subjects use the clock application regularly

      Yet it takes 10 keystrokes to open the application

      Used more at home

    Not much use of sophisticated features

    Snake is used as much as the elaborate media player

    Average application usage in three locations (other, work, and home) for 100 subjects.

    The x-axis displays the fraction of time each application is used, as a function of total application usage.

    For example, usage of the clock application at home comprises almost 3% of the total times the phone is used.

    The phone application itself comprises more than 80% of the total usage and was not included in this figure.

    Data characterization and validation

    Data stored on a flash memory card

      Flash memory cards have a finite number of read-write cycles

      Frequent updates led to corruption of memory cards

      10 cards were lost

    Later, increments were done in RAM and only the final logs were written to the card
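
The fix described above, accumulating updates in RAM and writing to the card less often, can be illustrated with a small buffered logger. This is a generic sketch of the idea, not the phone software used in the study; the `BufferedLog` class and its `flush_every` parameter are hypothetical.

```python
class BufferedLog:
    """Collect log lines in memory and append them to a file in batches."""

    def __init__(self, path, flush_every=100):
        self.path = path
        self.flush_every = flush_every  # lines to buffer before writing
        self.buffer = []
        self.writes = 0  # number of times the file was actually written

    def append(self, line):
        """Add one line to the in-memory buffer, flushing when it is full."""
        self.buffer.append(line)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        """Write all buffered lines to the file in a single append."""
        if not self.buffer:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            f.write("\n".join(self.buffer) + "\n")
        self.buffer.clear()
        self.writes += 1
```

The trade-off is that lines still in the buffer are lost if the application crashes before a flush, so `flush_every` balances card wear against potential data loss.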

    Bluetooth errors

    Several technical issues in verifying the accuracy of collected data

      10 m range, with the ability to penetrate walls

      Periodic scans miss short proximity events

      A device may not be discovered (1% to 3%)

      Application crash (once every three days)

    Redundancy could be leveraged

    Most of the time, the above problems were identified as noise

    Logs help in finding anomalies
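
One plausible reading of "redundancy could be leveraged" is that two phones each scan for the other, so a missed discovery on one side can be recovered from the other side's log. The sketch below treats a pair as proximate if either device saw the other in a scan window; this symmetrization rule and the function name are assumptions for illustration, not a procedure stated in the slides.

```python
def symmetric_proximity(scans):
    """Merge one-sided sightings into undirected proximity pairs.

    scans maps an observer ID to the set of IDs it discovered in one scan
    window. A pair counts as proximate if either device saw the other.
    """
    pairs = set()
    for observer, seen in scans.items():
        for other in seen:
            if other != observer:
                pairs.add(frozenset((observer, other)))
    return pairs
```

For example, if phone A logged B but B's scan missed A, the pair {A, B} is still recorded once.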

    Human-induced errors

    Two main errors

    Phone being off

      Battery exhausted

      Explicit turn-off

        1/5 of users do it regularly: classrooms, night, movies

        Log is time-stamped before the turn-off

    Separated from user

      Phone is on but not carried by the user

      More severe problem

    Human-induced errors

    Forgetting the phone

      30% claim never to forget it

      40% claim once every month

      30% claim once every week

    A forgotten-phone classifier

      Identifying a forgotten phone is challenging

      Subject could be sick

      Subject could have casually moved beyond 10 m of the phone

      Not enough unique features

    Missing data

    Major causes

      Data corruption

      Powered-off devices

    Logs account for 85.3% of the time

    Surveys

    Subjects were also surveyed about their social network

    For senior students

      High correlation between logged BTIDs and dyadic self-reported proximity data

    For incoming students

      No significant correlation
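
A comparison like this comes down to correlating two numbers per pair of subjects: logged proximity and self-reported proximity. The sketch below computes a plain Pearson correlation coefficient; the slides do not say which statistic was used, so this choice, the function name, and the example values in the usage note are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` could hold logged hours of Bluetooth proximity per dyad and `ys` the corresponding self-reported hours, one entry per pair of subjects.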

    Community structure

    Human landmarks

      Who the user will meet can be guessed

    Relationship inference

      The nature of an association can be inferred

    A Gaussian mixture model (GMM) was used for clustering
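
As a rough illustration of GMM clustering, the sketch below fits a two-component, one-dimensional Gaussian mixture with EM in pure Python. The slides do not state which features or how many components were used, so the single feature (for instance, a per-dyad proximity score), the two components, and the min/max initialization are all assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, n_iter=50, min_var=1e-3):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(data)
    overall_mean = sum(data) / n
    var0 = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    means = [min(data), max(data)]  # simple deterministic initialization
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against division by zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor keeps the density finite
    return weights, means, variances
```

Each data point can then be assigned to the component with the higher responsibility, giving two clusters that might correspond to different kinds of relationship.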

    Proximity Frequency

    Proximity networks

    Different from the organizational structure

      Structured around the faculty director

      Hub-and-spoke with changing roles

    Proximity network data is extremely dynamic and sparse.

    Deadlines bring more reliance on the support of the group

      Exploring the dynamics of a group in response to both external and internal stimuli

    Proximity networks

    People's free time and schedules shift dramatically to meet deadlines and project goals

      Spending much of the night in the lab just before the event

    How the aggregate work cycles expand in reaction to global deadlines

      Visit of sponsors

    Conclusions

    First paper to log data at such magnitude and depth

    Enables ethnographic studies, individual user modeling, and group user modeling