Sensus: Crowd-Sensing Wireless Devices to Detect Patterns of … · 2013. 9. 22. · Sensus:...


Sensus: Crowd-Sensing Wireless Devices to Detect Patterns of Human Movement

Anonymized for Review

Figure 1: Sensus allows its users to observe and record the behavior of crowds of people across large areas.

ABSTRACT
Quantifying the pulse and flow of public spaces can inform urban planning and mobile interaction, but it has proven notoriously difficult to sense human movement over large areas. This paper presents Sensus, a crowd-sensing platform that demonstrates how users can act as effective single-point sensors for detecting nearby crowds by using the wireless capability of the devices they already carry with them. Four data-driven applications demonstrate how this approach can succeed at larger scale and with fewer users than previously possible: 1) inferring patterns of human movement between public spaces; 2) grouping together spaces that attract similar individuals; 3) measuring the turnover rates at different locations; and 4) predicting the current occupancy of a venue based only on wifi signals. We evaluate Sensus with a small fourteen-person deployment that captured patterns from 13,152 devices and 4,209 trips between locations in a local community. Finally, we introduce a correction method to reduce common forms of crowd-sensing sampling bias.

Under review at CHI ’14.

Author Keywords
ubiquitous computing; crowd-sensing

ACM Classification Keywords
H.5.2. Information Interfaces and Presentation: Graphical User Interfaces

INTRODUCTION
Despite a growing body of data about the physical world, including vehicle traffic [27], weather conditions, and water usage [12], we know very little about the movement patterns of the people who inhabit it. Applications that know how and when people move around their world can support a broad set of civil engineering and mobile computing needs. For instance, such patterns can identify popular places to eat [11], discover sociological trends across communities [6], produce smarter urban planning [17] and improve transportation [14]. In addition, patterns of human movement create an unmatched design tool for new classes of human-centered applications.

However, comprehensive, detailed data on human movement has proven difficult to gather. Existing systems have relied on two incomplete strategies. One strategy is to use specialized hardware to collect information from a limited number of venues, for example tracking individual customers using high-definition cameras [21] or inferring the presence of individuals by instrumenting the wireless router at a location [9]. The second strategy is to rely on crowdsourced, voluntary check-ins to obtain sparse data for many venues [11]. Both of these strategies have limitations. The hardware-mediated strategy produces relatively complete data in one venue, but lacks information about larger, multi-venue movement patterns. Crowdsourced check-in strategies span many more venues, but are too sparse for many applications. For instance, the local check-in application Foursquare currently shows that of the four Starbucks stores in our city, three have no people checked in, while the last shows only a single person. This data is too sparse to accurately represent the actual occupancy at these locations. Likewise, San Francisco International Airport, which reportedly sees nearly 50,000 passengers per day [1], is only reporting 159 check-ins, a measurement rate of just 0.3% at one of the most popular venues on the service.

We introduce a crowd-sensing approach that combines the volunteer dynamics of check-in systems with the reach and scale of hardware sensing. In particular, it argues that relatively few users are needed to act as effective single-point sensors for their surroundings. In addition, it introduces a mechanism that allows any user to participate as a crowd sensor using the hardware they already carry with them, such as a smartphone or tablet.

This paper presents Sensus, a crowdsourced system for gathering single-venue and cross-venue human movement data from wireless device traces. We enable this distributed crowd sensing by allowing users to monitor a subset of the low-layer, connectivity-oriented wifi traffic. Wifi-enabled devices reveal their unique MAC address when they try to associate with or maintain associations with public wireless networks. Capturing these signals with the onboard wireless radio from a nearby computer or smartphone allows Sensus to determine when people enter and leave a venue, while preserving the anonymity of the people being observed and allowing the Sensus user to browse the internet as usual. By combining data across crowd volunteers, Sensus can track population movements over an entire urban area. As is the case with many crowd-sensing systems, the data gathered by Sensus is biased by the spatial and temporal distribution of active users. We thus introduce a method to mitigate this bias, weighting observed data by its probability of being observed, to produce more accurate estimates.

Sensus operates as a background application for Mac OS X computers, though similar utilities run on today’s smartphones as well. Whenever a user running Sensus brings their computer to a new public venue, such as a park or coffee shop, the user can choose to become a sensor node and start listening to the wireless devices around them. This data is tied to the nearest public Foursquare venue.

The wide range and dense observations of Sensus’s single-point crowd sensing can enable novel applications that were difficult or impossible before, due to sparse data. We present:

• Inter-Venue Flows: Sensus can map the urban flow between locations by identifying devices that travel from one sensed venue to another. By doing so, it can help characterize patterns of human movement through a building, campus or city.

• Multi-Venue Clustering: By performing agglomerative clustering on flow data, Sensus can identify sets of venues with similar patronage, with orders of magnitude fewer registered users than previous techniques (e.g., [6]).

• Venue Pulse: Is the venue a public thoroughfare, with people entering and leaving all the time? Or is it more popular amongst loiterers? Sensus can characterize the pulse of a location through temporal patterns of movement.

• Venue Occupancy: Sensus can build a predictive model of actual human occupancy based on observable variations in wireless signals. This model is trained using locally crowdsourced population observations (e.g., “I can see twenty people here now”). Sensus can reliably reach a mean absolute prediction error of roughly four people after twenty-five crowdsourced observations.

These applications enable users, organizations, and civic groups to make more informed decisions by using patterns of human movement. For instance, a coffee shop connoisseur might use Sensus to determine which shop in their area is least crowded and has the quietest work environment. The proprietor of a small business might choose her business location by analyzing population flow and clustering. Longitudinal data on venue occupancy may enable a city library system to identify the best hours of operation in different neighborhoods. A regional transit authority might use inter-venue flows to optimize bus routes and schedules, whereas urban planners might use the same data to identify underserved communities.

In what follows, we discuss the wireless infrastructures that allow for wide-scale device detection, the design of the Sensus system, applications built on the Sensus system, and a method for mitigating bias in crowd-sensing systems. Finally, to demonstrate its utility, we explore the results from two trial deployments of the Sensus system.

RELATED WORK
There have been numerous projects that track aggregate human movement via smartphone traces. However, Sensus provides unique advantages in terms of scale, flexibility, and analysis. Applications such as Foursquare and Facebook Checkin allow users to log their locations manually [10, 11]. While these data points create an accurate signal, they miss the majority of human movement because relatively few people use these services compared to actual venue occupancy. Systems such as Apple Maps, Google Maps, and Waze take this concept a step further by monitoring without requiring active user check-ins. Instead, they track their users’ movements through mobile location services to improve the quality of their products. In the case of traffic monitoring, a sparse sampling may provide good traffic data since road conditions are shared amongst all vehicles on a given road, but this approach is less applicable to people on foot who spend uncorrelated amounts of time visiting different venues.


It is somewhat simpler to track people’s movements in more constrained settings such as offices [26, 3, 2], homes [19] and retail spaces [9, 21]. Customer tracking systems use WiFi signals and cameras to track potential customers nearby and within a store. However, these systems require specialized sensing hardware set up at the target location, limiting their potential reach. In contrast, Sensus is limited only by the number of users who install the application on their devices.

Aggregating raw movement data can produce higher-level signals. WiFi signals from smartphones enable estimation of the flux of individuals across a threshold and approximate the trajectories of individual users [18, 16, 24]. Retail-oriented solutions analyze the data they collect in order to help businesses better understand the movements of their customers, measuring metrics such as wait times, frequented locations within a store, and percentage of passing customers that enter a store. Other analysis [6] has made progress using similarities in check-in patterns on Foursquare to determine sub-communities within a city. The flexible and scalable nature of Sensus allows us to gather location information about more devices with fewer volunteer sensing nodes. If deployed widely, this would allow us to make more reliable estimations about movement patterns with comparatively few sensing nodes.

As Sensus makes use of the WiFi signals produced by everyday devices as well as the existing movement patterns of volunteer sensor nodes, the project can be viewed as extending the set of infrastructure-mediated sensing techniques. In general, these techniques make use of existing systems of infrastructure such as ducts, pipes, or radio signals in order to infer information about the people near the systems. Projects such as Hydrosense [12] and [19] use small disturbances in water and air flow to sense fixture usage and human movement, respectively. HumanTenna [5] and WiSee [20] make use of ambient electromagnetic waves to infer people’s body positions and gestures. However, these approaches currently require specialized hardware and offer little information about people’s motions at a larger scale.

Privacy is a central question in WiFi tracking. In Cambridge (MA), a team of researchers deployed a city-wide sniffing network to study local wireless network traffic dynamics. This enabled them to track popular web sites and Google searches from different parts of the city, detect malicious traffic and even track public trains and buses [22]. Google has also come under scrutiny for capturing the unencrypted packets transmitted within range of their Street View cars [23]. These projects highlight the difference between what sorts of information collection are technically feasible and what is considered appropriate by the average person. In Sensus, we limit the richness of the information collected to the bare minimum necessary to study group dynamics at and between locations.

Data about people’s motions over time can be a rich source for further processing. Times and locations of co-located MAC addresses can predict the connections between people, whether friends, family or colleagues [7]. A medium-length synchronized deployment of laptops with WiFi sensing software in three urban environments allowed researchers to obtain the durations, frequencies of appearance and aggregate movement of distinct wireless devices [4]. It is not our intention to reinvent these applications; rather, we present sample applications in an attempt to validate and explore the potential of our approach. We additionally hope that Sensus may serve as a platform for wider exploration of such applications.

SENSUS
The Sensus application tracks crowd behavior by observing the wifi signals that local devices transmit in order to find, connect to and stay connected with wireless networks. These signals allow the Sensus system to infer both large- and small-scale patterns of human movement.

Technical Background
All wireless communications between client devices and access points are transmitted via frames. The IEEE 802.11 specifications for wireless local area networks (WLAN) prescribe three distinct frame types for public network communication [13]. Control frames are meant to control the digital handshakes between wireless devices. Management frames are intended to manage devices’ network association and synchronization; subtypes such as beacons and probe requests broadcast and ping for available access points, respectively. Data frames encompass all subtypes that either transport or regulate the transportation of higher-layer application data. The Null frame, which is only transmitted from clients, is an unusual but important member of the data frame family that sends a single power management bit to an access point to indicate the continued availability or temporary unavailability of the client.

Any frame sent from a wireless client can be passively observed by nearby wireless devices, given the right conditions. Most wireless devices transmit frames on a single channel, determined by the capabilities of the device’s wireless card and the configuration of the local network. (Some devices operate on more than one channel to optimize for greater performance.) To overhear these frames, a nearby device must have its wireless card tuned to the same channel as the sender and be positioned within range of its transmission. Although wireless clients frequently encounter frames that are destined for other devices, their default behavior is to disregard anything with a destination address that matches neither their own nor a broadcast address. To hold onto these frames (or rather, to “sniff” the local traffic), a device’s wireless card must be set to “monitor mode,” a transition that typically requires root access. Many operating systems and software systems designed to perform network troubleshooting make use of this mode to filter nearby traffic for a desired subset of frames.

Since different frames reveal different kinds of information about the wireless devices that transmit and receive them, the informational needs of an application must be known in order to choose an appropriate frame capture filter. An application whose purpose is to snoop on local internet traffic would filter for all data frames containing HTTP. Conversely, for applications, such as Sensus, that are intended to observe devices without revealing anything about the activity occurring over different ports, frames containing higher-layer content can and should be ignored. Any system that is meant to identify devices across two or more WLANs should capture device MAC addresses, as private IP addresses (the other common means of identifying a device on a WLAN) are regularly changed and reassigned. Finally, to approximate the distance between and/or the size of physical barriers between a crowd sensor and an observed device, it is helpful to obtain the signal strengths of the frames transmitted from this device.

System Design
At its core, Sensus (Figure 2) is a system for sensing device presence by listening to continuously sent, automatically generated public wifi traffic while preserving the anonymity of individual devices and their respective online activity. To meet these requirements, the system needs to observe signals that are sent regularly by wireless clients, contain the client’s MAC address, keep any information about a user’s online activity private, and might reasonably be considered “public.”

Probe request frames fulfill some of these needs, and have already been employed by several systems to track device presence [8]. Wireless devices transmit these frames on every available channel any time they try to actively find the set of accessible nearby routers. (These routers respond with self-identifying probe responses.) However, observing probe requests on their own would preclude the observation of many nearby wifi-active devices; once these devices establish a connection with a network, they either send out probe requests with decreased regularity or stop sending them altogether.

Null data frames allow us to observe wireless devices when probe requests cannot, while still preserving privacy. Devices that are connected to a network regularly send these frames to associated routers as a keep-alive protocol. Null frames contain a single power management bit to inform a router whether to continue sending data to the respective device, or to buffer this data while the device goes into power-save mode. Although these frames are only transmitted on a single channel, this is guaranteed to be one of the channels provided by the WLAN; thus, a nearby device associated with the same network will still have a high likelihood of observing these frames. While wireless device owners are likely aware that something like a probe request exists (their device must be doing something in order to produce a list of nearby wireless access points), they may be less aware of Null frames. However, these frames maintain the same degree of privacy since they are sent regularly, independent of the type and frequency of higher-level network activities.
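As an illustration of how these two frame kinds could be selected, the sketch below builds a capture command for a pcap-based tool such as tcpdump. The filter string uses the 802.11 `type`/`subtype` primitives from pcap-filter syntax; the interface name `en0`, the helper name `build_capture_command`, and the particular set of flags are assumptions made for this example, not a description of the actual Sensus configuration.

```python
import shlex

# pcap-filter expression matching only the two frame kinds discussed above:
# probe requests (management frames) and Null frames (data frames).
CAPTURE_FILTER = "type mgt subtype probe-req or type data subtype null"


def build_capture_command(interface: str = "en0") -> list[str]:
    """Return a tcpdump argument list for a monitor-mode capture.

    The interface name is an assumption; monitor mode usually needs root.
    """
    return [
        "tcpdump",
        "-I",             # put the interface into monitor mode
        "-i", interface,  # capture on this interface
        "-e",             # print link-level headers, which carry MAC addresses
        "-l",             # line-buffer output so it can be read as it arrives
        "-n",             # do not resolve addresses to names
        CAPTURE_FILTER,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, since capture needs privileges.
    print(shlex.join(build_capture_command()))
```

Because the filter excludes every other data subtype, frames carrying higher-layer payloads never reach the application, which matches the privacy constraint described above.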

The Sensus application uses the tcpdump sniffing utility to observe local Null data frames and probe request frames. At the end of every full minute, Sensus looks at these frames to determine the MAC address, average signal strength, and access point affiliation of each visible device. As soon as this information has been collected, MAC addresses are hashed to preserve device anonymity. These device traces give us the set of local devices that are “awake,” but they tell us nothing about the many other local devices that are either off, asleep or in power-save mode.
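The per-minute summarization step can be sketched as follows. This is a minimal illustration under stated assumptions: the `Frame` record and its field names are hypothetical, the salted SHA-256 hash stands in for whatever hashing scheme the system actually uses, and signal strength is averaged arithmetically in dBm for simplicity.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One captured frame; the field names are hypothetical."""
    timestamp: float    # seconds since the epoch
    mac: str            # transmitter MAC address
    signal_dbm: float   # received signal strength
    access_point: str   # associated access point, or "" if none


def hash_mac(mac: str, salt: str = "example-salt") -> str:
    """Replace a MAC address with a salted SHA-256 digest (assumed scheme)."""
    return hashlib.sha256((salt + mac.lower()).encode("utf-8")).hexdigest()


def summarize_minute(frames: list[Frame]) -> dict[str, dict]:
    """Collapse one minute of frames into one record per hashed device."""
    signals = defaultdict(list)
    last_ap = {}
    for frame in frames:
        device = hash_mac(frame.mac)
        signals[device].append(frame.signal_dbm)
        if frame.access_point:
            last_ap[device] = frame.access_point
    return {
        device: {
            "mean_signal_dbm": sum(values) / len(values),
            "access_point": last_ap.get(device),
            "frame_count": len(values),
        }
        for device, values in signals.items()
    }
```

Only the hashed identifier leaves this function, so later stages can still recognize the same device across minutes and venues without ever storing the raw MAC address.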

To more accurately estimate the number of local wireless devices (whether active or not) over a given minute, our system translates individual observations into device “sessions,” which act as the fundamental unit of analysis for all of our applications. These sessions represent the intervals of time over which we believe a device is present at a given venue, based on its perceived arrivals and departures. A device’s first session begins when it is first observed at a venue. From that point forward, any time it goes unobserved for longer than thirty minutes, it is assumed to have departed, and its current session is terminated. We chose this thirty-minute timeout deliberately: up to about thirty minutes, incremental increases in the timeout length dramatically decreased the total number of calculated sessions. Since the most common smartphone brands send either probe requests or Null frames immediately after returning from power-save mode, this suggests that thirty minutes is an unusually long time for a smartphone owner to go without checking their phone.
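The session rule described above can be sketched in a few lines. The input format (tuples of hashed device, venue, and minute index) and the function name are assumptions made for the example; only the thirty-minute timeout comes from the text.

```python
from collections import defaultdict

SESSION_TIMEOUT_MIN = 30  # gap after which a device is assumed to have left


def build_sessions(sightings, timeout=SESSION_TIMEOUT_MIN):
    """Group (device, venue, minute) sightings into sessions.

    Returns {(device, venue): [(start_minute, end_minute), ...]}.
    A new session starts whenever the gap since the previous sighting
    at the same venue is longer than `timeout` minutes.
    """
    by_key = defaultdict(list)
    for device, venue, minute in sightings:
        by_key[(device, venue)].append(minute)

    sessions = {}
    for key, minutes in by_key.items():
        minutes.sort()
        start = previous = minutes[0]
        spans = []
        for minute in minutes[1:]:
            if minute - previous > timeout:
                spans.append((start, previous))  # close the current session
                start = minute                   # open a new one
            previous = minute
        spans.append((start, previous))
        sessions[key] = spans
    return sessions
```

A gap of exactly thirty minutes keeps the session open in this sketch, following the wording "longer than thirty minutes."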

Correcting for Crowdsourced Sampling
Unlike continuously running hardware solutions, Sensus can only capture device traces when a user is sensing from that location. This crowdsourced data collection therefore introduces a bias in the data. Because Sensus users collect data as they go about their daily routine, their routine has an effect on the data collected. As users are unlikely to collect data at certain places and times, the data set can become biased against common samples that are infrequently observed due to the movements of the contributors.

The key to correcting this bias lies in our ability to use collected data to approximate unobserved data. We assume that, within reason, the population data we are approximating is drawn from the same distribution as the observed data. Then, by normalizing collected data by how often users were available to observe it, it is possible to populate unobserved periods with approximate observations.

To illustrate this issue, consider a contributor who goes to a venue and observes one hour of data. During that time, they collect 100 five-minute sessions and one forty-five-minute session. According to the raw data, it appears that the ratio of forty-five-minute sessions to five-minute sessions is 1:100 (1%). However, this sample is biased against longer sessions. Because Sensus can only register sessions that both start and end within the observation period, it can only record 45-minute sessions that begin during the first 15 minutes of observation. Five-minute sessions, by contrast, can be recorded if they begin at any point in the first 55 minutes. In order to account for the full hour, we normalize the count of each type of sample by the proportion of time in which it was possible to collect it. Sessions lasting 45 minutes are weighted by a factor of 60/15, and five-minute sessions are weighted by a factor of 60/55. This yields a ratio of about 4:109 (3.5%) for 45-minute sessions to five-minute sessions, a fairly significant multiplicative change from the base observations and a better estimate of the true ratio.
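The arithmetic of this example can be reproduced with a short sketch. It generalizes the two stated factors (60/15 and 60/55) to a weight of window / (window - session length), which is an inference from the example rather than a formula given in the text; the function names are hypothetical.

```python
def observation_weight(session_min: float, window_min: float) -> float:
    """Inverse of the fraction of the window in which a session of this
    length could start and still finish before observation ends."""
    if not 0 < session_min < window_min:
        raise ValueError("session must be shorter than the observation window")
    return window_min / (window_min - session_min)


def reweight(counts: dict[float, int], window_min: float) -> dict[float, float]:
    """Scale raw session counts, keyed by session length in minutes."""
    return {
        length: count * observation_weight(length, window_min)
        for length, count in counts.items()
    }


if __name__ == "__main__":
    # The one-hour example: 100 five-minute sessions, one 45-minute session.
    corrected = reweight({5: 100, 45: 1}, window_min=60)
    for length, value in sorted(corrected.items()):
        print(f"{length}-minute sessions: {value:.2f}")
```

For the one-hour example this gives a corrected count of 4 for the 45-minute session and roughly 109 for the five-minute sessions, matching the 4:109 figure above.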


Figure 2: Sensus users capture anonymous information about probe and null packets that wifi-enabled devices send out in public locations. The Sensus server then converts these frames into coherent sessions, enabling a broad variety of applications.

It is important to note that this method only produces an approximation of the real data. The quality of this approximation grows with the amount of real data we collect. If data from short observation sessions is used to approximate data for long stretches of time, the final result will likely be strongly affected by any peculiarities observed during the short sessions, and will thus be an inaccurate representation of reality. In the current dataset, roughly 1,500 total minutes of real data are used to approximate 1,000 total additional minutes of observation.

Sensus User Experience
Sensus runs as a native Mac OS X application in the Finder bar. The app is designed to maximize user commitment by minimizing interruption, so other than a small icon in the corner of the screen, there is usually no other indication that it is running.

If the user connects to a router that the app has not yet seen, however, they will be presented with a pop-up window. Here, they choose between labeling their current venue using the Foursquare location listing, or marking the router as being at a private location where no data should be collected. Once the user is connected to a router that they have cleared for collection, the app starts the tcpdump process and begins pushing data to the remote server once per minute.

If the user decides to label the venue of the router, their location will be determined using Mac Location Services and they will be presented with a list of nearby Foursquare venues. Using Foursquare venues provides three benefits. First, Foursquare venues are standardized, so two users who visit the same venue will assign it the same unique label. Second, Foursquare provides additional information about its venues, such as their geographic location and a category (for example, coffee shop or college lab). Lastly, since Foursquare’s database only contains venues that other people have checked into, using these venues helps to enforce Sensus’s policy of only sensing public venues.

In the course of using the app, if the user decides that they would like to change their labeling of the router or their decision to collect there, they may do so by clicking on the app’s icon at the top of the screen.

While our application runs exclusively on Mac OS X, it is possible to create a similar Sensus application for any other device that supports tcpdump, including mobile phones or Raspberry Pis.

Figure 3: Inter-Venue Flows: A visualization of directional flows of people between venues during the deployment.

APPLICATIONS
In an effort to evaluate the strengths and limitations of the Sensus system, we created four applications: calculating the flow of traffic, clustering venues by clientele, visualizing the pulse of a venue, and predicting the occupancy of a venue.

Flow
Fine-grained information about the way people move from place to place is an invaluable asset to city planners hoping to optimize transportation systems, business owners looking for an attractive location for a new store, or residents looking to optimize their commute. When the MAC address of a device is seen at different locations in succession, we can infer that its user is traveling between these locations. Aggregating these observations over the entire observed population produces an estimate of the amount of traffic between different locations.
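A minimal sketch of this aggregation is shown below. It assumes sessions are available as (device, venue, start, end) tuples and counts every consecutive pair of distinct venues as one trip; the text does not state whether a maximum gap between sightings is enforced, so none is applied here, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def count_flows(sessions):
    """Count directed venue-to-venue trips.

    `sessions` is an iterable of (device, venue, start, end) tuples.
    Returns a Counter keyed by (origin_venue, destination_venue).
    """
    by_device = defaultdict(list)
    for device, venue, start, end in sessions:
        by_device[device].append((start, venue))

    flows = Counter()
    for visits in by_device.values():
        visits.sort()  # order each device's visits by start time
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows
```

Keeping the counts directional means that traffic from A to B and from B to A is reported separately, which is what a visualization of directional flows requires.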

Figure 1 shows the lunchtime foot traffic observed by fourteen Sensus users, from fourteen different venues on a university campus. After three hours of total sensing, these visualizations reveal interesting patterns. For example, in the early afternoon, traffic is more heavily concentrated in the southern part of campus, where there are several restaurants. Additionally, a large portion of traffic passes by the Claw Fountain. More precisely, the greatest rates of flow were observed between the Claw Fountain and Tressider Union as well as between the Claw Fountain and the Green Library Coupa Cafe, and the movement volume swells as the lunch hour progresses. These observations align with known patterns of cross-campus movement. Likewise, the pairs of venues between which there was no observable flow were small and relatively far apart. These conditions decrease the likelihood of any observable human movement.

Venue Clustering
One reason people might enjoy the venues that they frequent is the community of people there. Thus the set of other places visited by the people at a venue likely includes places that a patron of that venue might want to visit in the future.

Sensus observes identifiable devices, so if it records a device at two different venues, then it knows that the owner of the device went to both places. By representing venues in terms of the people it sees there, it can get an idea of the closeness of the communities of different venues and cluster them accordingly.
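One way to make this concrete is sketched below. Each venue is represented by the set of hashed devices seen there, and venues are merged bottom-up by average linkage over Jaccard distance. Both the distance measure and the linkage rule are assumptions chosen for illustration; the text specifies only that the clustering is agglomerative.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 minus the share of devices that two venues have in common."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_venues(devices_by_venue: dict[str, set], n_clusters: int) -> list[set]:
    """Average-linkage agglomerative clustering of venues by shared devices."""
    n_clusters = max(1, n_clusters)
    clusters = [{venue} for venue in devices_by_venue]

    def linkage(c1: set, c2: set) -> float:
        # Mean pairwise distance between the venues of two clusters.
        pairs = [(u, v) for u in c1 for v in c2]
        return sum(
            jaccard_distance(devices_by_venue[u], devices_by_venue[v])
            for u, v in pairs
        ) / len(pairs)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters (i < j always holds).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

Recording the order of merges instead of stopping at a fixed number of clusters would yield the kind of hierarchy that a dendrogram displays.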

Figure 4 shows our clustering of the venues we have observed since the launch of Sensus and provides us with an idea of the relationships between these venues. In this instance, the bottom cluster contains venues from the central locations on campus, whereas the top cluster contains venues frequented by computer science graduate students. In general, distance appears to be a key factor in determining flow between locations. However, the Thai Cafe and the Quad stand out as two exceptions to this general trend. Both are situated right next to each other, in the region between the east and west campus. However, they belong to different clusters: the Thai Cafe is loosely bound to west campus venues, whereas the Quad is clustered with venues to the south and east. The insight: west campus students visit the Thai Cafe for lunch because it is one of their closest options, but students who are already in the Quad go elsewhere to eat.

Figure 4: Venue Clustering: The results of performing hierarchical clustering on the venues on campus

Figure 5: Pulse: Visualizing session lengths at three locations on campus during the distributed deployments

Pulse
It is possible to learn about the character of a venue from the frequency at which people come and go. For example, is it somewhere people come and work for a long time, such as an office building, or do people pass through quickly, as they would at a fast food joint? It would also be interesting to know how many people are coming and going at any given time. Such information could help determine properties such as atmosphere (bustling vs. sedentary) or speed of service, especially since those qualities may vary by time of day in locations such as coffee shops and libraries.

Figure 5 shows the distribution of the device session lengths observed over lengthy deployments from three on-campus venues: a cafe, a coffee shop, and a library reading room. All three of these diagrams demonstrate that the majority of observed devices at each venue appear for a relatively short span of time (thirty minutes or less). As the session length increases, the number of observed devices falls toward zero, with some notable peaks along the way.

Occupancy

The crowdedness of a venue can be a good indicator of the noise level, wait time, and table space that a visitor is likely to encounter. While different venues are likely to be more crowded at certain times of day, days of the week and times of the year, visitors are not necessarily familiar with these trends. Additionally, a number of other factors including scheduled events, adverse weather, and the local academic calendar can impact the size of observed populations separate from temporal variation. It would be difficult to figure out which explanatory variables are behind these trends. Even if they were identified, many of these explanatory variables would be very difficult to record and incorporate into a model. Rather than trying to capture the independent variables that cause fluctuations in venue populations, Sensus predicts venue occupancy using the volume of communications from local wireless devices, as it is directly influenced by venue occupancy.

Sensus can estimate venue occupancy reliably using roughly twenty-five human-labeled population counts per venue to calibrate its wireless device observations. Afterwards, if a Sensus user is sitting in a venue, any other user can get an immediate estimate of the venue’s current human occupancy without disturbing that Sensus user. The system’s estimates have mean absolute prediction errors of 3–5 people (see footnote 1).

DEPLOYMENT AND RESULTS

Sensus seeks to demonstrate that human movements, on the scale of a single venue or across many venues, can be effectively crowdsourced by human sensors, using hardware that users already have on hand. Each Sensus application is built to test for sensitivity to different measures of human movement. To test the prospects of inter-venue flows and venue clustering, we organized a distributed deployment of the Sensus system, assigning individuals to run this application from fourteen predetermined locations across the campus of a large American university. To gauge the feasibility of detecting the venue pulse (i.e. trends in people’s visit lengths) and venue occupancy, we conducted much longer longitudinal deployments at three venues on the same campus.

Distributed Deployment

With what regularity do wireless devices resurface at different locations, and how well are these transitions captured by distributed crowd sensors? Our first deployment tested Sensus’s ability to reliably observe the transitions of wireless devices across different locations, as this affects the levels of detail and accuracy that can be achieved by venue clustering and inter-venue flows.

To evaluate these multi-venue applications, fourteen students were recruited to operate Sensus simultaneously at fourteen predetermined locations across a large university campus in the western United States. This deployment spanned two weekday lunch hours in late July, from approximately noon until 1:30pm. Individual students began sensing between 11:30am and 12:15pm, and stopped sensing between 12:50pm and 2:00pm. We intentionally chose densely occupied locations to maximize the number of observable devices. The same set of locations was observed over each lunch hour, with two exceptions: one caused by a venue closure, and another by a participant sensing at the wrong venue. Crowdsourced data is often irregular and incomplete, and irregularities in the places and times of observation provide an opportunity to assess system robustness.

Results: Sensus robustly detects flow patterns

Over the course of the deployment, Sensus observed 8,315 distinct devices, 4,209 transitions from one venue to another, and 1,867 transitions that occurred within fifteen minutes. Capturing all of these traces from only fourteen sensors speaks to the power and reach of wireless sensing as well as the crowd sensing paradigm. The number of devices observed from each venue (per unit time) aligned well with the observers’ experiences at the venues. An average of 14.4 devices were observed per minute from the Claw Fountain, a crowded venue that lies at the crossroads between many core campus facilities. Conversely, the Green Library Entrance, which is relatively unoccupied over the summer months, saw an average of 0.6795 observed devices per minute. Figures 1 and 3 showcase the device flow patterns that were observed on campus over the two lunch-hour observation periods, after correcting for crowdsourcing bias. The most popular per-hour flows are presented in Table 1.

Footnote 1: When evaluating the occupancy prediction error, it should be noted that the estimates are made on a per-minute basis. Many people can move in and out of a venue within the span of a minute, whereas human labelings provide a single integer estimate per minute.

A variant of the general technique for correcting crowd-sensing data is applied to the observed flows in order to reduce observation bias. To accurately assess relative flows, each flow is scaled according to the net amount of time for which flows could have been observed over the given path. Although the rates of flow might vary across time of day, we make the assumption that the flows were relatively consistent over the short span of the lunch hour deployments.
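The scaling step can be illustrated with a short sketch. The paper does not spell out the formula, so this assumes that the "net amount of time for which flows could have been observed" is the total time during which both endpoints of a path were being sensed at once; the function names and the hour-based interval format are hypothetical.

```python
from typing import List, Tuple

# Assumed format: (start_hour, end_hour) on a clock shared by all sensors.
Interval = Tuple[float, float]


def overlap_hours(a: List[Interval], b: List[Interval]) -> float:
    """Total time during which both venues were under observation."""
    total = 0.0
    for a_start, a_end in a:
        for b_start, b_end in b:
            # Length of the intersection of the two intervals (0 if disjoint).
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


def corrected_flow(transitions: int, a: List[Interval], b: List[Interval]) -> float:
    """Observed transitions divided by hours of joint observation (devices/hr)."""
    hours = overlap_hours(a, b)
    if hours == 0:
        raise ValueError("venues were never observed at the same time")
    return transitions / hours


# Two lunch-hour sessions per venue, on consecutive days.
venue_a = [(12.0, 13.5), (36.0, 37.5)]
venue_b = [(12.25, 13.0), (36.5, 38.0)]
rate = corrected_flow(35, venue_a, venue_b)
```

Dividing by joint observation time keeps a path that was watched for only part of the lunch hour from looking less busy than one that was watched throughout.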

Limitations

While the deployment provides evidence that inter-venue flows can successfully identify human movements between venues, flows provide little information about the paths taken between their respective start- and end-points. Further, Sensus only detects device movements from venues where the application is in current use; detours through venues that are not under observation are not detected. Future work can aim to algorithmically correct for these unobserved midpoints in three-point paths.

Agglomerative clustering

We use agglomerative clustering to group the flow data to identify both distinct groups of venues and pairs of closely related venues. In short, two venues are considered similar if we sense similar sets of devices at each.

Clustering begins by assigning an n-dimensional vector to each observed venue, where n is the number of unique devices we have observed, each dimension corresponding to a device. The value of each dimension is the number of times the device was observed at the venue. Then, each vector is normalized, and the Euclidean distance is calculated between all vector pairs. A binary tree is grown from the bottom up by grouping together the two closest sets of vectors as a low branch. The distance between two sets of vectors is defined by the distance between their two most distant vectors (complete linkage). The grouping process is complete when there is only one vector set left.
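The procedure above can be sketched in a few lines of plain Python. This is an illustrative reading of the description, not the authors' implementation: it assumes unit-length (L2) normalization, and the venue names and counts are made up.

```python
import math
from itertools import combinations


def normalize(v):
    """Scale a count vector to unit length (assumed L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linkage(c1, c2, vectors):
    """Complete linkage: distance between the two most distant members."""
    return max(euclidean(vectors[a], vectors[b]) for a in c1 for b in c2)


def cluster(counts):
    """counts maps venue -> per-device observation counts.

    Returns the binary tree as nested tuples of venue names.
    """
    vectors = {name: normalize(v) for name, v in counts.items()}
    # Each entry is (subtree, member venues); start with one leaf per venue.
    clusters = [(name, (name,)) for name in counts]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]][1], clusters[p[1]][1], vectors),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical counts: four devices, four venues.
tree = cluster({
    "cafe": [3, 1, 0, 0],
    "library": [2, 1, 0, 0],
    "gym": [0, 0, 4, 1],
    "pool": [0, 0, 3, 1],
})
```

In this toy input the cafe and library share devices, as do the gym and pool, so the top-level split separates those two pairs.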

The results of venue clustering are displayed in Figure 4. The assigned clusters largely correspond to the physical proximity of different venues; the first branching separates the east campus from the west campus, while successive branches identify increasingly localized geographic differentiations.

Longitudinal Deployment


Location A    Location B     A to B Flow (Devices/hr)    B to A Flow (Devices/hr)
The Quad      The Claw       54.3                        23.9
The Claw      Libr. Coupa    77.3                        39
The Claw      Tresidder      66                          60

Table 1: Summary of flows between three pairs of locations

To assess the Sensus system’s ability to generate useful data for venue pulse and venue occupancy applications, Sensus was longitudinally deployed to three distinct venues: an on-campus cafe, an on-campus library reading room, and an off-campus coffee shop. These deployments involved at least 24 hours of sensing per venue, over a span of at least three days. In order to learn a model that could estimate venue occupancies, we added a feature to the Sensus client which, once every ten to fifteen minutes, launched a pop-up to ask users to report how many people were visible at the current venue. In total, 98, 112 and 264 human-labeled population counts were gathered from the cafe, reading room and coffee shop, respectively.

These three longitudinally-sensed venues were intentionally chosen to be diverse. The cafe had a significant amount of unaffiliated wireless traffic from nearby venues and an outdoor patio. The library reading room was extremely large and received significant amounts of through-traffic; it also housed public computers on wireless, adding realistic “non-human” wifi signals. The coffee shop was the smallest venue.

Predicting pulse and occupancy

Sensus predicts pulse and occupancy via ordinary least squares linear regression. The best features were obtained by counting the number of temporally and spatially collocated devices that fell into one of nine bins, according to: 1) their average signal strength X in {X >= −70dB, −70dB > X >= −87dB, or X < −87dB} and 2) their connectivity status in {same router as sensor, different router, unaffiliated}. Each of these individual features (bins) showed varying levels of significance, depending on the venue whose population was being modeled. Such differences are to be expected, given the diverse layouts and patterns of device usage that were found at each venue. Over time, insignificant features are eliminated to form reduced models for each venue, in an effort to avoid overfitting. This reduction can be performed algorithmically by methods such as Lasso regression [25].
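As an illustration of the nine-bin feature vector, the sketch below counts devices by signal-strength band and connectivity status. It assumes the three signal bands are at or above −70 dB, from −87 dB up to (but not including) −70 dB, and below −87 dB; the label strings and function names are hypothetical.

```python
from collections import Counter

SIGNAL_BINS = ("strong", "medium", "weak")
CONNECTIVITY = ("same_router", "different_router", "unaffiliated")


def signal_bin(avg_db):
    """Map an average signal strength (dB) to one of three bands."""
    if avg_db >= -70:
        return "strong"
    if avg_db >= -87:
        return "medium"
    return "weak"


def bin_features(devices):
    """devices: (average signal in dB, connectivity label) pairs for one time window.

    Returns nine counts, ordered by signal band and then connectivity status.
    """
    counts = Counter((signal_bin(db), conn) for db, conn in devices)
    return [counts[(s, c)] for s in SIGNAL_BINS for c in CONNECTIVITY]


window = [
    (-60, "same_router"),
    (-65, "same_router"),
    (-75, "unaffiliated"),
    (-87, "unaffiliated"),
    (-90, "different_router"),
]
features = bin_features(window)
```

Each labeled minute would contribute one such vector as a row of the regression's design matrix, with the human-reported head count as the target.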

Results: Sensus predicts occupancy within 3–5 people

To evaluate the efficacy and learnability of our general model, we tested our three feature-reduced linear models with leave-one-out cross validation [15]. To obtain robust estimates, we repeated this technique twenty times for each sample size from each venue, drawing new samples on each iteration.
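A minimal sketch of leave-one-out cross validation with a mean absolute error score follows. To stay dependency-free it fits ordinary least squares on a single predictor (a total device count), which is a simplification of the multi-feature models described here; the function names and sample data are hypothetical.

```python
def fit_ols(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to predicting the mean.
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loocv_mae(xs, ys):
    """Hold out each sample in turn, fit on the rest, average the absolute errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_ols(train_x, train_y)
        errors.append(abs(ys[i] - (slope * xs[i] + intercept)))
    return sum(errors) / len(errors)


device_counts = [10, 20, 30, 40, 50]
head_counts = [8, 13, 21, 24, 33]
error = loocv_mae(device_counts, head_counts)
```

Repeating this over random subsamples of increasing size, and averaging the scores at each size, would trace out a learning curve of the kind the evaluation describes.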

Figure 6 shows the learning curves for each venue, based on the average prediction errors resulting from increasingly large samples. These curves demonstrate that reasonably useful predictive models can be obtained after twenty-five observations. This count is just greater than the number of observations required to effectively eliminate insignificant terms from the model.

The curves for each venue differ, in that they asymptotically approach different mean absolute prediction errors. At the cafe, where the local population has a mean absolute deviation of 10.8484 and an interquartile range from 20 to 35, the curve shows that the model approaches a mean absolute prediction error of around 3.7 people. Similarly, the library reading room and coffee shop have mean absolute deviations of 4.813918 and 11.8676, and interquartile ranges from 15 to 22 and 19 to 36.25, but the models approach mean absolute prediction errors of 2.7 and 4.8, respectively.

A combination of factors may lie behind these differences in mean absolute prediction error. Venues that host populations with low rates of wireless device use or high rates of external noise have a weaker signal from which to estimate the local population. At venues where the population changes rapidly, the number of visitors can fluctuate over the course of a minute; in this case, instantaneous counts of the local population undercount the number of individuals that were actually present within a given minute. Finally, venues where the population size is large are especially prone to incorrect human population labelings.

In summary, Sensus is able to predict venue populations with mean absolute prediction errors around four people, after as few as twenty-five human-labeled population counts per venue.

Results: Pulse captures short-lived sessions

The pulse data presented in Figure 5 includes both sessionization and corrections for crowdsourcing bias. Since Sensus relies on a timeout period to determine the beginning and end time of a session, the application must be running for at least the duration of a single timeout; any sessions still running at the end of the observation session are cut off. The sessions that are not cut off are reweighted to account for observation bias, according to the crowd-sensing correction strategy presented in an earlier section.
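Timeout-based sessionization can be sketched as follows. The timeout value, the use of seconds, and the rule for deciding that a session was "still running" (its last sighting falls within one timeout of the end of observation) are assumptions made for illustration; the reweighting step is not shown.

```python
def sessionize(timestamps, timeout, obs_end):
    """Group one device's sighting times (seconds) into (start, end) sessions.

    A gap longer than `timeout` between consecutive sightings starts a new
    session. Sessions whose last sighting is within `timeout` of `obs_end`
    may still have been running when observation stopped, so they are dropped.
    """
    times = sorted(timestamps)
    if not times:
        return []
    sessions = []
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > timeout:
            sessions.append((start, prev))
            start = t
        prev = t
    sessions.append((start, prev))
    return [(s, e) for s, e in sessions if obs_end - e >= timeout]


# One device seen in three bursts during a one-hour observation.
sightings = [0, 60, 120, 1000, 1060, 3500]
sessions = sessionize(sightings, timeout=300, obs_end=3600)
```

Here the final burst at 3500 seconds is discarded because the device could still have been present when sensing ended at 3600 seconds.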

Figure 5 indicates that at each of the three venues, the bulk of the observed devices were only visible for short amounts of time (on the order of 30 minutes or less). This trend matched human observations of local visitor dynamics. At any instant, between one-third and two-thirds of the patrons were long-session visitors; the remaining population saw rapid, consistent turnover and, over the long term, outnumbered the dwellers.

Figure 6: Mean absolute prediction error of occupancy count versus number of human observations. (Plot: x-axis Observations, 0 to 125; y-axis Mean Absolute Prediction Error, 0 to 8; series: Coffee Shop, Coupa Cafe, Reading Room.)

Limitations

With respect to population prediction, Sensus demonstrated relatively robust prediction accuracy across three very different venues. However, a larger deployment would be necessary in order to make stronger generalizability claims.

With pulse visualizations, two limitations apply. First, because sessions are detected from observed wireless signals, session lengths tend to be underestimated: gaps are likely to exist between a person’s arrival and the first wireless transmission from their device, or between the last transmission from the device and their departure. Second, our correction techniques reduce the bias of short periods of observation, but they cannot account for sessions that are longer than the longest period of observation. Both these limitations become less severe as Sensus users observe larger numbers of sessions.

DISCUSSION

Though a crowdsourced paradigm for monitoring human dynamics brings many exciting applications into reach, Sensus works best in small- and mid-sized venues, confined to a single room or open area, with public wifi access. In larger multi-room spaces, such as office buildings and stadiums, our system is neither able to observe all the colocated devices nor recognize the fraction of the devices that it is able to observe. To address this challenge, Sensus would need to assign sub-venues (defined as rooms or spaces within the greater venue) from which the whole population is easily countable by one individual.

As with most crowd-sourced systems for gathering real world information, the data gathered by Sensus is incomplete. The system can only learn about venues that have been sensed by users, and only for the times covered by those users. Although Sensus weights the data to adjust for sampling bias, it cannot make any claims about device sessions that last longer than the longest stretch of continuous observation, limiting the range of venue pulse data. Luckily, as more users run Sensus from a venue, its coverage increases and better represents the ground truth.

To obtain and analyze community-wide human dynamics data, we plan to launch a wide-scale public deployment of the Sensus system. With an enormous influx of user data throughout the hours of the day and days of the week, it will be useful to modify the pulse, flow and occupancy applications to reflect changes in trends based on the time of day, day of the week and month of the year. To promote the use of Sensus and improve its user experience, we intend to give users access to all available occupancy and pulse predictions, both longitudinal and real-time, through the native app; if we are able to attract a large enough user base, we even hope to provide personalized venue recommendations. To obtain data from venues that are less conducive to laptop use, we plan to deploy model B Raspberry Pis equipped with wireless adapters. These single-board computers make cheap and versatile sensing nodes and could demonstrate the benefits resulting from a hybrid crowdsourced, point-sourced system. This hardware option is preferable to other wifi-enabled devices, as each unit would cost $45 (USD) including adapters.

We find it prudent to discuss the potential applications of the Sensus system that we chose to avoid. Broadly, we do not want this system to be Big Brother. By their nature, all unencrypted wireless access points grant nearby sniffers access to a large amount of browsing data. This traffic can contain personalized content such as product recommendations or the contents of an online shopping cart and even user-identifiable information such as a person’s name, address or phone number. Although there are likely applications of investigating how browsing activity varies by venue, community and region, this data also has the potential to attach the identity of an individual to their web browsing behavior as well as their daily whereabouts. Since we feel that any potential benefits of such a system do not outweigh the cost of compromising individual privacy, we limit our sensing to the most privacy-conserving device traces available.

We take additional steps to ensure that Sensus does not compromise privacy, including applying a hashing function to collected MAC addresses. Additionally, we never reveal the identity of a given device in the data that we expose to the end user. This means that no third parties can ever see the hashed MAC address corresponding to a session length or venue visit, nor see the set of visited venues corresponding to an individual device. Also to preserve privacy, we decided to limit Sensus to the sensing of public venues. The inclusion of private venues would make it easier to establish the identities of individual device owners, should any of our data become compromised. However, any time that a system makes social presence information more visible than it used to be, it can result in privacy concerns. We must be mindful of the possible tradeoffs while navigating these boundaries.
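The paper does not say which hashing function is applied, so the following is only one possible sketch. It assumes a keyed hash (HMAC-SHA256 with a secret held by the collector), on the reasoning that the 48-bit MAC address space is small enough for an unkeyed hash to be reversed by enumeration; the function name and key are hypothetical.

```python
import hashlib
import hmac


def hash_mac(mac: str, secret: bytes) -> str:
    """Return a stable pseudonym for a MAC address.

    The address is normalized first so that the same device always maps to
    the same pseudonym regardless of letter case or separator style.
    """
    normalized = mac.strip().lower().replace("-", ":")
    return hmac.new(secret, normalized.encode("ascii"), hashlib.sha256).hexdigest()


key = b"demo-secret"  # placeholder; a real key would be generated and kept private
pseudonym = hash_mac("AA-BB-CC-DD-EE-FF", key)
```

Because the pseudonym is stable, a device can still be matched across venues for flow and clustering analysis without the raw address being stored.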

CONCLUSION

The Sensus system introduces a crowd-powered paradigm for sensing human movement. Unlike current systems that require user check-ins, Sensus can learn large-scale human dynamics at a given venue with only a single local user. This approach can learn community-wide human flows and venue clustering from a far smaller number of participants than would be required by check-ins. Although individual venues can sense clients’ movements with either computer vision-backed video surveillance or router-mediated device tracking, Sensus provides similarly detailed data without the need for specifically instrumented sensing infrastructures, and scales out to far broader physical areas.

The utility of Sensus is directly tied to its area of coverage, so a major next step for Sensus is to attract large-scale adoption. To gain this adoption, it will be necessary to consider issues of incentives both for volunteers and hosts of hardware monitors. Are Sensus’s applications sufficient to attract volunteer usage? Or would a quid-pro-quo be necessary so users only gain access to the data if they actively act as sensors?


Expanded development of crowd sensing technologies could allow everyone from ordinary citizens to urban planners to have access to a rich and continuously updated picture of the world we live in. This picture could include information about local climates, flows of foot and vehicle traffic, or even pictures and sound. We believe that Sensus represents a step in this direction as it shows that it is possible for people to collaboratively build up dense knowledge networks about the world they live in. Sensus demonstrates that this information can be captured by ordinary people using commodity hardware, empowering the very people it benefits.

REFERENCES

1. Airports council international passenger summary. URL http://www.aci.aero/Data-Centre/Monthly-Traffic-Data/Passenger-Summary/Year-to-date.

2. Balazinska, M. and Castro, P. Characterizing mobility and network usage in a corporate wireless local-area network. In Proc of Mobile systems 2003, pp. 303–316. ACM, 2003.

3. Cheng, Y.C., et al. Jigsaw: solving the puzzle of enterprise 802.11 analysis, vol. 36. ACM, 2006.

4. Claveirole, T., Boc, M., and de Amorim, M.D. An empirical analysis of wi-fi activity in three urban scenarios. In Pervasive Computing and Communications, 2009, pp. 1–6. IEEE, 2009.

5. Cohn, G., Morris, D., Patel, S., and Tan, D. Humantenna: using the body as an antenna for real-time whole-body interaction. In Proc of 2012 ACM conference on Human Factors in Computing Systems, pp. 1901–1910. ACM, 2012.

6. Cranshaw, J., Schwartz, R., Hong, J.I., and Sadeh, N.M. The livehoods project: Utilizing social media to understand the dynamics of a city. In ICWSM. 2012.

7. Cunche, M., Kaafar, M.A., and Boreli, R. I know who you will meet this evening! linking wireless devices using wi-fi probe requests. In IEEE WoWMoM 2012, pp. 1–9. IEEE, 2012.

8. Desmond, L.C.C., Yuan, C.C., Pheng, T.C., and Lee, R.S. Identifying unique devices through wireless fingerprinting. In Proc ACM conference on Wireless network security 2008, pp. 46–55. ACM, 2008.

9. Euclid analytics. http://euclidanalytics.com.

10. Facebook. https://www.facebook.com.

11. Foursquare. https://foursquare.com.

12. Froehlich, J.E., et al. Hydrosense: infrastructure-mediated single-point sensing of whole-home water activity. In Proc of Ubicomp ’09, pp. 235–244. ACM, 2009.

13. IEEE. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, 2012. URL http://standards.ieee.org/getieee802/download/802.11-2012.pdf.

14. Kang, C., Sobolevsky, S., Liu, Y., and Ratti, C. Exploring human movements in singapore: a comparative analysis based on mobile phone and taxicab usages. In Proc 2nd ACM SIGKDD International Workshop on Urban Computing, p. 1. ACM, 2013.

15. Kearns, M. and Ron, D. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 11(6):1427–1453, 1999.

16. Kim, M., Kotz, D., and Kim, S. Extracting a mobility model from real user traces. In INFOCOM, vol. 6, pp. 1–13. 2006.

17. Kriplean, T., et al. Supporting agile modeling through experimentation in an integrated urban simulation framework. In Proc of International Digital Government Research Conference on Public Administration Online 2010, pp. 112–121. Digital Government Society of North America, 2010.

18. Musa, A. and Eriksson, J. Tracking unmodified smartphones using wi-fi monitors. In Proc of the 10th ACM Conference on Embedded Network Sensor Systems, pp. 281–294. ACM, 2012.

19. Patel, S.N., Reynolds, M.S., and Abowd, G.D. Detecting human movement by differential air pressure sensing in hvac system ductwork: An exploration in infrastructure mediated sensing. In Pervasive Computing, pp. 1–18. Springer, 2008.

20. Pu, Q., Jiang, S., and Gollakota, S. Whole-home gesture recognition using wireless signals. In Proc of ACM SIGCOMM 2013, pp. 485–486. ACM, 2013.

21. Retail next. http://www.retailnext.net.

22. Rose, I. and Welsh, M. Mapping the urban wireless landscape with argos. In Proc of 8th ACM Conference on Embedded Networked Sensor Systems, pp. 323–336. ACM, 2010.

23. Shiels, M. Google admits wi-fi data collection blunder. BBC News, 2010. URL http://news.bbc.co.uk/2/hi/technology/8684110.stm.

24. Song, L., Kotz, D., Jain, R., and He, X. Evaluating location predictors with extensive wi-fi mobility data. In INFOCOM 2004, vol. 2, pp. 1414–1424. IEEE, 2004.

25. Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, pp. 267–288, 1996.

26. Want, R., Hopper, A., Falcao, V., and Gibbons, J. The active badge location system. ACM TOIS, 10(1):91–102, 1992.

27. Waze. http://www.waze.com.
