Data Management in the CarTel Mobile Sensor Computing System

4
Data Management in the CarTel Mobile Sensor Computing System Vladimir Bychkovsky, Kevin Chen, Michel Goraczko, Hongyi Hu, Bret Hull, Allen Miu, Eugene Shih, Yang Zhang, Hari Balakrishnan, Samuel Madden MIT Computer Science and Artificial Intelligence Laboratory The Stata Center, 32 Vassar St., Cambridge, MA 02139 1. INTRODUCTION Worldwide, there are over 600 million automobiles on the road. Each automobile is a potentially rich source of sen- sor data, with the current generation of cars having over 100 sensors. Unlike many other mobile platforms, an automobile is a resource-rich environment that can support relatively robust computation and communication systems. More im- portantly, because automobiles interface with a vast amount of the physical world and are well-integrated into our daily lives, they are uniquely positioned to enable a broad range of sensing applications. What can we do with 600 million mobile computing units (cars), each with tens of sensors, and on which we can place large amounts of computation? Here are some classes of applications that would arise if we expanded the reach of today’s Internet-based computing substrate to include au- tomobiles. In all these applications, cars are information sources. 1. Traffic monitoring and route planning. Suppose we tracked the location of every car (suitably anonymized) once per second using GPS. We can use this informa- tion to develop statistical models of traffic delays at various times of day on different road segments. Sup- pose you want to leave your home for the airport to catch a flight at 8:00 am. Which of the four different routes to the airport should you take? The statistics of delays between 6:00 am and 7:30 am can be used to develop a smarter route planning system than what is available today. 2. Preventive maintenance and diagnostics of cars. By tapping into the on-board sensors using the standard CAN (controller area network) interface, and by at- taching a variety of external sensors, we can mon- itor and report internal performance characteristics such as emissions, gas mileage, tire pressure, suspen- sion health, etc. These reports can be combined with historical data, highlighting long-term changes in a Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00. car’s internals, and correlating a given car’s informa- tion with other cars of the same vintage to detect anomalies in a car’s performance. 3. Civil infrastructure monitoring. When equipped with additional sensors to sample vibration and other con- ditions, cars can act as excellent “probes” to sense road conditions. Assessing and reporting road surface con- ditions such as potholes, oil spills, flooding, and ice can help cities and towns identify roads that need repair at relatively low cost, and help drivers to learn about hazardous driving conditions. To enable these types of applications and others, we pro- pose a reusable data management system, called CarTel, for querying and collecting data from intermittently con- nected devices. CarTel provides a simple, incrementally- deployable platform for developing automobile-based sensor applications. Our platform provides a dynamic query sys- tem that allows for both continuous (standing) and one-shot geo-spatial queries over car position, speed, and sensory data as well as both a low-cost and high-bandwidth substrate for communicating with a large network of mobile devices. What follows is a more detailed treatment of the key design challenges in building CarTel and an overview of the novel components of our system. 2. SYSTEM OVERVIEW Figure 1 illustrates the basic architecture of the CarTel system. Each car is equipped with a CarTel node, which is an embedded PC outfitted with a variety of sensors and soft- ware for data collection. As cars drive around, they collect data, such as the car’s current GPS location, WiFi availabil- ity, and engine performance anomalies. Sensor data is up- loaded and control messages are downloaded using intermit- tent wireless connections (e.g., 802.11 hotspots, Bluetooth cellphones, or other CarTel-enabled cars). This information eventually reaches the Internet, where it is delivered to our data management interface, the AutoPortal, for visualiza- tion, analysis, and browsing. CarTel relies on three key underlying technologies: 1. AutoPortal: Server software that provides data man- agement, visualization, and web-based querying. This software requests data from remote nodes, aggregates reports arriving from those nodes into a coherent pic- ture of current conditions, and visualizes that data. 1

Transcript of Data Management in the CarTel Mobile Sensor Computing System

Data Management in the CarTel Mobile Sensor ComputingSystem

Vladimir Bychkovsky, Kevin Chen, Michel Goraczko, Hongyi Hu, Bret Hull,Allen Miu, Eugene Shih, Yang Zhang, Hari Balakrishnan, Samuel Madden

MIT Computer Science and Artificial Intelligence LaboratoryThe Stata Center, 32 Vassar St., Cambridge, MA 02139

1. INTRODUCTIONWorldwide, there are over 600 million automobiles on the

road. Each automobile is a potentially rich source of sen-sor data, with the current generation of cars having over 100sensors. Unlike many other mobile platforms, an automobileis a resource-rich environment that can support relativelyrobust computation and communication systems. More im-portantly, because automobiles interface with a vast amountof the physical world and are well-integrated into our dailylives, they are uniquely positioned to enable a broad rangeof sensing applications.

What can we do with 600 million mobile computing units(cars), each with tens of sensors, and on which we can placelarge amounts of computation? Here are some classes ofapplications that would arise if we expanded the reach oftoday’s Internet-based computing substrate to include au-tomobiles. In all these applications, cars are informationsources.

1. Traffic monitoring and route planning. Suppose wetracked the location of every car (suitably anonymized)once per second using GPS. We can use this informa-tion to develop statistical models of traffic delays atvarious times of day on different road segments. Sup-pose you want to leave your home for the airport tocatch a flight at 8:00 am. Which of the four differentroutes to the airport should you take? The statisticsof delays between 6:00 am and 7:30 am can be used todevelop a smarter route planning system than what isavailable today.

2. Preventive maintenance and diagnostics of cars. Bytapping into the on-board sensors using the standardCAN (controller area network) interface, and by at-taching a variety of external sensors, we can mon-itor and report internal performance characteristicssuch as emissions, gas mileage, tire pressure, suspen-sion health, etc. These reports can be combined withhistorical data, highlighting long-term changes in a

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

car’s internals, and correlating a given car’s informa-tion with other cars of the same vintage to detectanomalies in a car’s performance.

3. Civil infrastructure monitoring. When equipped withadditional sensors to sample vibration and other con-ditions, cars can act as excellent “probes” to sense roadconditions. Assessing and reporting road surface con-ditions such as potholes, oil spills, flooding, and ice canhelp cities and towns identify roads that need repairat relatively low cost, and help drivers to learn abouthazardous driving conditions.

To enable these types of applications and others, we pro-pose a reusable data management system, called CarTel,for querying and collecting data from intermittently con-nected devices. CarTel provides a simple, incrementally-deployable platform for developing automobile-based sensorapplications. Our platform provides a dynamic query sys-tem that allows for both continuous (standing) and one-shotgeo-spatial queries over car position, speed, and sensory dataas well as both a low-cost and high-bandwidth substratefor communicating with a large network of mobile devices.What follows is a more detailed treatment of the key designchallenges in building CarTel and an overview of the novelcomponents of our system.

2. SYSTEM OVERVIEWFigure 1 illustrates the basic architecture of the CarTel

system. Each car is equipped with a CarTel node, which isan embedded PC outfitted with a variety of sensors and soft-ware for data collection. As cars drive around, they collectdata, such as the car’s current GPS location, WiFi availabil-ity, and engine performance anomalies. Sensor data is up-loaded and control messages are downloaded using intermit-tent wireless connections (e.g., 802.11 hotspots, Bluetoothcellphones, or other CarTel-enabled cars). This informationeventually reaches the Internet, where it is delivered to ourdata management interface, the AutoPortal, for visualiza-tion, analysis, and browsing.

CarTel relies on three key underlying technologies:

1. AutoPortal: Server software that provides data man-agement, visualization, and web-based querying. Thissoftware requests data from remote nodes, aggregatesreports arriving from those nodes into a coherent pic-ture of current conditions, and visualizes that data.

1

Open WirelessAccess Point

Other CarTelNetworks

AutoPortal

DataAnalysisBackend

Opportunistic IP Relaying via 802.11, WiMAX, etc.

Local CarTelCollecting traffic, CAN info

Peer to Peer InformationExchange

Queries

GUIs/Viz

Clients

User's WirelessAccess Point

Internet

In: Road speeds,Sensor dataOut: Alerts, info

Figure 1: CarTel Architecture.

It also provides tools that allow users to query Car-Tel nodes, requesting that they prioritize delivery ofcertain types of data over others. This subsystem isdescribed in more detail in Section 3.1.

2. CafNet: A networking infrastructure for carry-and-forward networks that works in the face of variableand intermittent network connectivity. CafNet is de-signed to work with a heterogenous set of network tech-nologies and manages the routing of data across manyunreliable, high-latency links. CafNet treats the mo-bility of its network medium (e.g., USB keys, PDAs,cell phones) as an asset that helps it extend the reachof traditional networks. This subsystem is describedin more detail in Section 3.2.

3. Local Data Management on CarTel nodes: A device-level data management infrastructure that collects, pre-processes, and prioritizes information on remote nodesrunning CarTel software. Each device is able to au-tomatically adjust its data-collection schema depend-ing on the sensors present in the car. Data is aggre-gated and queries are processed using a simple stream-processing engine. This subsystem is described in moredetail in Section 3.3.

These three technologies operate in concert to create anasynchronous control loop between web-based users and theembedded PCs traveling in the trunks of peoples’ cars. Thereare a number of challenges in building a system of this sizeand type.

First, the CarTel network is composed of a large numberof embedded systems that must operate in the field withlittle or no human intervention. As sensors are added orremoved from individual cars, our data-management sys-tem must dynamically reconfigure, updating schemas andinstalling adapters that direct data acquisition.

Second, relaying this collected data to a server for analy-sis is difficult because we lack ubiquitous, low-cost Internetaccess. CafNet takes advantage of the fact that our land-scape is increasingly becoming dotted with islands of net-

work connectivity, including WiFi access points, Bluetoothequipped hand-held devices, and other short-range commu-nication technologies.

Finally, as network size grows and the fidelity of datacollection increases, managing and visualizing this data be-comes a non-trivial challenge. Furthermore, since our sys-tem is collecting large quantity of personal information, wemust deal with issues of data security and privacy.

3. DATA MANAGEMENT ARCHITECTUREIn this section we describe the vision for our first CarTel

prototype. The system is currently deployed in a few carsand we will be scaling up to tens of cars over the next fewmonths.

3.1 AutoPortalThe AutoPortal is the primary interface by which users

interact with the CarTel network. It provides several classesof functionality, including data sharing, query processing,data visualization, and network management.

Figure 4 shows a prototype of an AutoPortal applicationthat logs a user’s travels. For each trip, the portal showsvarious statistics about the journey, as well as a graphicalillustration of the trip, with the speed at each point alongthe route illustrated by different colors. Trips are groupedby start and/or end destination, or can be visually queriedby highlighting regions of the map. Users have found eventhis simple application to be useful because it allows themto easily see the points of congestion on their routes betweenhome and work. The AutoPortal provides an easy way forusers to compare routes and determine which ones minimizetheir expected travel time.

The AutoPortal will include a query interface that allowsusers to visually filter and process sensor data without exten-sive knowledge of SQL. Using our system, users can graphi-cally define interest regions on a map showing the collectionarea. Users then define connections between interest regionsusing boolean operators. Additionally, domain-specific op-erators can be dragged to a region to apply filtering andprocessing to any matching sensor data.

For example, imagine that you want to compare all of yourroutes between home and work. You would first highlighttwo regions, one around your home and one around yourwork, and set the return type for each to be a GPS trace.You would then drag a boolean ’and’ operator onto the mapto link the two highlighted regions, ensuring that only thosetraces that go from your home to work are returned. Finally,to make the result set more manageable, you can apply anaggregate operator over the result set that groups tracesbased on similarity. Traces that follow the same route wouldbe grouped and only statistics for unique routes would bereported.

The AutoPortal also provides a feedback mechanism thatallows users to affect data collection in their car or the entireCarTel network. For example, if a query detects that a par-ticular car’s speed and RPM have a low correlation, a usermight request a fine-granularity time-series of sensor datafrom various sub-components of that car’s drivetrain. Theserequests would be made through a high level interface, e.g.,by clicking on an image of a component in a high-resolutionimage of a car to request more information about that com-ponent. These high-level requests would be translated intoqueries that are executed by the query processing infrastruc-

2

Figure 2: CafNet communication stack and applica-tions.

ture (Section 3.3) and are propagated to the relevant carsusing CafNet (Section 3.2).

3.2 CafNetThe design of our network infrastructure, CafNet, is mo-

tivated by two technology trends and the needs of data-intensive CarTel applications. The technology trends are:(1) pockets of inexpensive high-bandwidth wireless accessusing WiFi, and (2) vast amounts of inexpensive storage onsmall devices. Sensing applications can generate hundredsof megabytes of data every day. Cars do not usually havecontinuous access to high-bandwidth connectivity. However,during the course of a day, they will have access to islandsof high wireless bandwidth. Hence, our approach to net-working integrates routing and storage by relying on themovement of cars and users to move data from source todestination. The fundamental data-carrying unit in CarTelis a data mule. Any storage device can act as data mule, in-cluding USB keys and cell phones. As cars traverse islandsof connectivity (e.g., WiFi), they opportunistically attemptto relay data via the Internet. Mules will also opportunisti-cally forward data to each other when they are in the samewireless neighborhood. Figure 2 shows a layered view of thestack.

Future forwarding opportunities are hard to estimate reli-ably in a disconnected mobile environment, such as CarTel.Consequently, applications must gracefully adapt to vary-ing network bandwidth and delay. Our CafNet implemen-tation allows applications to express the importance of datathrough priorities. When bandwidth is insufficient, priori-ties are used to drop data according to its importance.

We designed CafNet to decouple the communication chal-lenges in an intermittently connected environment from thoseof querying and collecting data. Our communication stackconsists of three layers: transport, network, and mule adap-tation layer. The purpose of the transport layer is to pro-

Figure 3: Prototype CarTel hardware placed in cars.Clockwise: the WiFi card in the box, the OBD inter-face to connect to the car’s CAN, the GPS receiver,and the car gateway box.

vide applications with an interface for sending and receivingdata. The network layer maintains topology information andchooses the appropriate media for the next hop. The muleadaptation layer presents heterogeneous types of media tothe network layer in a uniform fashion.

3.3 Device Level Data ManagementFigure 3 shows a prototype of the CarTel data manage-

ment and collection hardware deployed in each car. Insideeach CarTel node is an embedded PC running Linux at-tached to a variety of sensors. This hardware runs a simpledata management system that makes it easy to build appli-cations that process and collect different types of informa-tion from the vehicle. This system provides the followingfeatures:

1. A way to add support for collecting data from newtypes of sensor hardware; these pieces of data are madeavailable as attributes that can be referred to in aquery. For example, if an engineer wants to collectdata about engine RPMs in the car, he can write asmall amount of code that retrieves RPMs at a partic-ular rate from the on-board vehicle bus and passes thatdata onto the query system as the rpm attribute1. Theadapters and attribute metadata for these sensors (ordata sources) can be propagated to the devices, or canbe provided by the sensors themselves. By advertis-ing their attributes and source properties, the sensorsallow the device to recognize and collect data immedi-ately upon attachment, without manual configurationor any other user intervention.

2. Continuous queries that read in low-level attributesand process them. For example, if an engineer wantsto compare engine RPMs to speed, he can run a query

1We have built an interface for CarTel that allows it to col-lect data from the industry standard On Board Diagnostics(OBD) connection included in all cars since 1997.

3

that samples both attributes and compares their rel-ative changes every few seconds. These queries aresimilar to regular SQL queries, but with an additionalparameter to specify the rate at which the query isperformed over the appropriate tables and output toCafNet. The query manager on the device is responsi-ble for locally adding and removing these queries, andexecuting them at the correct intervals. The queryresults are stored in the database.

3. An interface to CafNet that delivers query results overintermittently available network resources. The datasource updates, query updates, and query results areall transmitted via CafNet. Query results can be prior-itized to enable support for coarse-grained data in theface of insufficient bandwidth, with higher-resolutiondata for outlier data points.

As a sample, the following query returns the sliding 5-second-window average of the speed of a specific car, at therate of once every 2 seconds. It assumes the “speed” at-tribute is stored in a destination table called “car.”

SELECT avg(speed)

FROM cartel AS car

WHERE time > now() 5 AND carid = 5

RATE 0.5 HZ

The following query returns a list of model and averagefuel consumption over the past 5 seconds of any Ford-madevehicle of which there are enough samples available (at least5), at the rate of 3 times per second.

SELECT car.model, avg(fuel_consumption)

FROM cartel, cars AS car

WHERE time > now() 5 AND car.make = Ford

AND cartel.carid = car.id

GROUP BY car.model

HAVING count(DISTINCT carid) > 5

RATE 3 HZ

4. DEMONSTRATION HIGHLIGHTSAt the time of the demo, our system will have been run-

ning for several months. To demonstrate the components ofCarTel, you can expect to see the following:

1. Visual query system: We will demonstrate our graph-ical query system using historical data collected byour CarTel deployment in Boston. This demo willshow how our system can be used to compare expectedtravel times for a user’s common routes. For example,one of our users commonly travels from Winchester toBoston. Each day he must select one of four routesto take into work. Using our collected data, we willshow how such a user can sift through the thousands oftraces he has collected to meaningfully compare thesefour routes. We will also demonstrate how our systemaggregates the data of all users to provide a more in-formed route-planning engine. For example, if a userhas a flight that leaves from Logan at 10am, our sys-tem can be used to determine when this user shouldleave and what route she should take to arrive at theairport an hour in advance with high probability.

Figure 4: Screenshot of the AutoPortal prototype.Here, the user is visualizing a recently driven route.

2. Opportunistic data transfers: We will rent three carsand install a CarTel node in each one. During thedemo, these cars will be traveling around downtownChicago, collecting sensor data and uploading the datato our servers whenever an access point is available.At the demo site, we will use the AutoPortal to showthe access points being used for uploads and tracesof where the cars have been traveling in the last fewminutes. In addition, we will use the AutoPortal tochange the continuous query running in the CarTelnodes to report readings from different sensors.

3. Controlled, real-time sensing: We will connect a digi-tal camera to several stationary CarTel nodes placedaround the demo site. The images that these camerascollect will be relayed through a Bluetooth equippedcellphone to a laptop. The cellphone will display net-working statistics such as current throughput and amountof queued data. The laptop will display video framesreceived from the laptop. Visitors will be able to in-teract with a demo by modifying the parameters ofthe query sent to the device. In particular, it will bepossible to zoom into a particular part of the imagein real-time to get a higher resolution version of theportion of the image.

We are making a prototype of the AutoPortal (as shown inFigure 4) available to the SIGMOD demo committee. Pleasedo not share this site outside of the demo PC. For more de-tails please visit: http://carteldb.csail.mit.edu/sigmod06/

4