Database Middleware for Sensor Networks

Sam Madden, Assistant Professor, MIT [email protected]

Slides prepared with Wei Hong


Page 2: Database Middleware for Sensor Networks

Motivation

• Sensor networks (aka sensor webs, emnets) are here
  – Several widely deployed HW/SW platforms
    • Low-power radio, small processor, RAM/Flash
  – Variety of (novel) applications: scientific, industrial, commercial
  – Great platform for mobile + ubicomp experimentation
• Real, hard research problems to be solved
  – Networking, systems, languages, databases
  – Central problem: ease of access, appropriate programming abstractions

I will summarize:
  – Low-level sensornet issues
  – A particular middleware architecture: TinyDB + TASK
  – Current and future research middleware ideas

[Figure: Berkeley Mote]

Page 3: Database Middleware for Sensor Networks

Some Sensornet Apps

• Redwood forest microclimate monitoring
• Smart cooling in data centers (http://www.hpl.hp.com/research/dca/smart_cooling/)
• Condition-based maintenance
• Structural integrity

And More…

• Homeland security
• Container monitoring
• Mobile environmental apps
• Bird tracking
• Zebranet
• Home automation
• Etc!

Page 4: Database Middleware for Sensor Networks

Architectural Overview

[Figure: architecture diagram with components Sensor Network, TinyDB, Middleware, Local Servers, Field Tools, Stable Store (DBMS), Internet, Client Tools (GUIs, etc.), and External Tools; other systems labeled: Directed Diffusion, COUGAR]

Middleware Issues:
• APIs for current + historical access?
• Which data when?
• How to act on data?
• Network and node status?

Page 5: Database Middleware for Sensor Networks

Declarative Queries

• Programming Apps is Hard
  – Limited power budget
  – Lossy, low bandwidth communication
  – Require long-lived, zero admin deployments
  – Distributed Algorithms
  – Limited tools, debugging interfaces
• Queries abstract away much of the complexity
  – Burden on the database developers
  – Users get:
    • Safe, optimizable programs
    • Freedom to think about apps instead of details

Page 6: Database Middleware for Sensor Networks

TinyDB: Declarative Query Interface to Sensornets

• Platform: Berkeley Motes + TinyOS
• Continuous variant of SQL: TinySQL
• Power and data-acquisition based in-network optimization framework
• Extensible interface for aggregates, new types of sensors

Page 7: Database Middleware for Sensor Networks

Agenda

• Part 1: Sensor Networks (40 mins)
  – TinyOS
  – NesC
• Part 2: TinyDB + TASK (50 mins)
  – Data Model and Query Language
  – Software Architecture
• 30 minute break
• Part 3: Alternative Middleware Architectures + Research Directions (1 hr 30 mins)
• Finish around 12

Page 8: Database Middleware for Sensor Networks

Part 1

• Sensornet Background
• Motes + Mote Hardware
  – TinyOS
  – Programming Model + NesC
• TinyOS Architecture
  – Major Software Subsystems
  – Networking Services

Page 9: Database Middleware for Sensor Networks

Sensor Networks: a hot topic

• New university courses
• New conferences
  – ACM SenSys, IEEE IPSN, etc.
• New industrial research lab projects
  – Intel, PARC, MSR, HP, Accenture, etc.
• Startup companies
  – Crossbow, Dust, Ember, Sensicast, Moteiv, etc.
• Media Buzz
  – Over 30 news articles since July 2002 covering Intel-Berkeley/UC Berkeley sensor network activities
  – One of 10 emerging technologies that will change the world (MIT Technology Review)

Page 10: Database Middleware for Sensor Networks

Why Now?

• Commoditization of radio hardware
  – Cellular and cordless phones, wireless communication
• Low cost -> many/tiny -> new applications!
• Real application for ad-hoc network research from the late 90's
• Coming together of EE + CS communities

Page 11: Database Middleware for Sensor Networks

Motes

Mica Mote / Mica2Dot
  uProc:  4 MHz, 8-bit Atmel RISC
  Radio:  40 kbit/s at 900/450/300 MHz, or 250 kbit/s at 2.4 GHz (MicaZ, 802.15.4)
  Memory: 4 K RAM / 128 K Program Flash / 512 K Data Flash
  Power:  2 x AA or coin cell

Telos Mote
  uProc:  8 MHz, 16-bit TI RISC
  Radio:  250 kbit/s at 2.4 GHz (802.15.4)
  Memory: 2 K RAM / 60 K Program Flash / 512 K Data Flash
  Power:  2 x AA

iMote
  uProc:  12 MHz, 16-bit ARM
  Radio:  Bluetooth
  Memory: 64 K SRAM / 512 K Data Flash
  Power:  2 x AA

Page 12: Database Middleware for Sensor Networks

History of Motes

• Initial research goal wasn't hardware
  – Has since become more of a priority with emerging hardware needs, e.g.:
    • Power consumption
    • (Ultrasonic) ranging + localization
      – MIT Cricket, NEST Project
    • Connectivity with diverse sensors
      – UCLA sensor board
  – Even so, now on the 5th generation of devices
    • Costs down to ~$50/node (Moteiv, Dust)
    • Greatly improved radio quality
    • Multitude of interfaces: USB, Ethernet, CF, etc.
    • Variety of form factors, packages

Page 13: Database Middleware for Sensor Networks

Motes vs. Traditional Computing

• Embedded OS
• Lossy, Adhoc Radio Communication
• Sensing Hardware
• Severe Power Constraints

Page 14: Database Middleware for Sensor Networks

NesC/TinyOS

• NesC: a C dialect for embedded programming
  – Components, "wired together"
  – Quick commands and asynch events
• TinyOS: a set of NesC components
  – hardware components
  – ad-hoc network formation & maintenance
  – time synchronization

Think of the pair as a programming environment

Page 15: Database Middleware for Sensor Networks

Radio Communication

• Low Bandwidth Shared Radio Channel
  – ~40 kbit/s on motes
  – Much less in practice
    • Encoding, Contention for Media Access (MAC)
• Very lossy: 30% base loss rate
  – Argues against TCP-like end-to-end retransmission
    • And for link-layer retries
• Generally, not well behaved

[Figure from Ganesan, et al. "Complex Behavior at Scale." UCLA/CSD-TR 02-0013]

Page 16: Database Middleware for Sensor Networks

Types of Sensors

• Sensors attach via daughtercard
• Weather
  – Temperature
  – Light x 2 (high intensity PAR, low intensity, full spectrum)
  – Air Pressure
  – Humidity
• Vibration
  – 2 or 3 axis accelerometers
• Tracking
  – Microphone (for ranging and acoustic signatures)
  – Magnetometer
• GPS
• RFID Reader

Page 17: Database Middleware for Sensor Networks

Non-Volatile Storage

• EEPROM
  – 512K off chip, 32K on chip
  – Writes at disk speeds, reads at RAM speeds
  – Interface: random access, read/write 256 byte pages
  – Maximum throughput ~10 Kbytes / second
• MatchBox Filing System
  – Provides a Unix-like file I/O interface
  – Single, flat directory
  – Only one file being read/written at a time

Page 18: Database Middleware for Sensor Networks

Power Consumption and Lifetime

• Power typically supplied by a small battery
  – 1000-2000 mAH
  – 1 mAH = 1 milliamp current for 1 hour
    • Typically at optimum voltage, current drain rates
  – Power = Watts (W) = Amps (A) * Volts (V)
  – Energy = Joules (J) = W * time
• Lifetime, power consumption varies by application
  – Processor: 5 mA active, 1 mA idle, 5 uA sleeping
  – Radio: 5 mA listen, 10 mA xmit/receive, ~20 mS / packet
  – Sensors: 1 uA -> 100's mA, 1 uS -> 1 S / sample
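
The figures above are enough for a back-of-the-envelope lifetime estimate: average the current over one duty cycle, then divide battery capacity by it. The sketch below is illustrative only. The currents (5 mA active processor, 5 mA radio listen, 5 uA sleep) come from the slide; the duty cycle (2 s awake out of every 10 s) and the 2000 mAH capacity are assumptions, and sensor current is ignored.

```python
def average_current_ma(phases, period_s, sleep_ma=0.005):
    """Average current (mA) over one period.

    phases: list of (current_mA, seconds) pairs for the awake phases.
    The remainder of the period is spent asleep at sleep_ma
    (5 uA processor sleep current, from the slide).
    """
    awake_s = sum(seconds for _, seconds in phases)
    if awake_s > period_s:
        raise ValueError("awake phases exceed the period")
    charge = sum(ma * seconds for ma, seconds in phases)  # mA * s
    charge += sleep_ma * (period_s - awake_s)
    return charge / period_s


def lifetime_days(capacity_mah, avg_ma):
    """Lifetime in days of a battery of capacity_mah drained at avg_ma."""
    return capacity_mah / avg_ma / 24.0


# Assumed duty cycle: every 10 s, 1 s with the processor active (5 mA),
# then 1 s with the processor active and the radio listening (5 + 5 mA).
avg = average_current_ma([(5.0, 1.0), (10.0, 1.0)], period_s=10.0)
days = lifetime_days(2000.0, avg)
print(f"average current {avg:.3f} mA, lifetime about {days:.0f} days")
```

Under these assumptions the node averages about 1.5 mA, so lifetime is governed almost entirely by how long the node stays awake rather than by its sleep current.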

Page 19: Database Middleware for Sensor Networks

Energy Usage in A Typical Data Collection Scenario

• Each mote collects 1 sample of (light, humidity) data every 10 seconds, forwards it
• Each mote can "hear" 10 other motes
• Process:
  – Wake up, collect samples (~1 second)
  – Listen to radio for messages to forward (~1 second)
  – Forward data

[Figure: two bar charts. Power Consumption Breakdown: percentage of total power by hardware element (Radio, Sensors, Processor). Processor Energy Breakdown: percentage of total energy by processing phase (Idle, Waiting for Radio, Waiting for Sensors, Sending).]

Page 20: Database Middleware for Sensor Networks

Sensors: Slow, Power Hungry, Noisy

[Figure: two plots of Time of Day vs. Light (Lux), from 20:09 to 1:26, each comparing the Chamber Sensor with Sensor 69. The first shows raw Sensor 69 readings; the second shows Sensor 69 as the median of the last 10 readings.]
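
The second plot smooths a noisy sensor by reporting the median of its last 10 readings. A minimal sketch of such a sliding-median filter follows; the function name and the sample readings are made up for illustration and are not taken from TinyDB.

```python
from collections import deque
from statistics import median


def sliding_median(readings, window=10):
    """Yield the median of the most recent `window` readings."""
    recent = deque(maxlen=window)  # old readings fall off automatically
    for value in readings:
        recent.append(value)
        yield median(recent)


# Hypothetical light readings (lux) with two spurious samples (180 and 0).
raw = [100, 101, 180, 99, 100, 0, 102, 101, 100, 99]
smoothed = list(sliding_median(raw, window=5))
print(smoothed)
```

A median is used here rather than a mean because a single outlier cannot pull the median far from the typical reading.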

Page 21: Database Middleware for Sensor Networks

TinyOS: Getting Started

• The TinyOS home page:
  – http://webs.cs.berkeley.edu/tinyos
  – Start with the tutorials!
• The CVS repository
  – http://sf.net/projects/tinyos
• The NesC Project Page
  – http://sf.net/projects/nescc
• Crossbow motes (hardware):
  – http://www.xbow.com
• Intel Imote
  – www.intel.com/research/exploratory/motes.htm

Page 22: Database Middleware for Sensor Networks

23

Part 2

The Design and Implementation of TinyDB

Page 23: Database Middleware for Sensor Networks

Part 2 Outline

• TinyDB Overview
• Data Model and Query Language
• TinyDB Java API and Scripting
• Demo with TinyDB GUI
• TinyDB Internals
• Extending TinyDB
• TinyDB Status and Roadmap

Page 24: Database Middleware for Sensor Networks

TinyDB Revisited

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

• High level abstraction:
  – Data centric programming
  – Interact with sensor network as a whole
  – Extensible framework
• Under the hood:
  – Intelligent query processing: query optimization, power efficient execution
  – Fault Mitigation: automatically introduce redundancy, avoid problem areas

[Figure: the App sends Query, Trigger to TinyDB, which runs on the Sensor Network and returns Data]

Page 25: Database Middleware for Sensor Networks

Feature Overview

• Declarative SQL-like query interface
• Metadata catalog management
• Multiple concurrent queries
• Network monitoring (via queries)
• In-network, distributed query processing
• Extensible framework for attributes, commands and aggregates
• In-network, persistent storage

Page 26: Database Middleware for Sensor Networks

Architecture

[Figure: TinyDB architecture. PC side: TinyDB GUI, TinyDB Client API, and a DBMS reached via JDBC. Mote side: the TinyDB query processor running on the nodes of the sensor network (nodes numbered 0-8).]

Page 27: Database Middleware for Sensor Networks

Data Model

• Entire sensor network as one single, infinitely-long logical table: sensors
• Columns consist of all the attributes defined in the network
• Typical attributes:
  – Sensor readings
  – Meta-data: node id, location, etc.
  – Internal states: routing tree parent, timestamp, queue length, etc.
• Nodes return NULL for unknown attributes
• On server, all attributes are defined in catalog.xml
• Discussion: other alternative data models?

Page 28: Database Middleware for Sensor Networks

Query Language (TinySQL)

SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]

Page 29: Database Middleware for Sensor Networks

Comparison with SQL

• Single table in FROM clause
• Only conjunctive comparison predicates in WHERE and HAVING
• No subqueries
• No column alias in SELECT clause
• Arithmetic expressions limited to column op constant
• Only fundamental difference: SAMPLE PERIOD clause

Page 30: Database Middleware for Sensor Networks

TinySQL Examples

Query 1: "Find the sensors in bright nests."

SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Result (table: Sensors):

Epoch  Nodeid  nestNo  Light
0      1       17      455
0      2       25      389
1      1       17      422
1      2       25      405

Page 31: Database Middleware for Sensor Networks

TinySQL Examples (cont.)

Query 2:

SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s

Query 3: "Count the number of occupied nests in each loud region of the island."

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Result (regions w/ AVG(sound) > 200):

Epoch  region  CNT(…)  AVG(…)
0      North   3       360
0      South   3       520
1      North   3       370
1      South   3       520
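
To make the GROUP BY / HAVING semantics concrete, here is a sketch that evaluates one epoch of the grouped query on the PC side over a list of readings. It is not how TinyDB executes the query in the network. The reading values are invented, and treating CNT(occupied) as "number of readings with a non-null occupied value" is an assumption borrowed from SQL's COUNT.

```python
from collections import defaultdict


def count_and_avg_by_region(readings, min_avg_sound):
    """Evaluate one epoch of GROUP BY region HAVING AVG(sound) > min_avg_sound.

    readings: iterable of dicts with keys 'region', 'occupied', 'sound'.
    Returns {region: (CNT(occupied), AVG(sound))} for qualifying regions.
    """
    groups = defaultdict(list)
    for reading in readings:
        groups[reading["region"]].append(reading)

    result = {}
    for region, rows in groups.items():
        avg_sound = sum(r["sound"] for r in rows) / len(rows)
        if avg_sound > min_avg_sound:  # HAVING clause
            # Assumption: CNT counts non-null values, as SQL COUNT does.
            cnt = sum(1 for r in rows if r["occupied"] is not None)
            result[region] = (cnt, avg_sound)
    return result


# Hypothetical readings for a single epoch.
epoch0 = [
    {"region": "North", "occupied": 1, "sound": 350},
    {"region": "North", "occupied": 0, "sound": 360},
    {"region": "North", "occupied": 1, "sound": 370},
    {"region": "South", "occupied": 1, "sound": 500},
    {"region": "South", "occupied": 1, "sound": 520},
    {"region": "South", "occupied": 0, "sound": 540},
    {"region": "East", "occupied": 0, "sound": 100},
]
answer = count_and_avg_by_region(epoch0, min_avg_sound=200)
print(answer)
```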

Page 32: Database Middleware for Sensor Networks

Event-based Queries

• ON event SELECT …
• Run query only when interesting events happen
• Event examples
  – Button pushed
  – Message arrival
  – Bird enters nest
• Analogous to triggers, but events are user-defined

Page 33: Database Middleware for Sensor Networks

Query over Stored Data

• Named buffers in Flash memory
• Store query results in buffers
• Query over named buffers
• Analogous to materialized views
• Example:
  – CREATE BUFFER name SIZE x (field1 type1, field2 type2, …)
  – SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
  – SELECT field1, field2, … FROM name SAMPLE PERIOD d

Page 34: Database Middleware for Sensor Networks

Using the Java API

• SensorQueryer
  – translateQuery() converts TinySQL string into TinyDBQuery object
  – Static query optimization
• TinyDBNetwork
  – sendQuery() injects query into network
  – abortQuery() stops a running query
  – addResultListener() adds a ResultListener that is invoked for every QueryResult received
  – removeResultListener()
• QueryResult
  – A complete result tuple, or
  – A partial aggregate result, call mergeQueryResult() to combine partial results
• Key difference from JDBC: push vs. pull

Page 35: Database Middleware for Sensor Networks

Writing Scripts with TinyDB

• TinyDB's text interface
  – java net.tinyos.tinydb.TinyDBMain -run "select …"
  – Query results printed out to the console
  – All motes get reset each time new query is posed
• Handy for writing scripts with shell, perl, etc.

Page 36: Database Middleware for Sensor Networks

37

Using the GUI Tools

• Demo time

Page 37: Database Middleware for Sensor Networks

Inside TinyDB

[Figure: TinyDB on a mote, running over TinyOS, with Schema, Query Processor, and Multihop Network components. Queries arrive (SELECT AVG(temp) WHERE light > 400), a Filter (light > 400) and an Agg (avg(temp)) operator process samples obtained through get('temp'), and results leave (T:1, AVG: 225; T:2, AVG: 250).]

Schema entry for an attribute:
  Name: temp
  Time to sample: 50 uS
  Cost to sample: 90 uJ
  Calibration Table: 3
  Units: Deg. F
  Error: ± 5 Deg F
  Get f: getTempFunc()
  …

TinyDB size:
• ~10,000 Lines Embedded C Code
• ~5,000 Lines (PC-Side) Java
• ~3200 Bytes RAM (w/ 768 byte heap)
• ~58 kB compiled code (3x larger than 2nd largest TinyOS Program)

Page 38: Database Middleware for Sensor Networks

Tree-based Routing

• Tree-based routing
  – Used in:
    • Query delivery
    • Data collection
    • In-network aggregation
  – Relationship to indexing?

[Figure: routing tree over nodes A-F. The query Q: SELECT … is flooded down the tree from the root, and results R:{…} flow back up from child to parent.]

Page 39: Database Middleware for Sensor Networks

Power Consumption and Lifetime

• Power typically supplied by a small battery
  – At full power, device will last 2-3 days -> Critical Constraint
• Lifetime, power consumption varies by application
  – Scales with "duty cycle": amount of time on
  – Low data rate (< 1 sample / 30 secs): > 6 months possible from AA batteries

Fundamental challenge: distributed coordination with low power!

[Figure: current vs. time for Sensor A and Sensor B, showing Sleeping, Radio On/Processing, and Transmitting phases. The two sensors must synchronize!]

Page 40: Database Middleware for Sensor Networks

Time Synchronization

• All messages include a 5 byte time stamp indicating system time in ms
  – Synchronize (e.g. set system time to timestamp) with
    • Any message from parent
    • Any new query message (even if not from parent)
  – Punt on multiple queries
  – Timestamps written just after preamble is xmitted
• All nodes agree that the waking period begins when (system time % epoch dur = 0)
  – And lasts for WAKING_PERIOD ms
• Adjustment of clock happens by changing duration of sleep cycle, not wake cycle.
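
The wake-up rule above can be sketched in a few lines. This is an illustration of the arithmetic only, not TinyDB's nesC code; the function names are hypothetical and the WAKING_PERIOD value of 4000 ms is an assumed placeholder, since the slide does not give one.

```python
WAKING_PERIOD_MS = 4000  # assumed value; the slide does not state it


def is_awake(system_time_ms, epoch_dur_ms, waking_ms=WAKING_PERIOD_MS):
    """True while the node is inside the waking period of the current epoch.

    The waking period starts when system_time % epoch_dur == 0.
    """
    return system_time_ms % epoch_dur_ms < waking_ms


def ms_until_next_wake(system_time_ms, epoch_dur_ms):
    """Milliseconds until the next epoch boundary (0 if exactly on one)."""
    return (-system_time_ms) % epoch_dur_ms


def resync(parent_timestamp_ms, epoch_dur_ms):
    """Adopt the parent's timestamp as system time.

    Returns the new system time and the sleep duration that lands the next
    wake-up on the shared epoch boundary. Clock error is absorbed by the
    sleep duration; the wake duration is left unchanged.
    """
    return parent_timestamp_ms, ms_until_next_wake(parent_timestamp_ms, epoch_dur_ms)


new_time, sleep_ms = resync(parent_timestamp_ms=34250, epoch_dur_ms=10000)
print(new_time, sleep_ms, is_awake(new_time, 10000))
```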

Page 41: Database Middleware for Sensor Networks

Extending TinyDB

• Why extend TinyDB?
  – New sensors: attributes
  – New control/actuation: commands
  – New data processing logic: aggregates
  – New events
• Analogous to concepts in object-relational databases

Page 42: Database Middleware for Sensor Networks

Adding Attributes

• Types of attributes
  – Sensor attributes: raw or cooked sensor readings
  – Introspective attributes: parent, voltage, ram usage, etc.
  – Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc.

Page 43: Database Middleware for Sensor Networks

Adding Attributes (cont)

• Interfaces provided by Attr component
  – StdControl: init, start, stop
  – AttrRegister
    • command registerAttr(name, type, len)
    • event getAttr(name, resultBuf, errorPtr)
    • event setAttr(name, val)
    • command getAttrDone(name, resultBuf, error)
  – AttrUse
    • command startAttr(attr)
    • event startAttrDone(attr)
    • command getAttrValue(name, resultBuf, errorPtr)
    • event getAttrDone(name, resultBuf, error)
    • command setAttrValue(name, val)

Page 44: Database Middleware for Sensor Networks

Adding Attributes (cont)

• Steps to adding attributes to TinyDB
  1) Create attribute nesC components
  2) Wire new attribute components to TinyDBAttr configuration
  3) Reprogram TinyDB motes
  4) Add new attribute entries to catalog.xml
• Constant attributes can be added on the fly through TinyDB GUI

Page 45: Database Middleware for Sensor Networks

46

Adding Aggregates

• Step 1: wire new nesC components

Page 46: Database Middleware for Sensor Networks

Adding Aggregates (cont)

• Step 2: add entry to catalog.xml

  <aggregate>
    <name>AVG</name>
    <id>5</id>
    <temporal>false</temporal>
    <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
  </aggregate>

• Step 3 (optional): implement reader class in Java
  – A reader class interprets and finalizes aggregate state received from the mote network, and returns the final result as a string for display.

Page 47: Database Middleware for Sensor Networks

TinyDB Status

• Latest release shipped with TinyOS 1.1 (9/03)
  – Install the task-tinydb package in the TinyOS 1.1 distribution
  – First release in TinyOS 1.0 (9/02)
  – Widely used by research groups as well as industry pilot projects
• Successful deployments in Intel Berkeley Lab and redwood trees at UC Botanical Garden
  – Largest deployment: ~80 weather station nodes
  – Network longevity: 4-5 months

Page 48: Database Middleware for Sensor Networks

The Redwood Tree Deployment

• Redwood Grove in UC Botanical Garden, Berkeley
• Collect dense sensor readings to monitor climatic variations across
  – altitudes,
  – angles,
  – time,
  – forest locations, etc.
• Versus sporadic monitoring points with 30lb loggers!
• Current focus: study how dense sensor data affect predictions of conventional tree-growth models

Page 49: Database Middleware for Sensor Networks

Data from Redwoods

[Figure: two plots for nodes 101, 104, 109, 110, and 111, covering 7/7/03 9:40 to 7/9/03 11:03. Humidity vs. Time: Rel Humidity (%), axis 35-95. Temperature vs. Time: Temperature (C), axis 8-33.]

Node placement by height (tree diagram labeled 36m):
  33m: 111
  32m: 110
  30m: 109, 108, 107
  20m: 106, 105, 104
  10m: 103, 102, 101

Page 50: Database Middleware for Sensor Networks

51

TASK

Page 51: Database Middleware for Sensor Networks

A SensorNet Dilemma

• Sensors still packaged like HeathKits
  – Pretty hard to cope with out of the box
• Bare metal encourages one-off applications
  – Inhibits reuse
• Deployment not intuitive
  – No configuration/monitoring tools
• SensorNet PhD Factor
  – Today ~2.5 PhDs needed to deploy a SensorNet
  – Needs to be Zero

Page 52: Database Middleware for Sensor Networks

TASK Design Requirements

• Ease of S/W Installation
• Deployment tools
• Reconfigurability
• Health/Mgmt Monitoring
• Network Reliability Guarantee
• Interpretable Sensor Results
• Tool Integration
• Audit Trails
• Lifetime estimates
• Familiar API
• Extensibility of S/W
• Modular services

~ For Developers ~

Page 53: Database Middleware for Sensor Networks

Tiny Application Sensor Kit

[Figure: TASK architecture with components TinyDB Sensor Network, TASK Field Tools, SensorNet Appliance (TASK Server, Stable Store (DBMS)), Internet, TASK Client Tools (TaskView), and External Tools]

• Simplicity vs. Functionality
• Modularity
• Remote control
• Fault Tolerant

Page 54: Database Middleware for Sensor Networks

SensorNet Appliance

• Intelligent Gateway
  – Proxy for the sensornet
  – Distributes query
  – Stages results
  – Manages configuration
• Components
  – TASK Server
  – TinyDB Client (Java)
  – DBMS (PostgreSQL)
  – WebServer (Apache)

[Figure: inside the SNA, the TASK Server sits between the SensorNet (via the TinyDB Client) and the DBMS; external access is over http, ODBC, and other interfaces]

Page 55: Database Middleware for Sensor Networks

Tools

• Field Tool
  – In-situ diagnostics
• TaskView
  – Integrated tool for management and monitoring

Page 56: Database Middleware for Sensor Networks

57

For more information

• http://triplerock.cs.berkeley.edu/tinydb

Page 57: Database Middleware for Sensor Networks

58

Part 3

Middleware Architecture and Research Topics

Page 58: Database Middleware for Sensor Networks

Architectural Overview

[Figure: architecture diagram with components Sensor Network, TinyDB, Middleware, Local Servers, Field Tools, Stable Store (DBMS), Internet, Client Tools (GUIs, etc.), and External Tools]

Page 59: Database Middleware for Sensor Networks

What's Left?

• TinyDB and TinyOS provide a reasonable low-level substrate
• TASK sufficient for many data collection apps
• But… there are other architecture issues
  – Efficiency concerns
    • Currently transmit readings from all sensors on each epoch
    • Variable, context sensitive rates…
  – Data quality issues
    • Missing and faulty sensors?
  – Architectural issues
    • Actuation / closed-loop issues
    • Disconnection, etc.

Page 60: Database Middleware for Sensor Networks

Sensor Network Research

• Very active research area
  – Can't summarize it all
• Focus: database-relevant research topics
  – Some outside of Berkeley
  – Other topics that are itching to be scratched
  – But, some bias towards work that we find compelling

Page 61: Database Middleware for Sensor Networks

Topics

• Improving TinyDB Efficiency
  – In-network aggregation
  – Acquisitional Query Processing
• Alternative Architectures
  – Statistical Techniques
  – Heterogeneity
  – Intermittent Connectivity
• New features
  – In-network storage
  – Closing the loop
  – Integration with traditional databases


Page 63: Database Middleware for Sensor Networks

Tiny Aggregation (TAG)

• In-network processing of aggregates
  – Common data analysis operation
    • Aka gather operation or reduction in parallel programming
  – Communication reducing
    • Operator dependent benefit
  – Across nodes during same epoch
• Exploit query semantics to improve efficiency!

Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.

Page 64: Database Middleware for Sensor Networks

Basic Aggregation

• In each epoch:
  – Each node samples local sensors once
  – Generates partial state record (PSR)
    • local readings
    • readings from children
  – Outputs PSR during assigned comm. interval
    • Interval assigned based on depth in tree
• At end of epoch, PSR for whole network output at root
• New result on each successive epoch

[Figure: five-node routing tree (nodes 1-5), with each node labeled by its communication interval]

Page 65: Database Middleware for Sensor Networks

Illustration: In-Network Aggregation

SELECT COUNT(*) FROM sensors

[Figure, animated over several slides: a five-node routing tree (sensors 1-5) next to a grid of interval # (rows) by sensor # (columns). Time runs through one sample period, from interval 4 down to interval 1. In each interval the grid shows the partial count a sensor transmits; in the final frame, sensors that have nothing to do in an interval are marked "zzz" (asleep).]

Partial counts transmitted in one epoch, as shown in the grid:

Interval #  Sender     Partial COUNT sent
4           Sensor 5   1
3           Sensor 4   2
2           Sensor 2   1
2           Sensor 3   3
1           Sensor 1   5

The next epoch then starts again at interval 4, with sensor 5 sending 1.
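
The interval-by-interval behavior in this illustration can be reproduced with a short simulation. The sketch below is not TinyDB code; it assumes the routing tree implied by the counts on the slides (1 is the root, 2 and 3 are its children, 4 is a child of 3, and 5 is a child of 4) and assumes no message loss.

```python
def depth_of(node, parent):
    """Depth of node in the routing tree (the root has depth 1)."""
    depth = 1
    while parent[node] is not None:
        node = parent[node]
        depth += 1
    return depth


def in_network_count(parent):
    """Simulate one epoch of SELECT COUNT(*) over a routing tree.

    parent maps node id -> parent id (None for the root).
    Returns (schedule, total): schedule maps interval -> {node: count sent},
    total is the count that reaches the root.
    """
    depth = {n: depth_of(n, parent) for n in parent}
    partial = {n: 1 for n in parent}  # every node starts by counting itself
    schedule = {}
    # The deepest nodes transmit first; a node's interval equals its depth.
    for interval in range(max(depth.values()), 0, -1):
        senders = sorted(n for n in parent if depth[n] == interval)
        schedule[interval] = {n: partial[n] for n in senders}
        for n in senders:
            if parent[n] is not None:
                partial[parent[n]] += partial[n]  # parent merges child's PSR
    root = next(n for n in parent if parent[n] is None)
    return schedule, partial[root]


# Assumed tree: 1 is the root; 2 and 3 report to 1; 4 to 3; 5 to 4.
tree = {1: None, 2: 1, 3: 1, 4: 3, 5: 4}
schedule, total = in_network_count(tree)
for interval in sorted(schedule, reverse=True):
    print(interval, schedule[interval])
```

Each node sends exactly one message per epoch regardless of how many descendants it has, which is where the communication savings over forwarding every raw reading come from.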

Page 71: Database Middleware for Sensor Networks

Aggregation Framework

• As in extensible databases, TinyDB supports any aggregation function conforming to:

  Agg_n = {f_init, f_merge, f_evaluate}

  f_init{a0} → <a0>
  f_merge{<a1>, <a2>} → <a12>
  f_evaluate{<a1>} → aggregate value

  where <a> is a Partial State Record (PSR)

Example: Average

  AVG_init{v} → <v, 1>
  AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
  AVG_evaluate{<S, C>} → S/C

Restriction: merge must be associative and commutative
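
The AVG example translates directly into three small functions. This is a sketch in Python for illustration (TinyDB's aggregates are nesC components); the function names mirror the slide's notation and the readings are invented.

```python
from functools import reduce


def avg_init(v):
    """f_init: turn one reading into a partial state record <sum, count>."""
    return (v, 1)


def avg_merge(a, b):
    """f_merge: combine two partial state records."""
    return (a[0] + b[0], a[1] + b[1])


def avg_evaluate(psr):
    """f_evaluate: turn the final partial state record into the aggregate."""
    total, count = psr
    return total / count


readings = [10, 20, 30, 40]  # hypothetical sensor values
psrs = [avg_init(v) for v in readings]

# Merge in one order...
left_to_right = reduce(avg_merge, psrs)
# ...and in a different grouping and order, as a routing tree might.
tree_order = avg_merge(avg_merge(psrs[3], psrs[1]), avg_merge(psrs[2], psrs[0]))

print(avg_evaluate(left_to_right), avg_evaluate(tree_order))
```

Because the merge is associative and commutative, the result does not depend on the shape of the routing tree or the order in which children report.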

Page 72: Database Middleware for Sensor Networks

Taxonomy of Aggregates

• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied

Property               Examples                                    Affects
Partial State          MEDIAN: unbounded, MAX: 1 record            Effectiveness of TAG
Monotonicity           COUNT: monotonic, AVG: non-monotonic        Hypothesis Testing, Snooping
Exemplary vs. Summary  MAX: exemplary, COUNT: summary              Applicability of Sampling, Effect of Loss
Duplicate Sensitivity  MIN: dup. insensitive, AVG: dup. sensitive  Routing Redundancy

Drives an API!

Page 73: Database Middleware for Sensor Networks

Use Multiple Parents

• Use graph structure
  – Increase delivery probability with no communication overhead
• For duplicate insensitive aggregates, or
• Aggs expressible as sum of parts
  – Send (part of) aggregate to all parents
    • In just one message, via multicast
  – Assuming independence, decreases variance

Example: SELECT COUNT(*)

[Figure: node A holds a partial count c and reaches the root R through parents B and C. With one parent, A sends c to a single parent; with n = 2 parents, A sends c/n to each of B and C.]

One parent:
  P(link xmit successful) = p
  P(success from A->R) = p^2
  E(cnt) = c * p^2
  Var(cnt) = c^2 * p^2 * (1 – p^2) ≡ V

# of parents = n:
  E(cnt) = n * (c/n * p^2)
  Var(cnt) = n * (c/n)^2 * p^2 * (1 – p^2) = V/n
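
The expectation and variance formulas on this slide can be checked numerically. The sketch below just evaluates them; it assumes, as the slide does, that link losses are independent and that every path from A to R crosses two links. The values of c, p, and n are arbitrary.

```python
def single_parent(c, p):
    """Mean and variance of the count that reaches R via one parent."""
    q = p * p  # both links (A->parent, parent->R) must succeed
    return c * q, c * c * q * (1 - q)


def split_over_parents(c, p, n):
    """Mean and variance when A sends c/n to each of n parents."""
    q = p * p
    share = c / n
    mean = n * share * q
    variance = n * share * share * q * (1 - q)  # independent paths add
    return mean, variance


c, p, n = 10, 0.9, 2  # arbitrary illustrative values
mean_1, var_1 = single_parent(c, p)
mean_n, var_n = split_over_parents(c, p, n)
print(mean_1, var_1)
print(mean_n, var_n)
```

The expected count is the same in both cases, while the variance drops by a factor of n.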

Page 74: Database Middleware for Sensor Networks

Multiple Parents Results

• Better than previous analysis expected!
• Losses aren't independent!
• Insight: spreads data over many links

[Figure: Benefit of Result Splitting (COUNT query). Bar chart of Avg. COUNT (axis 0-1400) for Splitting vs. No Splitting (2500 nodes, lossy radio model, 6 parents per node). Two network diagrams, No Splitting and With Splitting; the No Splitting diagram highlights a critical link.]

Page 75: Database Middleware for Sensor Networks

Acquisitional Query Processing (ACQP)

• TinyDB acquires AND processes data
  – Could generate an infinite number of samples
• An acquisitional query processor controls
  – when,
  – where,
  – and with what frequency data is collected!
• Versus traditional systems where data is provided a priori

Madden, Franklin, Hellerstein, and Hong. The Design of An Acquisitional Query Processor. SIGMOD, 2003.

Page 76: Database Middleware for Sensor Networks

ACQP: What's Different?

• How should the query be processed?
  – Sampling as a first class operation
• How does the user control acquisition?
  – Rates or lifetimes
  – Event-based triggers
• Which nodes have relevant data?
  – Index-like data structures
• Which samples should be transmitted?
  – Prioritization, summary, and rate control

Page 77: Database Middleware for Sensor Networks

Operator Ordering: Interleave Sampling + Selection

SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s

• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
• At 1 sample / sec, total power savings could be as much as 3.5 mW. Comparable to processor!

[Figure: query plans over the mag and light samples with selections σ(pred1) and σ(pred2). The Traditional DBMS plan samples mag and light together and then applies both selections. The ACQP plans interleave sampling with selection; one ordering is marked Cheap and the other Costly. The cheap plan is marked as the correct ordering (unless pred1 is very selective and pred2 is not).]
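
The ordering decision comes down to comparing expected energy per tuple: the first sensor is always sampled, the second only if the first predicate passes. The sketch below uses the slide's sampling costs (1500 uJ for mag, 90 uJ for light); the selectivities and the function names are illustrative assumptions, and predicate evaluation itself is treated as free.

```python
E_MAG_UJ = 1500.0   # energy to sample the magnetometer (from the slide)
E_LIGHT_UJ = 90.0   # energy to sample the light sensor (from the slide)


def expected_cost_uj(first_cost, first_selectivity, second_cost):
    """Expected energy per tuple when the second sensor is sampled only
    if the predicate on the first sensor passes."""
    return first_cost + first_selectivity * second_cost


def best_order(sel_mag, sel_light):
    """Pick the cheaper of the two interleaved orderings.

    sel_mag / sel_light: fraction of tuples passing pred1(mag) / pred2(light).
    """
    mag_first = expected_cost_uj(E_MAG_UJ, sel_mag, E_LIGHT_UJ)
    light_first = expected_cost_uj(E_LIGHT_UJ, sel_light, E_MAG_UJ)
    if light_first <= mag_first:
        return "light first", light_first
    return "mag first", mag_first


print(best_order(sel_mag=0.5, sel_light=0.5))    # typical case
print(best_order(sel_mag=0.01, sel_light=1.0))   # pred1 very selective, pred2 not
```

With equal selectivities, sampling light first wins by a wide margin; sampling mag first only pays off in the corner case the slide calls out, where pred1 filters out nearly everything and pred2 filters out nothing.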

Page 78: Database Middleware for Sensor Networks

Exemplary Aggregate Pushdown

SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s

• Novel, general pushdown technique
• Mag sampling is the most expensive operation!

[Figure: two query plans. Traditional DBMS: sample mag and light, apply σ(mag > x), then WINMAX. ACQP: sample light and apply σ(light > MAX) before sampling mag and applying σ(mag > x), then WINMAX.]

Page 79: Database Middleware for Sensor Networks

Topics

• Improving TinyDB Efficiency
  – In-network aggregation
  – Acquisitional Query Processing
• Alternative Architectures
  – Statistical Techniques
  – Heterogeneity
  – Intermittent Connectivity
• New features
  – In-network storage
  – Closing the loop
  – Integration with traditional databases

Page 80: Database Middleware for Sensor Networks

Statistical Techniques

• Approximations, summaries, and sampling based on statistics and statistical models
• Applications:
  – Limited bandwidth and large number of nodes -> data reduction
  – Lossiness -> predictive modeling
  – Uncertainty -> tracking correlations and changes over time
  – Physical models -> improved query answering

Page 81: Database Middleware for Sensor Networks

TinyDB Retrospective

[Figure: the user sends an SQL-style query to TinyDB, which distributes the query to the sensor network and, at every time step, collects the query answer or data]

• Declarative interface:
  – Sensor nets are not just for PhDs
  – Decrease deployment time
• Data aggregation:
  – Can reduce communication

Page 82: Database Middleware for Sensor Networks

Limitations of TinyDB approach

[Figure: the user sends an SQL-style query to TinyDB, which distributes the query and collects data at every time step; for a new query, the process is redone every time the query changes]

• Query distribution:
  – Every node must receive query
• Data collection:
  – Every node must wake up at every time step
  – Data loss ignored
  – No quality guarantees
  – Wastes resources by ignoring correlations

Page 83: Database Middleware for Sensor Networks

Sensor net data is correlated

Spatial-temporal correlation

• Data is not i.i.d. -> shouldn't ignore missing data
• Observing one sensor -> information about other sensors (and future values)
• Observing one type of reading -> information about other local readings

Page 84: Database Middleware for Sensor Networks

BBQ: Model-driven data acquisition

[Figure: the user poses a new SQL-style query with desired confidence to a middleware layer that holds a probabilistic model (example model: multidimensional Gaussian). The model yields a data gathering plan for the sensor network; the posterior belief is obtained by conditioning on new observations, and a transition model advances it over time t. Density plots (axes 10-30 and 0-0.4) illustrate the model, the transition model, and the posterior belief.]

Strengths of model-based data acquisition:
• Observe fewer attributes
• Exploit correlations
• Reuse information between queries
• Directly deal with missing data
• Answer more complex (probabilistic) queries
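
The "condition on new observations" step is easiest to see with a two-variable Gaussian: observing one sensor shifts the mean and shrinks the variance of the other. The sketch below applies the standard bivariate Gaussian conditioning formulas; it is not BBQ's implementation, and the means, covariances, and function name are invented for illustration.

```python
def condition_bivariate(mu, cov, observed_index, observed_value):
    """Posterior mean and variance of the unobserved variable of a
    2-D Gaussian after observing the other one.

    mu:  [mean_0, mean_1]
    cov: 2x2 covariance matrix as nested lists
    """
    i = observed_index
    j = 1 - i
    gain = cov[j][i] / cov[i][i]
    mean = mu[j] + gain * (observed_value - mu[i])
    variance = cov[j][j] - gain * cov[i][j]
    return mean, variance


# Hypothetical model of two nearby temperature sensors (degrees C):
# both have variance 4, and their covariance is 3 (strongly correlated).
mu = [20.0, 21.0]
cov = [[4.0, 3.0],
       [3.0, 4.0]]

# Observe sensor 0 reading 22.0 and infer sensor 1 without sampling it.
post_mean, post_var = condition_bivariate(mu, cov, observed_index=0, observed_value=22.0)
print(post_mean, post_var)
```

In this example one observation cuts the unobserved sensor's variance from 4 to 1.75, which is the sense in which correlations let the system observe fewer attributes.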

Page 85: Database Middleware for Sensor Networks

Probabilistic models and queries

User's perspective:

Query:
SELECT nodeId, temp ± 0.5°C, conf(.95)
FROM sensors
WHERE nodeId in {1..8}

System selects and observes subset of nodes. Observed nodes: {3,6,8}

Query result:

Node   1     2     3     4     5     6     7     8
Temp.  17.3  18.1  17.4  16.1  19.2  21.3  17.5  16.3
Conf.  98%   95%   100%  99%   95%   100%  98%   100%

[Figure: probability density (axes 10-30 and 0-0.4) with a 1.0°C-wide interval marked]

Page 86: Database Middleware for Sensor Networks

Supported queries

• Value query
  – Xi ± ε with prob. at least 1 − δ
• SELECT and Range query
  – Xi ∈ [a,b] with prob. at least 1 − δ
  – Which sensors have temperature greater than 25°C?
• Aggregation
  – Average ± ε of subset of attribs. with prob. > 1 − δ
  – Combine aggregation and selection
  – Probability that > 10 sensors have temperature greater than 25°C?

Queries require solution to integrals:
• Many queries computed in closed form
• Some require numerical integration/sampling
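
For a single Gaussian attribute, the range and value queries above reduce to closed-form integrals of the normal density. The sketch below evaluates them with the error function. It writes epsilon for the error bound and delta for the allowed failure probability, which are symbol names assumed here, and the numeric values are illustrative.

```python
import math


def normal_cdf(x, mean, std):
    """P(X <= x) for X ~ Normal(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_in_range(mean, std, a, b):
    """Range query: P(a <= X <= b)."""
    return normal_cdf(b, mean, std) - normal_cdf(a, mean, std)


def value_query_ok(std, epsilon, delta):
    """Value query: is the model's mean within +/- epsilon of the true
    value with probability at least 1 - delta, without new observations?"""
    return prob_in_range(0.0, std, -epsilon, epsilon) >= 1.0 - delta


# Posterior std of 0.25 C: +/- 0.5 C is a 2-sigma interval (about 95.4%).
print(value_query_ok(std=0.25, epsilon=0.5, delta=0.05))
# Posterior std of 0.5 C: +/- 0.5 C is only a 1-sigma interval (about 68%).
print(value_query_ok(std=0.5, epsilon=0.5, delta=0.05))
# Probability that a sensor modeled as Normal(24, 1) exceeds 25 C.
print(1.0 - normal_cdf(25.0, mean=24.0, std=1.0))
```

When the check fails, a model-driven system would observe more sensors to tighten the posterior before answering.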

Page 87: Database Middleware for Sensor Networks

88

Experimental results

• Redwood trees and Intel Lab datasets

• Learned models from data
– Static model
– Dynamic model: Kalman filter, time-indexed transition probabilities

• Evaluated on a wide range of queries

[Figure: Intel Lab floor plan (server room, lab, kitchen, copy, elec, phone, quiet, storage, conference, and office areas) showing the locations of sensor nodes 1 through 54]

Page 88: Database Middleware for Sensor Networks

89

Cost versus Confidence level

Page 89: Database Middleware for Sensor Networks

90

Obtaining approximate values

Query: True temperature value ± epsilon with confidence 95%

Page 90: Database Middleware for Sensor Networks

91

Next Step: Outliers and Unusual Events

• Once we have a model of the expected behavior, we can:
– Detect unusual (low probability) events
– Predict missing values

• Often, there are several “expected” behavior modes, which we want to differentiate between
– E.g., if we can characterize failure modes, we can discard them

• Applying well-known probabilistic techniques to allow TinyDB to deal with such issues.

[Figure: readings falling into two behavior modes, AC ON and AC OFF]
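One simple way to tell several "expected" modes apart, and to flag readings that fit none of them, is to score each reading under a per-mode Gaussian. The sketch below assumes exactly that; the mode parameters, the density threshold, and the names are hypothetical, and this is not presented as what TinyDB does.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical temperature modes as (mean, std) in degrees C.
MODES = {"AC ON": (18.0, 0.5), "AC OFF": (24.0, 0.8)}


def classify(reading, modes=MODES, min_density=1e-3):
    """Return the most likely mode, or "UNUSUAL" if no mode explains the reading."""
    best_mode, best_density = max(
        ((name, gaussian_pdf(reading, m, s)) for name, (m, s) in modes.items()),
        key=lambda pair: pair[1],
    )
    if best_density < min_density:
        return "UNUSUAL"
    return best_mode


for temp in (18.2, 23.5, 21.0, 35.0):
    print(temp, classify(temp))
```

A reading of 21.0 sits between the two assumed modes and is flagged, even though it is an unremarkable temperature in absolute terms.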

Page 91: Database Middleware for Sensor Networks

92

IDSQ

• Similar idea: suppose you want to, e.g., localize a vehicle in a field of sensors

• Idea: task sensors in order of best improvement to estimate of some value:
– Choose leader(s)
  • Suppress subordinates
  • Task subordinates, one at a time
– Until some measure of goodness (error bound) is met

See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.
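The tasking loop can be sketched as a greedy selection over a 2-D Gaussian location estimate. The measurement model (each sensor measures position along a direction h with noise variance r) and the utility (trace of the posterior covariance) are assumptions chosen to keep the example small; the cited report discusses several utility measures, and all names here are illustrative.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def fuse(cov, h, r):
    """Posterior covariance after a scalar measurement z = h.x + noise (variance r)."""
    info = inv2(cov)
    for i in range(2):
        for j in range(2):
            info[i][j] += h[i] * h[j] / r
    return inv2(info)


def trace(m):
    return m[0][0] + m[1][1]


def greedy_task(cov, sensors, error_bound):
    """Task sensors one at a time, best improvement first, until the bound is met.

    sensors: dict name -> (direction h, noise variance r)
    """
    remaining = dict(sensors)
    tasked = []
    while trace(cov) > error_bound and remaining:
        best = min(remaining, key=lambda s: trace(fuse(cov, *remaining[s])))
        cov = fuse(cov, *remaining.pop(best))
        tasked.append(best)
    return tasked, cov


# Prior uncertainty is elongated along x (the principal axis).
prior = [[9.0, 0.0], [0.0, 1.0]]
sensors = {"S1": ((1.0, 0.0), 1.0), "S2": ((0.0, 1.0), 1.0)}
tasked, posterior = greedy_task(prior, sensors, error_bound=2.0)
print(tasked, trace(posterior))
```

With these numbers only S1 is tasked: it measures along the axis where the estimate is most uncertain, so one reading is enough to meet the bound and S2 stays suppressed.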

Page 92: Database Middleware for Sensor Networks

93

Graphical Representation

Model location estimate as a point with 2-dimensional Gaussian uncertainty.

[Figure: uncertainty ellipse with its principal axis marked, and two candidate sensors S1 and S2. Labels: "Residual 1" (S1), "Preferred because it reduces error along principal axis", "Residual 2" (S2), "Area of residuals is equal".]

Page 93: Database Middleware for Sensor Networks

94

Lots of Other Work of This Flavor

• Precision / Energy Tradeoff: want nodes to sleep except when their data is needed
– Olston et al. Approximate Caching. SIGMOD ‘03.
– Cheng et al. Kalman Filters. SIGMOD ‘04.
– Lazaridis and Mehrotra. Approximate Selection Queries over Imprecise Data. ICDE 2004.
– UCI Quasar Project

• Timeliness + Real Time Constraints
– John A. Stankovic et al. Real Time Communication and Coordination in Sensor Networks. Proceedings of the IEEE, 91(7), July 2003.
– Tian He et al. SPEED: a stateless protocol (ICDCS ‘03)

Page 94: Database Middleware for Sensor Networks

95

In-Net Regression

• Linear regression: simple way to predict future values, identify outliers

• Regression can be across local or remote values, multiple dimensions, or with high degree polynomials
– E.g., node A readings vs. node B’s
– Or, location (X,Y), versus temperature (e.g., over many nodes)

[Figure: scatter plot “X vs Y w/ Curve Fit” with fitted line y = 0.9703x - 0.0067, R² = 0.947]

Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient Framework for Modeling Sensor Network Data.” Under submission.
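A single-variable least-squares fit needs only five running sums, and those sums can be added together as partial results flow up the network. The sketch below shows that property; it is a plain textbook fit with illustrative names, not the kernel-based method of the paper cited above.

```python
from dataclasses import dataclass


@dataclass
class RegressionSums:
    """Sufficient statistics for fitting y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other):
        """Combine two partial states, e.g. from two subtrees."""
        return RegressionSums(self.n + other.n, self.sx + other.sx,
                              self.sy + other.sy, self.sxx + other.sxx,
                              self.sxy + other.sxy)

    def fit(self):
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


# Two nodes each summarize their own (x, y) pairs, then the states are merged.
node_a, node_b = RegressionSums(), RegressionSums()
for x, y in [(1.0, 3.0), (2.0, 5.0)]:
    node_a.add(x, y)
for x, y in [(3.0, 7.0), (4.0, 9.0)]:
    node_b.add(x, y)

slope, intercept = node_a.merge(node_b).fit()
print(slope, intercept)
```

Once a node holds the fitted line, a reading whose residual from the prediction is large is a candidate outlier, and a missing reading can be filled in with the predicted value.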

Page 95: Database Middleware for Sensor Networks

96

In-Net Regression (Continued)

• Problem: may require data from all sensors to build model

• Solution: partition sensors into overlapping “kernels” that influence each other
– Run regression in each kernel
  • Requiring just local communication
– Blend data between kernels
– Requires some clever matrix manipulation

• End result: regressed model at every node
– Useful in failure detection, missing value estimation

Page 96: Database Middleware for Sensor Networks

97

Topics

• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing

• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity

• New features
– In-network storage
– Closing the loop
– Integration with traditional databases

Page 97: Database Middleware for Sensor Networks

98

Heterogeneous Sensor Networks

• Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes

• Still must be transparent and ad-hoc

• Key to scalability of sensor networks

• Interesting heterogeneities
– Energy: battery vs. outlet power
– Link bandwidth: Chipcon vs. 802.11x
– Computing and storage: ATMega128 vs. Xscale
– Pre-computed results
– Sensing nodes vs. QP nodes

Page 98: Database Middleware for Sensor Networks

99

Computing Heterogeneity with TinyDB

• Separate query processing from sensing
– Provide query processing on a small number of nodes
– Attract packets to query processors based on “service value”

• Compare the total energy consumption of the network
– No aggregation
– All aggregation
– Opportunistic aggregation
– HSN proactive aggregation

Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.
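To see why a few well-placed aggregators matter, it helps to count radio transmissions on a routing tree. The toy model below assumes every non-sink node produces one reading per epoch, a plain node forwards every packet it receives plus its own, and an aggregator folds everything into a single packet. The function and the chain topology are illustrative only and are not how the HSN testbed was measured.

```python
def count_data_packets(parent, aggregators):
    """Total data packets sent in one epoch.

    parent      : dict node -> parent node (the sink maps to None)
    aggregators : set of nodes that merge all traffic into one packet
    """
    children = {}
    sink = None
    for node, p in parent.items():
        if p is None:
            sink = node
        else:
            children.setdefault(p, []).append(node)

    total = 0

    def packets_sent(node):
        nonlocal total
        incoming = sum(packets_sent(child) for child in children.get(node, []))
        if node == sink:
            return 0  # the sink only receives
        outgoing = 1 if node in aggregators else incoming + 1
        total += outgoing
        return outgoing

    packets_sent(sink)
    return total


# A 4-hop chain: sink 0 <- 1 <- 2 <- 3 <- 4
chain = {0: None, 1: 0, 2: 1, 3: 2, 4: 3}
print(count_data_packets(chain, set()))          # no aggregation
print(count_data_packets(chain, {2}))            # one aggregator mid-chain
print(count_data_packets(chain, {1, 2, 3, 4}))   # all nodes aggregate
```

In this toy chain a single aggregator in the middle already removes most of the forwarding traffic, which mirrors the qualitative point of comparing no, partial, and full aggregation.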

Page 99: Database Middleware for Sensor Networks

100

5x7 TinyDB/HSN Mica2 Testbed

Page 100: Database Middleware for Sensor Networks

101

Data Packet Saving

• How many aggregators are desired?

[Chart: “Data Packet Saving”: % change in data packet count (0% to -50%) vs. number of aggregators (1, 2, 3, 4, 5, 6, All (35))]

11% aggregators achieve 72% of max data reduction

• Does placement matter?

[Chart: “Data Packet Saving - Aggregator Placement”: % change in data packet count (0% to -50%) vs. aggregator location (25, 27, 29, 31, All (35))]

Optimal placement 2/3 distance from sink.

Page 101: Database Middleware for Sensor Networks

102

Topics

• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing

• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity

• New features
– In-network storage
– Closing the loop
– Integration with traditional databases

Page 102: Database Middleware for Sensor Networks

103

Occasionally Connected Sensornets

[Diagram: several sensornets, each running a TinyDB QP, with fixed gateways (GTWY) and mobile gateways (Mobile GTWY) linking them over the internet to a TinyDB Server]

Page 103: Database Middleware for Sensor Networks

104

Occasionally Connected Sensornets Challenges

• Networking support
– Tradeoff between reliability, power consumption and delay
– Data custody transfer: duplicates?
– Load shedding
– Routing of mobile gateways

• Query processing
– Operation placement: in-network vs. on mobile gateways
– Proactive pre-computation and data movement

• Tight interaction between networking and QP

Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf.

Page 104: Database Middleware for Sensor Networks

105

Other Occasionally Connected Work

• Kevin Fall. Delay Tolerant Networks. SIGCOMM 2003.

• Juang et al. Energy efficient computing for wildlife tracking. ASPLOS 2002.

• Li et al. Sending messages to mobile users in disconnected ad-hoc wireless networks. MOBICOM 2000.

• Shah et al. Data Mules. SNPA 2003.

Page 105: Database Middleware for Sensor Networks

106

Topics

• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing

• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity

• New features
– In-network storage
– Closing the loop
– Integration with traditional databases

Page 106: Database Middleware for Sensor Networks

107

Distributed In-network Storage

• Collectively, sensornets have large amounts of in-network storage

• Good for in-network consumption or caching

• Challenges
– Distributed indexing for fast query dissemination
– Resilience to node or link failures
– Graceful adaptation to data skews
– Minimizing index insertion/maintenance cost

Page 107: Database Middleware for Sensor Networks

108

Example: DIM

• Functionality
– Efficient range query for multidimensional data.

• Approaches
– Divide sensor field into bins.
– Locality preserving mapping from m-d space to geographic locations.
– Use geographic routing such as GPSR.

• Assumptions
– Nodes know their locations and network boundary
– No node mobility

[Figure: sensor field divided into bins, with events E1 = <0.7, 0.8> and E2 = <0.6, 0.7> and range query Q1 = <.5-.7, .5-1>]

Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003.
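A locality-preserving mapping of this kind can be sketched with a k-d-tree-style bit code: each bit halves the range of one attribute, cycling through the attributes, so nearby events share code prefixes. The code length, function names, and splitting rule are assumptions for illustration; the step that assigns codes to geographic zones and routes to them with GPSR is not shown.

```python
def zone_code(point, bits):
    """Map a point in [0,1)^m to a bit string, halving one dimension per bit."""
    lo = [0.0] * len(point)
    hi = [1.0] * len(point)
    code = []
    for i in range(bits):
        d = i % len(point)
        mid = (lo[d] + hi[d]) / 2.0
        if point[d] >= mid:
            code.append("1")
            lo[d] = mid
        else:
            code.append("0")
            hi[d] = mid
    return "".join(code)


def zones_for_range(query, bits):
    """All codes of length `bits` whose bin overlaps the range query.

    query: one (low, high) pair per dimension.
    """
    results = []

    def descend(code, lo, hi):
        if len(code) == bits:
            results.append(code)
            return
        d = len(code) % len(query)
        mid = (lo[d] + hi[d]) / 2.0
        q_lo, q_hi = query[d]
        if q_lo < mid:
            descend(code + "0", lo, hi[:d] + [mid] + hi[d + 1:])
        if q_hi >= mid:
            descend(code + "1", lo[:d] + [mid] + lo[d + 1:], hi)

    descend("", [0.0] * len(query), [1.0] * len(query))
    return results


e1 = zone_code((0.7, 0.8), bits=4)
e2 = zone_code((0.6, 0.7), bits=4)
q1 = zones_for_range([(0.5, 0.7), (0.5, 1.0)], bits=4)
print(e1, e2, q1)
```

With 4-bit codes, the range query touches only two bins, and both example events fall in them, so the query needs to reach just the nodes responsible for those bins.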

Page 108: Database Middleware for Sensor Networks

109

Topics

• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing

• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity

• New features
– In-network storage
– Closing the loop
– Integration with traditional databases

Page 109: Database Middleware for Sensor Networks

110

Closing the Loop

• Challenge: want more than data collection
– Condition-based sensing, rate adjustment
– Condition-based actuation

• E.g.,
– Kansal et al. Sensor Uncertainty Reduction Using Low Complexity Actuation. IPSN 2004.
– Work from Qiong Luo (HKUST) et al. in CIDR.
– Various process control systems: ladder logic, SCADA, etc.

• Questions:
– Appropriate languages
– Resource contention on actuators
– Closed-loop safety concerns

Page 110: Database Middleware for Sensor Networks

111

Topics

• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing

• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity

• New features
– In-network storage
– Closing the loop
– Integration with traditional databases

Page 111: Database Middleware for Sensor Networks

112

Alternative Middleware: Integration into an Existing DBMS

Page 112: Database Middleware for Sensor Networks

113

Concluding Remarks

• Sensor networks are an exciting emerging technology, with a wide variety of applications

• Many research challenges in all areas of computer science
– Database community included
– Some agreement that a declarative interface is right

• TinyDB and other early work are an important first step

• But there’s lots more to be done!
– Real challenge is building appropriate middleware abstractions

Page 113: Database Middleware for Sensor Networks

114

Questions?

http://db.lcs.mit.edu/madden/middleware_tutorial.ppt

Page 114: Database Middleware for Sensor Networks

115

In-Network Join Strategies

• Types of joins:
– non-sensor -> sensor
– sensor -> sensor

• Optimization questions:
– Should the join be pushed down?
– If so, where should it be placed?
– What if a join table exceeds the memory available on one node?

Page 115: Database Middleware for Sensor Networks

116

Choosing Where to Place Operators

• Idea: choose a “join node” to run the operator

• Over time, explore other candidate placements
– Nodes advertise data rates to their neighbors
– Neighbors compute expected cost of running the join based on these rates
– Neighbors advertise costs
– Current join node selects a new, lower cost node

Bonfils + Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing. IPSN 2003.
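The migration loop can be sketched as hill climbing over a hop-count cost. The cost model below (data rate times hop distance, summed over the two sources and the sink) and the centralized evaluation are simplifying assumptions; in the decentralized scheme described above, each neighbor would compute and advertise its own cost. The graph, rates, and names are hypothetical.

```python
from collections import deque


def hops_from(graph, start):
    """Breadth-first hop distances from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def placement_cost(graph, node, flows):
    """flows: list of (endpoint, rate) exchanged between the endpoint and the join."""
    dist = hops_from(graph, node)
    return sum(rate * dist[endpoint] for endpoint, rate in flows)


def migrate(graph, join_node, flows):
    """Move the join to a cheaper neighbor until no neighbor is cheaper."""
    while True:
        best = min(graph[join_node],
                   key=lambda nb: placement_cost(graph, nb, flows))
        if placement_cost(graph, best, flows) >= placement_cost(graph, join_node, flows):
            return join_node
        join_node = best


# Line topology A - B - C - D - E, with the sink at E.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
# Source A is chatty, source C is quiet, and the join output to E is small.
flows = [("A", 10.0), ("C", 2.0), ("E", 1.0)]
final_node = migrate(graph, "D", flows)
print(final_node, placement_cost(graph, final_node, flows))
```

Starting from D, the join drifts hop by hop toward the high-rate source and settles at A, since shipping the small join result is cheaper than shipping A's raw stream.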

Page 116: Database Middleware for Sensor Networks

117

Topics

• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries

Page 117: Database Middleware for Sensor Networks

118

Adaptivity In Sensor Networks

• Queries are long running

• Selectivities change
– E.g. night vs day

• Network load and available energy vary

• All suggest that some adaptivity is needed
– Of data rates or granularity of aggregation when optimizing for lifetimes
– Of operator orderings or placements when selectivities change (cf. conditional plans for correlations)

• As far as we know, this is an open problem!

Page 118: Database Middleware for Sensor Networks

119

Multiple Queries and Work Sharing

• As sensornets evolve, users will run many queries simultaneously
– E.g., traffic monitoring

• Likely that queries will be similar
– But have different end points, parameters, etc.

• Would like to share processing, routing as much as possible

• But how? Again, an open problem.