1 DIMENSIONS: Why do we need a new Data Handling architecture for sensor networks? Deepak Ganesan, Deborah Estrin (UCLA), John Heidemann (USC/ISI) Presenter: Vijay Sundaram


Page 1:

DIMENSIONS: Why do we need a new Data Handling architecture for sensor networks?

Deepak Ganesan, Deborah Estrin (UCLA), John Heidemann (USC/ISI)

Presenter: Vijay Sundaram

Page 2:

Deployment: Microclimate monitoring at James Reserve Park (UC Riverside)

Weather Sensor Network

How well does the data fit model <M> of the variation of temperature with altitude?

Send a robotic agent to the edge between low and high precipitation regions.

Get detailed data from the node with maximum precipitation from Sept to Dec 2003.

Hmm… I wonder why packet loss is so high. Get a connectivity map of the network for all transmit power settings.

Page 3:

Goals

Flexible spatio-temporal querying: provide the ability to mine for interesting patterns and features in data, and to drill down on details.

Distributed long-term networked data storage: preserve the ability for long-term data mining while catering to node storage constraints.

Performance: reasonable accuracy for a wide range of queries, with low communication (energy) overhead.

Page 4:

How can we achieve goals?

Exploit redundancy in data: potentially huge gains from lossy compression that exploits spatio-temporal correlation.

Exploit rarity of interesting features: preserve only interesting features.

Exploit scale of sensor network: large distributed storage, although limited local storage.

Exploit low cost of approximate query processing: allow approximate query processing that obtains sufficiently accurate responses.

Page 5:

Can existing systems satisfy design goals?

[Figure: Data Correlation vs Decentralization. Horizontal axis: Degree of Decentralization (Centralized, Hierarchical, Fully Distributed). Vertical axis: Exploited Data Correlation (None, Temporal, Spatial). Systems plotted: P2P (DHT, Gnutella), Web Caches, Centralized Data Collection, Geo-Spatial Data Mining, Streaming Media (MPEG-2), Wireless Sensor Networks.]

Page 6:

DIMENSIONS Design: Key Ideas

Construct a hierarchy of lossy compressed summaries of data using wavelet compression.

Queries “drill down” from the root of the hierarchy to focus the search on small portions of the network.

Progressively age lossy data along the spatio-temporal hierarchy to enable long-term storage.

[Figure: hierarchy with Level 0, Level 1 and Level 2, annotated "PROGRESSIVELY AGE" and "PROGRESSIVELY LOSSY".]

Page 7:

Roadmap

Why wavelets?

Example Precipitation Hierarchy

Spatial and Temporal Processing internals

Initial Results: Precipitation Dataset

Page 8:

Enabling Technique: Wavelets

Very popular signal processing approach that provides good time and frequency localization. Used in JPEG2000 and Geo-Spatial Data Mining.

Preserves spatio-temporal features (edges, discontinuities) while providing a good approximation of long-term trends in data.

Efficient distributed implementation possible.
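
The edge-preserving property can be illustrated with a small sketch. It uses the orthonormal Haar transform, the simplest wavelet, on a made-up step signal; the function names and the signal are illustrative assumptions, not code or data from the talk.

```python
import math

def haar_forward(signal):
    """Full multi-level orthonormal Haar transform of a list whose length is a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    root2 = math.sqrt(2)
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / root2 for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / root2 for i in range(half)]
        coeffs[:n] = avg + diff  # coarse part first, detail part after it
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward."""
    out = list(coeffs)
    root2 = math.sqrt(2)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged.append((a + d) / root2)
            merged.append((a - d) / root2)
        out[:2 * n] = merged
        n *= 2
    return out

def keep_largest(coeffs, k):
    """Lossy step: zero all but the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

# Hypothetical signal with a sharp edge between index 3 and index 4.
signal = [2.0] * 4 + [10.0] * 4
coeffs = haar_forward(signal)
approx = haar_inverse(keep_largest(coeffs, 2))
```

For this particular signal only two of the eight coefficients are non-zero, so keeping those two reconstructs the edge without loss. Real sensor data is noisier, and the number of coefficients kept then trades accuracy against communication cost.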

Page 9:

Sample Architecture: Precipitation Hierarchy

Local Processing: construct a lossy time-series summary (zero communication cost).

Spatial Data Processing: hierarchical lossy compression. Organize the network into a hierarchy. At each higher level, reduce the number of participating nodes by a factor of 4. At each step of the hierarchy, summarize data from 4 quadrants, and propagate.

Example query: What is the maximum precipitation between Sept-Dec 2002? Direct the query to the quadrant that best matches the query.

[Figure: hierarchy of wavelet coefficients, with decreasing temporal resolution and decreasing spatial resolution.]
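
A minimal sketch of the drill-down idea for the maximum-precipitation query follows. As a simplifying assumption, each parent cell stores the plain maximum of its 4 quadrants instead of a wavelet summary, so the answer here is exact; with lossy summaries it would be approximate. All names and the sample grid are hypothetical.

```python
def build_hierarchy(grid):
    """levels[0] is the raw square grid; each higher level summarizes 2x2 quadrants.

    Stand-in summary (an assumption, not the wavelet summary from the talk):
    the maximum of the four child cells.
    """
    levels = [grid]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        size = len(prev) // 2
        levels.append([
            [max(prev[2 * r][2 * c], prev[2 * r][2 * c + 1],
                 prev[2 * r + 1][2 * c], prev[2 * r + 1][2 * c + 1])
             for c in range(size)]
            for r in range(size)
        ])
    return levels

def drill_down_max(levels):
    """Walk from the root to a leaf, always following the best-matching quadrant."""
    r = c = 0
    visited = 1  # the root summary
    for level in range(len(levels) - 2, -1, -1):
        children = [(2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]
        visited += len(children)
        r, c = max(children, key=lambda rc: levels[level][rc[0]][rc[1]])
    return levels[0][r][c], (r, c), visited

# Hypothetical 4x4 precipitation grid.
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 99],
]
levels = build_hierarchy(grid)
value, cell, visited = drill_down_max(levels)
```

The query touches 9 of the 21 summary cells in this toy hierarchy. The saving grows with network size, because the path from root to leaf visits only 4 cells per level.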

Page 10:

Spatial Decomposition

Recursively split the network into non-overlapping square grids.

At each level of the hierarchy: elect a cluster-head; the cluster-head combines and summarizes data from 4 quadrants; the cluster-head propagates compressed data to the next level of the hierarchy.

Routing protocol: GPSR variant (DCS - Ratnasamy et al.)

[Figure: Hierarchy construction]
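
The recursive split can be sketched as below. The slide does not say how cluster-heads are elected, so this sketch assumes the node nearest the centre of each square cell wins, with ties broken by lowest node id. The function names, the `depth` convention (depth 0 is the whole network) and the node layout are all hypothetical.

```python
import math
from collections import defaultdict

def cell_index(pos, depth, side):
    """(row, col) of the square cell containing pos when each axis is split into 2**depth parts."""
    cells = 2 ** depth
    width = side / cells
    col = min(int(pos[0] // width), cells - 1)
    row = min(int(pos[1] // width), cells - 1)
    return row, col

def elect_clusterheads(nodes, depth, side):
    """nodes maps node id -> (x, y). Returns a dict mapping cell -> elected node id.

    Election rule is an assumption: nearest node to the cell centre.
    """
    cells = 2 ** depth
    width = side / cells
    members = defaultdict(list)
    for node_id, pos in nodes.items():
        members[cell_index(pos, depth, side)].append(node_id)
    heads = {}
    for (row, col), ids in members.items():
        cx, cy = (col + 0.5) * width, (row + 0.5) * width
        heads[(row, col)] = min(
            ids,
            key=lambda i: (math.hypot(nodes[i][0] - cx, nodes[i][1] - cy), i),
        )
    return heads

# Hypothetical deployment: 16 nodes on a regular 4x4 layout in a 4-unit square.
nodes = {r * 4 + c: (c + 0.5, r + 0.5) for r in range(4) for c in range(4)}
heads_by_depth = {depth: elect_clusterheads(nodes, depth, side=4.0) for depth in (2, 1, 0)}
```

Moving one step up the hierarchy cuts the number of participating cluster-heads by a factor of 4 (16, then 4, then 1 in this layout).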

Page 11:

Wavelet Compression Internals

Filter: Haar filter, Daubechies 9/7 filter

Cost Metric: communication budget, error bound

[Figure: compression pipeline. Input Data (x, y, time) -> Wavelet Subband Decomposition -> Thresholding + Quantization + Drop Subbands -> Lossless Encoder -> Compressed Output (x, y, time).]
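
The back half of this pipeline (thresholding, quantization, lossless encoding) can be sketched as follows. Two assumptions to flag: `zlib` (DEFLATE) stands in for the lossless encoder, and subband dropping is omitted. The coefficient values, `threshold` and `step` are made up for illustration.

```python
import struct
import zlib

def compress(coeffs, threshold, step):
    """Zero small coefficients, quantize the rest to multiples of step, then encode losslessly."""
    quantized = [0 if abs(c) < threshold else round(c / step) for c in coeffs]
    payload = struct.pack(f"<{len(quantized)}h", *quantized)  # 16-bit signed integers
    return zlib.compress(payload)  # assumption: DEFLATE as the lossless encoder

def decompress(blob, step):
    """Recover the quantized coefficients and rescale them."""
    payload = zlib.decompress(blob)
    count = len(payload) // 2
    return [q * step for q in struct.unpack(f"<{count}h", payload)]

# Hypothetical wavelet coefficients: two large ones and several near zero.
coeffs = [16.97, -11.31, 0.4, -0.2, 0.05, 0.0, 0.3, -0.1]
threshold, step = 0.5, 0.25
blob = compress(coeffs, threshold, step)
restored = decompress(blob, step)
max_error = max(abs(a - b) for a, b in zip(coeffs, restored))
```

The per-coefficient error is bounded by `max(threshold, step / 2)`, which is how an error bound can be traded against a communication budget: raising either parameter shrinks the output and loosens the bound.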

Page 12:

Initial Results with Precipitation Dataset: Communication Overhead

15x12 grid (50km edge) of precipitation data from 1949-1994, from Pacific Northwest†. Gridded before processing.

Handpicked choice of threshold, quantization intervals, subbands to drop. Huffman Encoder at output.

Very large compression ratio up the hierarchy

Level   Raw data size (Kb), R   Mean data sent to next level (Kb), M   Compression Ratio = R/M
1       262.5                   5.6                                    46.6
2       984.4                   3.8                                    257.2
3       3937.7                  7.4                                    987
4       11813.2                 2.5                                    2286.2

†M. Widmann and C. Bretherton. 50 km resolution daily precipitation for the Pacific Northwest, 1949-94.

Page 13:

Find maximum annual precipitation for each year.

Exact Answer for 89% of queries. Within 90% of answer for >95% of queries.

Queries require less than 3% of network.

Good performance on average with very low lookup overhead

[Figure: Drill Down Query: Error in Max Annual Precipitation from 1949-1994. Horizontal axis: Year Queried (1950 to 2000). Vertical axis: Fraction Error, 1 - (DrillDown Answer/Real Answer), from 0.00 to 0.40.]
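
The error metric on the plot's vertical axis, and the two accuracy figures quoted on this slide, can be computed as in the sketch below. The answer pairs are made-up values for illustration, not numbers from the precipitation dataset, and the function names are hypothetical.

```python
def fraction_error(drill_down_answer, real_answer):
    """Metric from the plot: 1 - (DrillDown Answer / Real Answer)."""
    return 1.0 - drill_down_answer / real_answer

def summarize(pairs, tolerance=0.10):
    """Fraction of queries answered exactly, and fraction within `tolerance` of the real answer."""
    errors = [fraction_error(d, r) for d, r in pairs]
    exact = sum(1 for e in errors if e == 0.0) / len(errors)
    within = sum(1 for e in errors if abs(e) <= tolerance) / len(errors)
    return exact, within

# Hypothetical (drill-down answer, real answer) pairs, one per queried year.
pairs = [(100.0, 100.0), (90.0, 100.0), (45.0, 50.0), (60.0, 80.0)]
exact, within = summarize(pairs)
```

With `tolerance=0.10`, `within` corresponds to the slide's "within 90% of answer" figure.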

Page 14:

Locate boundary in annual precipitation between Low and High Precipitation Areas

Error Metric: Number of nodes greater than 1 pixel distance from drill-down boundary

Accuracy: Within 25% error for 93% of the queries (or within 13% error for 75% of the queries)

Less than 5% of the network queried.

[Figure: Drill Down Edge Query: Number of Nodes Missed Per Year. Horizontal axis: Year Queried (1940 to 2000). Vertical axis: Fraction of boundary nodes missed, from 0 to 1.2.]
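
One way to read the edge-query error metric is sketched below. It rests on several assumptions: a boundary node is a high-precipitation cell with a low-precipitation 4-neighbour, "1 pixel distance" is measured as Chebyshev distance, and the count is normalized by the number of true boundary nodes. The grids and the threshold are made up.

```python
def boundary_cells(grid, threshold):
    """Cells at or above threshold that have a 4-neighbour below it (assumed boundary definition)."""
    rows, cols = len(grid), len(grid[0])
    cells = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] < threshold:
                    cells.add((r, c))
                    break
    return cells

def fraction_missed(true_boundary, estimated_boundary, max_dist=1):
    """Share of true boundary cells farther than max_dist pixels from the estimated boundary."""
    if not true_boundary:
        return 0.0
    def near(cell):
        return any(max(abs(cell[0] - r), abs(cell[1] - c)) <= max_dist
                   for r, c in estimated_boundary)
    missed = [cell for cell in true_boundary if not near(cell)]
    return len(missed) / len(true_boundary)

# Hypothetical data: the real grid has a vertical edge; the estimate loses its lower half.
real = [[1, 1, 9, 9]] * 4
estimate = [[1, 1, 9, 9], [1, 1, 9, 9], [1, 1, 1, 1], [1, 1, 1, 1]]
true_b = boundary_cells(real, threshold=5)
est_b = boundary_cells(estimate, threshold=5)
missed = fraction_missed(true_b, est_b)
```

Here one of the four true boundary cells lies more than one pixel from the estimated boundary, giving a missed fraction of 0.25.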

Page 15:

Open Issues

Load Balancing and Robustness: Hierarchical Model vs Peer Model; a lot of work in p2p systems…

Irregular Node Placement: use wavelet extensions for irregular node placement (computationally more expensive). Gridify dataset with interpolation.

Providing Query Guarantees: can we bound the error in the response obtained for a drill-down query at a particular level of the hierarchy?

Implementation on iPAQ/mote network

Page 16:

Summary

DIMENSIONS provides a holistic data handling architecture for sensor networks that can:

Support a wide range of sensor-network usage and query models (using drill-down querying of wavelet summaries).

Provide a gracefully degrading lossy storage model (by progressively ageing summaries).

Offer the ability to tune energy expended for query performance (tunable lossy compression).

Page 17:

Different optimization metrics

[Figure: systems compared: Internet-based Peer-to-Peer Systems, Geo-Spatial Data Mining, Web Caches, Streaming Media (MPEG-2), Wireless Sensor Networks. Metrics shown: Energy, Latency, Bandwidth, Approximate Results OK, Spatio-temporal Query Performance, Lookup Cost, Resource Constraints.]

Page 18:

Other Examples: Packet Loss

Different example of dataset that exhibits spatial correlation

Throughput from one transmitter to proximate receivers is correlated

Throughput from multiple proximate transmitters to one receiver is correlated.

Typically, what we want to query is the deviations from normal and average throughput.

[Figure: Contour Map. Both axes: Distance (ft), 0 to 15. Contour levels: 0.2, 0.4, 0.6, 0.8.]

Page 19:

Packet-Loss Dataset: Get Throughput Vs Distance Map

Involves expensive transfer of 12x14 map from each node.

Good approximate results can be obtained from querying compressed data.

Page 20:

Long-term Storage: Concepts

Data is progressively aged, both locally and along the hierarchy. Summaries that cover larger areas and longer time periods are retained for much longer than raw time-series.

[Figure: wavelet coefficients along the hierarchy, annotated "Slower Ageing".]
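
A minimal sketch of one possible ageing policy follows. The slide only says that coarser summaries are retained longer than raw data; the geometric schedule below (retention doubling per level) is an assumption chosen for illustration, as are all the names.

```python
from typing import NamedTuple

class Summary(NamedTuple):
    level: int    # 0 = raw time-series; higher = coarser summary covering more area and time
    created: int  # creation time, in arbitrary epochs

def retention(level, base=1, factor=2):
    """Assumed schedule: each level up is kept `factor` times longer."""
    return base * factor ** level

def age_out(store, now):
    """Drop every summary that has outlived the retention period of its level."""
    return [s for s in store if now - s.created <= retention(s.level)]

# Hypothetical store: one summary per level, all created at epoch 0.
store = [Summary(level, 0) for level in range(4)]
survivors = age_out(store, now=3)
```

At epoch 3 the level 0 and level 1 entries have been discarded while the coarser level 2 and level 3 summaries remain, so old data stays queryable at reduced resolution.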

Page 21:

Load Balancing and Robustness: Concepts

Hierarchical Model: naturally fits wavelet processing. Strict hierarchies are vulnerable to node failures. Failures near the root of the hierarchy can be expensive to repair.

Decentralized Peer Model: summaries are communicated to multiple nodes probabilistically. Better robustness, but incurs greater communication overhead.