Posted on 03-Feb-2022
Overview of Data Management in Sensor Networks
© Dr. Deepak Ganesan, edited by Dr. Robert Akl
Deepak Ganesan (UMass)
Data Management Basics
Sensor networks are data-centric: a significant amount of data is generated within the network.
Data management: How do you manage (store/process) the data in the network?
Different data management approaches, depending on:
Sensor: data rate or event rate.
Resource: local storage, processing, bandwidth, and power capacity.
Query: type, arrival rate, complexity, latency requirement.
Key Challenges in Data Management
Where should the data be stored?
How should queries be routed to the stored data?
Where and how should aggregation be performed?
How should queries for sensor networks be expressed?
[Figure: information flow and command flow]
Data Management Challenges
Where should data be stored and query processing be performed?
Inside the network: dealing with storage limitations, query processing overhead, distributed query processing.
Outside the network: dealing with bandwidth, scheduling, reliability issues, power.
How should queries be routed to data?
Inside the network: flooding, geographic routing, gradient-based routing.
Outside the network: tree-based routing.
Where and how should aggregation be performed?
Opportunistically along the routing path; cluster-based; gossip-based.
How should queries on sensor data be expressed?
Declarative querying for users; macroprogramming for developers.
Where should data be stored?
Spectrum of Data Storage and Processing
[Figure: schemes placed along two axes, local storage vs. communication for data storage: Local Storage and Hierarchical Index; Local Storage and Flooding or Geography-based Query Processing; Multi-resolution Storage and Indexing; Centralized Storage and Querying]
Spectrum of Data Storage and Processing
[Figure: schemes placed along two axes, communication for data storage vs. communication for query processing: Local Storage and Hierarchical Index; Multi-resolution Storage and Indexing; Centralized Storage and Querying; Local Storage and Flooding or Geography-based Query Processing]
Centralized Storage and Querying
Method: Archive nothing locally; transmit everything of interest. When a data item of interest is detected, send all useful information to the base station.
Advantages:
Persistent centralized storage.
Intelligence resides at the more resource-rich node.
Complicated signal processing can easily be done outside the network; sensor nodes perform only very simple filtering of data.
Disadvantages:
Power-inefficient.
Not applicable to applications where a large amount of data is potentially useful.
[Figure labels: Query, Trigger]
When is centralized storage and querying appropriate?
First-generation data collection/acquisition systems: James Reserve, Great Duck Island, structural monitoring (Wisden), etc.
Scientific applications where users need all the data.
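A minimal sketch of the centralized approach, assuming a hypothetical `BaseStation` archive, known hop counts from each node to the base station, and a plain threshold as the "very simple filtering" done on each node. It counts one radio transmission per hop per forwarded reading, which is where the power inefficiency comes from.

```python
from dataclasses import dataclass, field

@dataclass
class BaseStation:
    """Hypothetical central archive: every reading of interest ends up here."""
    archive: list = field(default_factory=list)

    def receive(self, node_id, reading):
        self.archive.append((node_id, reading))

def centralized_collect(readings_by_node, hops_to_base, threshold, base):
    """Nodes keep nothing locally: each reading that passes a simple threshold
    filter is shipped to the base station. Returns the total number of radio
    transmissions (one per hop per forwarded reading)."""
    transmissions = 0
    for node_id, readings in readings_by_node.items():
        for reading in readings:
            if reading >= threshold:              # the only in-network processing
                base.receive(node_id, reading)
                transmissions += hops_to_base[node_id]
    return transmissions

# Usage: node "b" is three hops away, so each of its readings costs 3 transmissions.
base = BaseStation()
tx = centralized_collect({"a": [10, 25, 30], "b": [5, 40]}, {"a": 1, "b": 3}, 20, base)
print(tx, base.archive)   # prints: 5 [('a', 25), ('a', 30), ('b', 40)]
```

All analysis then happens on `base.archive`, at the resource-rich node; the cost grows with both the data volume and the distance of the sources.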
Multi-resolution Storage and Indexing
Method: Store data in a multi-resolution hierarchy: raw data at the leaves, processed summaries of the data at clusterheads (which may be higher-power nodes).
Advantages:
The root has a multi-resolution view of the data in the network. This can be used to make intelligent decisions about which nodes to query and to perform complicated processing.
Data is replicated at multiple devices. Even if raw data is phased out, summaries can still be stored.
Disadvantages:
Processing and hierarchical storage require power, although not as much as centralized storage.
When is distributed storage and indexing appropriate?
Second-generation data collection/acquisition systems.
Scientific applications where data sizes are large and users need to find patterns in sensor data.
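A minimal sketch of a multi-resolution hierarchy, assuming each clusterhead keeps only a (min, max, sum, count) summary of its subtree; the `Summary` and `Node` classes and `leaves_to_query` are hypothetical names. The root uses the summaries to decide which leaves are worth querying for raw data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Summary:
    lo: float
    hi: float
    total: float
    count: int

@dataclass
class Node:
    name: str
    raw: List[float] = field(default_factory=list)        # raw data, kept at leaves
    children: List["Node"] = field(default_factory=list)
    summary: Optional[Summary] = None                     # coarse view of the subtree

def build_summaries(node):
    """Compute summaries bottom-up: leaves summarize raw data, clusterheads
    merge their children's summaries."""
    if not node.children:
        node.summary = Summary(min(node.raw), max(node.raw), sum(node.raw), len(node.raw))
    else:
        parts = [build_summaries(child) for child in node.children]
        node.summary = Summary(min(p.lo for p in parts), max(p.hi for p in parts),
                               sum(p.total for p in parts), sum(p.count for p in parts))
    return node.summary

def leaves_to_query(node, threshold):
    """Descend only into subtrees whose summary says a value > threshold exists."""
    if node.summary.hi <= threshold:
        return []
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves_to_query(child, threshold)]

# Usage: two clusterheads, each with two leaf sensors.
ch1 = Node("ch1", children=[Node("s1", raw=[20.0, 21.5]), Node("s2", raw=[19.0, 35.0])])
ch2 = Node("ch2", children=[Node("s3", raw=[18.0, 18.5]), Node("s4", raw=[22.0, 23.0])])
root = Node("root", children=[ch1, ch2])
build_summaries(root)
print(root.summary.total / root.summary.count, leaves_to_query(root, 22.5))   # 22.125 ['s2', 's4']
```

Even if `s1` later discards its raw readings, the coarser summaries at `ch1` and `root` still describe them.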
Local Storage and Distributed Indexing
Method: Store data locally at each node; construct distributed index structures to make search efficient.
Advantages:
Makes search efficient and requires low communication overhead.
Disadvantages:
Data is lost if a node fails.
Index structures can only handle specific attribute-based searches, not arbitrary signal processing functions over the data.
When is local storage and distributed indexing appropriate?
When search can be effectively scoped using simple attributes. For example, if temperature is a good indicator of some other activity, it can be used to limit the scope of the search.
[Figure: nested index ranges 1 <= Event Attribute <= 8, 4 <= Event Attribute <= 8, 7 <= Event Attribute <= 8, 8 <= Event Attribute <= 8]
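A minimal sketch of attribute-scoped search over locally stored data. For brevity it assumes a flat partition of one event attribute across index nodes, rather than the nested hierarchy of ranges suggested by the figure; `IndexNode`, `RangeIndex`, and the sensor ids are hypothetical. Events stay on the sensor that detected them, and a range query only contacts index nodes whose range overlaps the query.

```python
from dataclasses import dataclass, field

@dataclass
class IndexNode:
    lo: int
    hi: int
    pointers: dict = field(default_factory=dict)   # attribute value -> set of sensor ids

    def overlaps(self, lo, hi):
        return not (self.hi < lo or self.lo > hi)

class RangeIndex:
    """Hypothetical range-partitioned index over one event attribute. The
    index node responsible for an event's attribute value stores only a
    pointer to the sensor that holds the event locally."""

    def __init__(self, ranges):
        self.index_nodes = [IndexNode(lo, hi) for lo, hi in ranges]

    def insert(self, value, sensor_id):
        for idx in self.index_nodes:
            if idx.lo <= value <= idx.hi:
                idx.pointers.setdefault(value, set()).add(sensor_id)
                return
        raise ValueError(f"value {value} is outside the indexed domain")

    def query(self, lo, hi):
        """Return (index nodes contacted, sensors holding matching events)."""
        contacted, sensors = 0, set()
        for idx in self.index_nodes:
            if not idx.overlaps(lo, hi):
                continue                        # pruned: this index node is never contacted
            contacted += 1
            for value, ids in idx.pointers.items():
                if lo <= value <= hi:
                    sensors |= ids
        return contacted, sensors

index = RangeIndex([(1, 3), (4, 6), (7, 8)])
for value, sensor in [(2, "n1"), (5, "n2"), (7, "n3"), (8, "n4")]:
    index.insert(value, sensor)
print(index.query(7, 8))   # contacts 1 index node; sensors n3 and n4
```

Note that the index can only answer predicates on the indexed attribute; an arbitrary signal processing function over the raw data would still have to visit every sensor.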
Local Storage and Distributed Querying
Method: Store data locally at each node; the query is flooded out to the network or geographically routed. Query processing is performed on demand.
Advantages:
Only on-demand processing, therefore energy-efficient.
Disadvantages:
Data is lost if a node fails.
Puts significant complexity into a network of very low-power devices.
Frequent queries incur high overhead.
When is local storage and distributed querying appropriate?
When queries are simple and have limited scope.
When schemes can deal with node failure.
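The trade-off between on-demand querying and pushing all data can be made concrete with a back-of-the-envelope cost model. This sketch assumes a deliberately simplified, hypothetical model: transmissions are the only energy cost, a flooded query is rebroadcast once by every node, each matching node replies over the average hop distance, and the push alternative sends every reading to the base station.

```python
def push_cost(num_nodes, readings_per_node, avg_hops):
    """Alternative for comparison: every reading travels avg_hops to the base station."""
    return num_nodes * readings_per_node * avg_hops

def pull_cost_per_query(num_nodes, matches_per_query, avg_hops):
    """Flooded query: each node rebroadcasts it once; each match replies over avg_hops."""
    return num_nodes + matches_per_query * avg_hops

def break_even_queries(num_nodes, readings_per_node, matches_per_query, avg_hops):
    """Number of queries above which on-demand querying costs more than pushing."""
    return push_cost(num_nodes, readings_per_node, avg_hops) / pull_cost_per_query(
        num_nodes, matches_per_query, avg_hops)

# Usage: 100 nodes, 50 readings each, 5 hops on average, 4 matching nodes per query.
print(push_cost(100, 50, 5), pull_cost_per_query(100, 4, 5))   # 25000 120
print(break_even_queries(100, 50, 4, 5))                        # roughly 208.3
```

Under these assumptions, a few queries are far cheaper than pushing, but beyond roughly 208 queries per collection period the flooding overhead dominates, which is the "frequent queries incur high overhead" point above.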
How should queries be routed to stored data?
Data-centric routing techniques
[Figure: axes labeled push-based query routing and pull-based query routing; schemes shown: tree-based routing, query flooding or geographic routing, gradient-based routing]
Flooding queries into the network
Flood the query throughout the network. Nodes with matching attributes/parameters respond to the query.
Pros: Very simple and reliable.
Cons: Inefficient if queries are posed frequently or the network is large-scale.
When is it useful? A large fraction of current deployments, and possibly future deployments, will be flooding-based simply because of its inherent simplicity and reliability.
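A minimal sketch of query flooding over a connectivity graph, assuming each node rebroadcasts a query once and drops duplicate copies; the topology, readings, and `flood_query` name are hypothetical. The broadcast count equals the number of reachable nodes no matter how few of them match, which is why flooding is costly for frequent queries or large networks.

```python
from collections import deque

def flood_query(adjacency, source, predicate, readings):
    """Flood a query from `source`. Every node rebroadcasts it exactly once
    (duplicate copies are dropped), and every sensor whose local reading
    matches responds. Returns (number of broadcasts, responders in flood order)."""
    seen = {source}
    frontier = deque([source])
    broadcasts, responders = 0, []
    while frontier:
        node = frontier.popleft()
        broadcasts += 1
        if node != source and predicate(readings[node]):
            responders.append(node)
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return broadcasts, responders

adjacency = {"bs": ["a", "b"], "a": ["bs", "c"], "b": ["bs", "c"], "c": ["a", "b", "d"], "d": ["c"]}
readings = {"a": 21, "b": 30, "c": 28, "d": 35}
print(flood_query(adjacency, "bs", lambda t: t > 25, readings))   # (5, ['b', 'c', 'd'])
```

Reliability comes from the redundancy: `c` hears the query from both `a` and `b`, so a single lost copy does not cut it off.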
Geographic routing to known locations
If the query explicitly specifies a location, selectively route the query to the particular locations of interest. E.g., “Find the average temperature in the west corridor of the CS Building.”
Pros: Can reduce query routing overhead by selectively choosing nodes.
Cons: Complex routing strategy. Needs special mechanisms to route around communication holes. Lack of redundancy might result in the query being lost.
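The slide does not name a particular protocol; one common building block is greedy geographic forwarding, sketched here under the assumption that every node knows its own and its neighbors' coordinates (names and topologies are hypothetical). The second usage shows a communication hole: a route exists, but greedy forwarding cannot find it, which is why extra recovery mechanisms are needed.

```python
import math

def greedy_geo_route(positions, adjacency, source, target_xy):
    """Greedy geographic forwarding: each hop hands the query to the neighbor
    closest to the target location, and stops when no neighbor is strictly
    closer than the current node. Returns the list of nodes visited."""
    def dist(node):
        x, y = positions[node]
        return math.hypot(x - target_xy[0], y - target_xy[1])

    path = [source]
    while adjacency[path[-1]]:
        best = min(adjacency[path[-1]], key=dist)
        if dist(best) >= dist(path[-1]):
            break            # local minimum: destination reached, or stuck at a hole
        path.append(best)
    return path

# Usage 1: a clear line of nodes toward the target at (3, 0).
line_pos = {"s": (0, 0), "a": (1, 0), "b": (2, 0), "t": (3, 0)}
line_adj = {"s": ["a"], "a": ["s", "b"], "b": ["a", "t"], "t": ["b"]}
print(greedy_geo_route(line_pos, line_adj, "s", (3, 0)))   # ['s', 'a', 'b', 't']

# Usage 2: a communication hole. The path s -> u -> w -> t exists, but both of
# s's neighbors are farther from (4, 0) than s itself, so greedy forwarding stops.
hole_pos = {"s": (0, 0), "d": (-1, 0), "u": (0, 2), "w": (3, 2), "t": (4, 0)}
hole_adj = {"s": ["d", "u"], "d": ["s"], "u": ["s", "w"], "w": ["u", "t"], "t": ["w"]}
print(greedy_geo_route(hole_pos, hole_adj, "s", (4, 0)))   # ['s']
```

Each node on the path forwards a single copy, so overhead is low, but there is no redundant copy if that hop fails.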
Gradient-based routing
Set up gradients in the network that steer queries toward the areas of interest. Also called publish-subscribe schemes.
Pros: Resilient to failures and packet loss (similar to gossip-based schemes). Not restricted to location-based queries; can be used for any spatially correlated attribute.
Cons: Incurs more overhead than geographic routing schemes.
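A minimal sketch of gradient-based routing, assuming the gradient is simply each node's hop distance to a publishing node (set up by flooding one advertisement); real publish-subscribe schemes can build gradients over other attributes, and all names here are hypothetical. A query descends the gradient, and because any live downhill neighbor will do, a failed node can often be bypassed.

```python
from collections import deque

def build_gradient(adjacency, publisher):
    """Flood the publisher's advertisement once; each node records its hop
    distance to the publisher. This hop-count field serves as the gradient."""
    hops = {publisher: 0}
    frontier = deque([publisher])
    while frontier:
        node = frontier.popleft()
        for nbr in adjacency[node]:
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                frontier.append(nbr)
    return hops

def follow_gradient(adjacency, hops, start, failed=frozenset()):
    """Forward a query downhill toward the publisher. Any live neighbor with a
    smaller hop count is acceptable, so a failed node can be bypassed.
    Returns the path, or None if every downhill neighbor has failed."""
    path = [start]
    while hops[path[-1]] > 0:
        current = path[-1]
        downhill = [n for n in adjacency[current]
                    if n not in failed and hops.get(n, float("inf")) < hops[current]]
        if not downhill:
            return None
        path.append(min(downhill, key=lambda n: (hops[n], n)))   # deterministic tie-break
    return path

adjacency = {"p": ["c"], "c": ["p", "a", "b"], "a": ["c", "q"], "b": ["c", "q"], "q": ["a", "b"]}
hops = build_gradient(adjacency, "p")
print(follow_gradient(adjacency, hops, "q"))                  # ['q', 'a', 'c', 'p']
print(follow_gradient(adjacency, hops, "q", failed={"a"}))    # ['q', 'b', 'c', 'p']
```

The extra cost relative to geographic routing is the flood that sets up (and, in practice, periodically refreshes) the gradient.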
Tree-based routing
In push-based systems, the query can remain at the base station and the data can be routed to it.
Pros: Query processing can be complex. Decisions can be made at the intelligent node rather than at the resource-constrained ones. Periodic push is synchronous and can be optimized through better scheduling policies.
Cons: Pure push is rather inefficient, since decision making happens solely at the central location. Usually a combination of push and pull is more appropriate.
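A minimal sketch of push-based tree routing, assuming a breadth-first (hop-count) tree rooted at the base station. The `transmit_slots` function illustrates one possible scheduling policy for periodic push, in which deeper nodes transmit earlier in each epoch so a parent has heard its children before its own slot; the names and the policy are illustrative assumptions, not taken from the slides.

```python
from collections import deque

def build_tree(adjacency, root):
    """Breadth-first construction of a shortest-path (hop-count) tree rooted
    at the base station; returns parent and depth of every reachable node."""
    parent, depth = {root: None}, {root: 0}
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        for nbr in adjacency[node]:
            if nbr not in parent:
                parent[nbr], depth[nbr] = node, depth[node] + 1
                frontier.append(nbr)
    return parent, depth

def push_path(parent, node):
    """Route a reading from `node` up the tree to the base station."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def transmit_slots(depth):
    """Deepest nodes send in slot 0, their parents in slot 1, and so on."""
    max_depth = max(depth.values())
    return {n: max_depth - d for n, d in depth.items() if d > 0}

adjacency = {"bs": ["a", "b"], "a": ["bs", "c"], "b": ["bs", "c"], "c": ["a", "b", "d"], "d": ["c"]}
parent, depth = build_tree(adjacency, "bs")
print(push_path(parent, "d"))    # ['d', 'c', 'a', 'bs']
print(transmit_slots(depth))     # {'a': 2, 'b': 2, 'c': 1, 'd': 0}
```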
Where and how should query results be aggregated?
General aggregation
Let $H_k$ be the information from $k$ sources. It generally satisfies the following conditions:
It is non-decreasing with respect to $k$.
It is concave with respect to $k$.
Uncorrelated sources: $H_k = k$
Correlated sources: $H_k = 1$
[Figure: aggregate information vs. number of sources, for uncorrelated sources, intermediate correlation, and correlated sources]
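The two limiting cases can be connected by a one-parameter family. The model $H_k = k^{1-\rho}$ with correlation parameter $\rho \in [0, 1]$ is a hypothetical choice made here for illustration (the slides only give the endpoints): $\rho = 0$ gives $H_k = k$, $\rho = 1$ gives $H_k = 1$, and for any $\rho$ in between the curve is non-decreasing and concave, as the conditions require. The helper checks both conditions numerically via first and second differences.

```python
def aggregate_information(k, rho):
    """Hypothetical model H_k = k ** (1 - rho) for k >= 1 sources.
    rho = 0: uncorrelated sources, H_k = k.
    rho = 1: fully correlated sources, H_k = 1.
    0 < rho < 1: intermediate correlation."""
    if k < 1 or not 0.0 <= rho <= 1.0:
        raise ValueError("need k >= 1 and 0 <= rho <= 1")
    return k ** (1.0 - rho)

def nondecreasing_and_concave(values, tol=1e-12):
    """Check the two conditions on a sequence H_1, H_2, ...: first differences
    are >= 0 and do not increase (second differences are <= 0)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return (all(d >= -tol for d in diffs)
            and all(d2 - d1 <= tol for d1, d2 in zip(diffs, diffs[1:])))

for rho in (0.0, 0.5, 1.0):
    curve = [aggregate_information(k, rho) for k in range(1, 11)]
    print(rho, [round(h, 2) for h in curve[:4]], nondecreasing_and_concave(curve))
```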
Aggregation of query results
Opportunistic aggregation: Build shortest-path trees. Aggregate at the junction nodes.
Cluster close to the source of data: Force query results to be aggregated close to the data.
Query-optimized trees: Build trees that are optimized for particular kinds of queries.
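A minimal sketch of opportunistic aggregation at the junction nodes of a routing tree, computing an AVERAGE query. It assumes the tree is given as a hypothetical `children` map and that each node forwards a partial state of (sum, count) rather than raw readings; the average is only computed at the base station.

```python
def partial_average(children, readings, node):
    """Return the (sum, count) partial state for the subtree rooted at `node`.
    Each junction node merges its children's partial states with its own
    reading and forwards a single (sum, count) record to its parent."""
    total, count = 0.0, 0
    if node in readings:                    # the base station may have no reading
        total, count = float(readings[node]), 1
    for child in children.get(node, ()):
        child_total, child_count = partial_average(children, readings, child)
        total, count = total + child_total, count + child_count
    return total, count

children = {"bs": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
readings = {"a": 20, "b": 22, "c": 24, "d": 26, "e": 28}
total, count = partial_average(children, readings, "bs")
print(total / count)   # 24.0
```

In this tree each of the five sensors sends exactly one (sum, count) record, i.e. 5 transmissions, while forwarding every raw reading hop by hop would need 1 + 1 + 2 + 2 + 2 = 8.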
Query processing language and optimization
Query Processing Challenges
Intended audience: users who pose queries; application developers.
How much complexity to expose? Complex inter-resource constraints, distributed computation, data fusion/collaborative signal processing.
How much run-time vs. compile-time query optimization?
Query Processing Language for Users
Expressing spatio-temporal queries: When and where did an event occur? Scoping the spatial or temporal region of interest.
Addressing individual sensors: Able to specify which sensors and which sensor parameters you are interested in. Confidence intervals or other measures of error tolerance.
Addressing events: Able to specify “events” of interest transparently from the event processing. Confidence interval, error tolerance.
Specifying query processing constraints: Latency of result.
Hide the distributed nature of computation for naïve users. Enable extensive query expression of sensor data, events, and query constraints.
Programming Language for Developers
Addressing groups of sensors and data fusion: Combine data from a motion detector, a vibration sensor, and a camera (which may not be co-located) into a “detection event”.
Aggregation: Data of type ‘vibration signal’ can be combined by looking at the FFT and picking the 4 dominant frequencies.
Specifying the routing structure: Cluster the area into groups of nodes that observe correlated events.
Allowing user-defined signal processing: Each application has different aggregation needs.
Expressing resource constraints: Energy: Do not expend more than J joules in trying to get the result.
Expose the distributed nature of computation, but provide a composable library of primitives for easier development.
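A minimal sketch of the ‘vibration signal’ summary described above: transform the samples to the frequency domain and keep only the 4 strongest frequencies, so a node can forward 4 numbers instead of the whole sample window. To stay dependency-free it uses a naive O(n^2) DFT instead of an FFT library (both give the same spectrum); `dominant_frequencies` and the test signal are hypothetical.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns |X[k]| for k = 0 .. n//2 (real-input spectrum)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2 + 1)]

def dominant_frequencies(samples, sample_rate_hz, top=4):
    """Summarize a vibration signal by its `top` strongest frequencies in Hz,
    strongest first, ignoring the DC (0 Hz) bin."""
    n = len(samples)
    mags = dft_magnitudes(samples)
    bins = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)[:top]
    return [k * sample_rate_hz / n for k in bins]

# Usage: 64 samples at 64 Hz containing tones at 5, 12, 20 and 27 Hz.
signal = [4 * math.sin(2 * math.pi * 5 * t / 64) + 3 * math.sin(2 * math.pi * 12 * t / 64)
          + 2 * math.cos(2 * math.pi * 20 * t / 64) + math.sin(2 * math.pi * 27 * t / 64)
          for t in range(64)]
print(dominant_frequencies(signal, 64.0))   # [5.0, 12.0, 20.0, 27.0]
```

A developer-facing language would let this be plugged in as the user-defined aggregation step for that data type.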
Runtime query optimization
Energy constraints pose difficult query optimization requirements: every sensor sample costs energy, with different sensors incurring different overhead, and processing and storage consume power as well.
Consider the query: Sample vibration and magnetometer, and report if vibration > Threshold1 and magnetic flux > Th2. Vibration sampling requires less energy than magnetometer sampling, hence it should be done first.
The ordering of sampling, processing, and communication can matter for energy reasons. How to perform runtime query optimization?
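A minimal sketch of why sampling order matters for the vibration/magnetometer query, assuming short-circuit evaluation (a sensor is sampled only if every earlier predicate passed) and independent predicates. The per-sample energy costs and pass probabilities below are hypothetical numbers chosen for illustration; `best_order` simply tries every ordering, which is fine for a handful of sensors.

```python
from itertools import permutations

def expected_energy(order, cost, pass_prob):
    """Expected energy of evaluating a conjunction of predicates in `order`
    with short-circuiting; assumes the predicates are independent."""
    energy, reach = 0.0, 1.0
    for name in order:
        energy += reach * cost[name]      # pay for this sample only if we got here
        reach *= pass_prob[name]          # probability the next sensor is sampled
    return energy

def best_order(cost, pass_prob):
    """Exhaustive search over orderings; adequate for a few sensors."""
    return min(permutations(cost), key=lambda o: expected_energy(o, cost, pass_prob))

cost = {"vibration": 1.0, "magnetometer": 5.0}         # hypothetical mJ per sample
pass_prob = {"vibration": 0.25, "magnetometer": 0.5}   # hypothetical selectivities
print(expected_energy(("vibration", "magnetometer"), cost, pass_prob))   # 2.25
print(expected_energy(("magnetometer", "vibration"), cost, pass_prob))   # 5.5
print(best_order(cost, pass_prob))                     # ('vibration', 'magnetometer')
```

Because pass probabilities can drift as conditions change, the best order may need to be recomputed at run time rather than fixed at compile time.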