Query Optimization for Sensor Networks

Query Optimization for Sensor Networks Harshavardhan Achrekar University of Massachusetts-Lowell



Database Presentation


1. Query Optimization for Sensor Networks

  • Harshavardhan Achrekar, University of Massachusetts-Lowell

2. Basic architecture for Querying in TinyDB

  • Query submitted at a PC (base station), parsed, optimized
  • Query sent into the sensor network, disseminated, processed
  • Result flows back up the routing tree that was formed as the query propagated

3. Disadvantages of this architecture

  • Data is extracted from the sensor network in a predefined way and stored in a database on the front end.
  • Query processing takes place on the centralized database, which outputs the results of predefined queries over historical data.
  • Nodes near the access point become traffic hot spots and central points of failure, and may be depleted of energy prematurely.
  • The architecture does not take advantage of in-network aggregation to reduce the communication load when only aggregate data needs to be reported.

4. Goal of this Research Proposal

  • Design a scheme that supports multiple data acquisition and aggregation queries in a wireless sensor network while minimizing radio activity and energy consumption.
  • Exploit correlation among similar queries so that they share the limited communication and computational resources.
  • Derive a final optimized query plan by applying successive transformation rules to an initial query plan.

5. Example: Flood Warning System

  • A user from an emergency management agency sends a query to the flood sensor DB: "For the next 3 hours, retrieve every 10 minutes the maximum rainfall level in each county in Southern California, if it is greater than 3.0 inches."
  • select max(rainfall_level), county from Sensors
  • where state = 'Southern California'
  • group by county
  • having max(rainfall_level) > 3.0
  • in duration [now, now + 180 min]
  • sampling period 10 min
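The per-epoch semantics of this query can be sketched in Python. This is only an illustration of what one sampling period computes, not TinyDB code; the function name, the `(county, rainfall_level)` tuple layout, and the sample readings are assumptions made up for the example.

```python
from collections import defaultdict

def flood_query_epoch(readings, threshold=3.0):
    """Evaluate one sampling period of the flood query.

    readings: iterable of (county, rainfall_level) pairs (hypothetical layout).
    Returns the maximum rainfall per county, keeping only maxima above threshold.
    """
    max_by_county = defaultdict(lambda: float("-inf"))
    for county, level in readings:
        # GROUP BY county, tracking max(rainfall_level)
        max_by_county[county] = max(max_by_county[county], level)
    # HAVING max(rainfall_level) > threshold
    return {c: m for c, m in max_by_county.items() if m > threshold}

# Made-up readings for a single 10-minute sampling period.
sample = [("Orange", 2.1), ("Orange", 3.4), ("Kern", 1.0), ("Ventura", 3.0)]
alerts = flood_query_epoch(sample)  # only Orange exceeds 3.0 inches
```

Over the 180-minute duration the same evaluation would simply be repeated once per sampling period.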

6. Classification of Queries

  • Long-running, continuous queries: report results over an extended time window. ex: for the next 3 hours, retrieve every 10 minutes the rainfall level in California
  • Snapshot queries: data in the network at a given point in time. ex: retrieve the current rainfall level for all sensors in California
  • Historical queries: aggregate information over historical data. ex: retrieve the average rainfall level at all sensors for the last 3 months of the previous year

7. Optimization of a Long Continuous Query

  • (S_I1, S_I2) join operator: relates tuples having the same timestamp TS. For every new tuple read on one of the input streams, the join operator checks whether the last tuple read from the other stream has the same timestamp.
  • (S_I1, S_I2) sync-join, where S_I2 is an on-demand stream: the sync-join requests the activation of S_I2 only when a tuple arrives on S_I1.
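A minimal Python sketch of the sync-join idea follows. The generator, the `activate_stream_2` callback (standing in for waking up the on-demand sensor), and the readings are all hypothetical; the point is only that the second stream is sampled when, and only when, a tuple arrives on the first.

```python
def sync_join(stream_1, activate_stream_2):
    """Yield (timestamp, left, right) tuples.

    stream_1: iterable of (timestamp, value) tuples from the always-on input.
    activate_stream_2: callable taking a timestamp and returning a reading;
        it models activating the on-demand stream (hypothetical interface).
    """
    for ts, left in stream_1:
        right = activate_stream_2(ts)  # the on-demand sensor is sampled only here
        yield ts, left, right

activations = []  # records when the on-demand sensor was woken up

def read_acceleration(ts):
    activations.append(ts)
    return 0.5 * ts  # made-up reading

# Only magnetism tuples that pass a made-up predicate reach the sync-join.
magnetism = [(0, 0.1), (1000, 0.9), (2000, 0.2)]
passing = [(ts, v) for ts, v in magnetism if v > 0.8]
joined = list(sync_join(passing, read_acceleration))
```

In this run the acceleration sensor is activated once, for the single magnetism tuple that survived the selection, rather than three times.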

8. Transformation rules

  • Use sync-joins and on-demand streams whenever possible.
  • Because a sync-join requires a sensor stream on its right side, trees representing query plans should be unbalanced to the left (left-deep join trees).
  • Unary operators such as selections, projections, and temporal aggregates (which reduce the amount of data being forwarded) should be moved as close as possible to the node where data is acquired.

9. Query optimization example:

  • SELECT * FROM 1.Magnetism, 2.Acceleration, 3.Temperature WHERE p1(1.Magnetism) and p2(2.Acceleration) and p3(3.Temperature) EVERY 1000
  • Here p1, p2, and p3 are predicates on the magnetism, acceleration, and temperature readings, respectively, with probabilities Pr(p1) = 0.01, Pr(p2) = 0.05, Pr(p3) = 0.1.
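To see why activating sensors on demand pays off for this query, the expected number of sensor samples per epoch can be estimated from the stated probabilities. The sketch below assumes that the predicates are independent and that sensors are sampled in the order Magnetism, Acceleration, Temperature, each one only if all earlier predicates held; both assumptions are made for illustration and are not stated on the slide.

```python
def expected_acquisitions(probabilities):
    """Expected samples per epoch when each sensor is sampled only if
    every earlier predicate in the chain was satisfied."""
    expected, reach = 0.0, 1.0
    for p in probabilities:
        expected += reach  # this sensor is sampled with probability `reach`
        reach *= p         # chance that the next sensor is needed at all
    return expected

pr = [0.01, 0.05, 0.1]            # Pr(p1), Pr(p2), Pr(p3) from the slide
eager = float(len(pr))            # sample all three sensors every epoch
lazy = expected_acquisitions(pr)  # 1 + 0.01 + 0.01 * 0.05, about 1.0105
```

Under these assumptions the on-demand plan needs roughly one sample per epoch instead of three.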

10. Analysis of Cost of Execution

  • QP1 is obtained by applying the left-deep join trees rule.
  • QP2 is obtained from QP1 by using the selection push-down rule and allocating the selections on the nodes where the data are generated.
  • QP3 is obtained from QP2 by using the rules for transforming joins into sync-joins.

11. Two-Tier Multiple Query Optimization

  • The scheme supports both aggregation and data acquisition queries while minimizing the average transmission time in the sensor network.
  • Tier one: base station optimization algorithm, a cost-based approach that heuristically rewrites user queries into synthetic queries before injecting them into the sensor network.
  • Tier two: in-network optimization algorithm, in which sensor nodes make local decisions themselves and adaptively handle the query workload over time.

12. Base Station Optimization Algorithm

  • User query Structure
      • (a) qid - unique identifier of the query.
      • (b) attribute list: the list of attributes that data acquisition query qid retrieves from the sensor network.
      • (c) agg list: the list of aggregates that aggregation query qid acquires.
      • (d) predicate list: the list of predicates of query qid.
      • (e) qid' field: denotes which synthetic query this query qid has been rewritten into.
  • Synthetic query structure
      • (a) A count field is associated with the epoch duration field as well as with each entry in the various lists (attribute list, agg list, and predicate list); it denotes the number of user queries that require that piece of data. This facilitates the maintenance of the synthetic query when user queries terminate.
      • (b) A from list field contains the user queries which the synthetic query is responsible for.
      • (c) A flag field denotes the current status of this synthetic query.
      • (d) A benefit field indicates the benefit that can be gained by the synthetic query (in comparison to processing the individual user queries).
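The two structures can be sketched as Python dataclasses to show how the count field supports maintenance when user queries terminate. Field names follow the slide where possible; the `epoch` field on the user query, the string-valued `flag`, and the `add`/`remove` helpers are assumptions for illustration, and the sketch counts only list entries (not the epoch duration).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class UserQuery:
    qid: int
    epoch: int  # epoch duration in ms (assumed field)
    attribute_list: List[str] = field(default_factory=list)
    agg_list: List[str] = field(default_factory=list)
    predicate_list: List[str] = field(default_factory=list)
    rewritten_into: Optional[int] = None  # the qid' field

@dataclass
class SyntheticQuery:
    qid: int
    epoch: int
    counts: Dict[str, int] = field(default_factory=dict)  # users per list entry
    from_list: List[int] = field(default_factory=list)
    flag: str = "active"  # status value is a placeholder
    benefit: float = 0.0

    def add(self, q: UserQuery) -> None:
        """Fold a user query into this synthetic query."""
        for entry in q.attribute_list + q.agg_list + q.predicate_list:
            self.counts[entry] = self.counts.get(entry, 0) + 1
        self.from_list.append(q.qid)
        q.rewritten_into = self.qid

    def remove(self, q: UserQuery) -> None:
        """Drop a terminated user query; entries nobody needs disappear."""
        for entry in q.attribute_list + q.agg_list + q.predicate_list:
            self.counts[entry] -= 1
            if self.counts[entry] == 0:
                del self.counts[entry]
        self.from_list.remove(q.qid)
        q.rewritten_into = None

synthetic = SyntheticQuery(qid=100, epoch=2048)
q1 = UserQuery(1, 2048, attribute_list=["light"], predicate_list=["light > 10"])
q2 = UserQuery(2, 4096, attribute_list=["light", "temp"])
synthetic.add(q1)
synthetic.add(q2)
counts_with_both = dict(synthetic.counts)
synthetic.remove(q1)  # q1 terminates; "light" is still needed by q2
```

Because every entry carries a count, removing a user query never drops data that another user query still requires.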

13. Benefit Estimation-Cost Model

  • The transmission cost of a result message from one node to another can be estimated as $C_{\text{start}} + C_{\text{trans}} \cdot \text{len}(q_i)$.
  • To measure the average transmission cost incurred by $q_i$ for each unit of time, we have to estimate the number of per-unit-time transmissions incurred by $q_i$, which is related to the number of result messages generated by the sensors as well as the number of hops required to forward the messages back to the base station.


14. Benefit Estimation-Cost Model

  • First, we look at the per-unit-time number of result messages generated by a set of sensor nodes $N_k$, which is denoted as $\text{result}(q_i, N_k)$. At the end of each epoch of $q_i$, one result message is generated by each sensor node whose readings satisfy the predicates of $q_i$. Therefore, we have
  • $\text{result}(q_i, N_k) = \dfrac{\text{sel}(q_i, N_k) \cdot |N_k|}{\text{epoch}_i}$  (1)
  • where $\text{sel}(q_i, N_k)$ is the selectivity of the query predicates over $N_k$, which is equal to the percentage of sensor nodes in $N_k$ whose readings satisfy the query predicates, and $\text{epoch}_i$ is the epoch length of $q_i$.
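As a worked instance of Eq. (1), with purely illustrative numbers that do not come from the slides: suppose the set $N_k$ holds 50 nodes, 20% of them satisfy the predicates of $q_i$, and the epoch is 10 seconds. Then

$$
\text{result}(q_i, N_k) = \frac{0.2 \times 50}{10\ \text{s}} = 1\ \text{message per second}.
$$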

15. Benefit Estimation-Cost Model

  • Second, the number of forwarding hops of a result message is determined by the location of the message's source node in the data routing tree. Based on Eq. (1), the number of message transmissions incurred by $q_i$ is
  • $\text{trans}(q_i) = \sum_{k=1}^{\text{max\_depth}} \text{result}(q_i, N_k) \cdot k$  (2)
  • where $N_k$ is the set of sensor nodes at the $k$th level of the routing tree and $\text{max\_depth}$ is the maximum depth of the routing tree.
  • The computational cost of a query, $\text{cost}(q_i)$, is
  • $\text{cost}(q_i) = \text{trans}(q_i) \cdot (C_{\text{start}} + C_{\text{trans}} \cdot \text{len}(q_i))$  (3)
  • $\text{Benefit}(q_1, q_2) = \text{cost}(q_1) + \text{cost}(q_2) - \text{cost}(q_{12})$, where $q_{12}$ is the synthetic query that replaces $q_1$ and $q_2$.
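The cost model of Eqs. (1) to (3) and the benefit formula translate directly into a few Python functions. The function names and every number in the example are assumptions chosen for illustration; in particular, the example assumes that the merged query produces the same number of transmissions as each original query but with a longer result message.

```python
def result_rate(selectivity, num_nodes, epoch):
    """Eq. (1): result messages per unit time generated by one level."""
    return selectivity * num_nodes / epoch

def trans(selectivity_by_level, nodes_by_level, epoch):
    """Eq. (2): level k (1-based) contributes its result rate times k hops."""
    levels = zip(selectivity_by_level, nodes_by_level)
    return sum(result_rate(sel, n, epoch) * k
               for k, (sel, n) in enumerate(levels, start=1))

def cost(trans_rate, length, c_start, c_trans):
    """Eq. (3): transmissions times the per-message transmission cost."""
    return trans_rate * (c_start + c_trans * length)

def benefit(cost_1, cost_2, cost_12):
    """Gain from running one synthetic query instead of two user queries."""
    return cost_1 + cost_2 - cost_12

# Hypothetical two-level routing tree: 2 nodes at level 1, 4 nodes at level 2.
t = trans([0.5, 0.5], [2, 4], epoch=2.0)             # 0.5 * 1 + 1.0 * 2
c1 = cost(t, length=10, c_start=1.0, c_trans=0.1)
c2 = cost(t, length=10, c_start=1.0, c_trans=0.1)
c12 = cost(t, length=16, c_start=1.0, c_trans=0.1)   # merged, longer message
gain = benefit(c1, c2, c12)
```

A positive `gain` indicates that rewriting the two user queries into one synthetic query is worthwhile under this model.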

16. Base Station Optimization Algorithm

17. Base Station Optimization Algorithm

18. In-Network Optimization Algorithm

  • Sharing over time: sharing becomes more progressive over time by scheduling the data acquisition and transmission of all queries as a whole.
  • At the end of a query's propagation phase, setSampleRate is triggered, which may start (or restart) the node's clock to fire at the GCD of the epoch durations of all the queries. The epoch start time on a sensor node is set to a time divisible by the epoch duration rather than to the arrival time of a new query (here we assume that every epoch duration is divisible by 2048 ms).
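The clock arithmetic described here can be sketched as follows. The function names are hypothetical (this is not the actual setSampleRate code), and the epoch durations are made-up multiples of 2048 ms.

```python
from functools import reduce
from math import gcd

def clock_period(epochs_ms):
    """Clock period: the GCD of the epoch durations of all running queries."""
    return reduce(gcd, epochs_ms)

def next_epoch_start(now_ms, epoch_ms):
    """First time at or after now_ms that is divisible by the epoch duration."""
    return -(-now_ms // epoch_ms) * epoch_ms  # ceiling division

def queries_due(tick_ms, epochs_ms):
    """Epoch durations whose queries must sample at this clock tick."""
    return [e for e in epochs_ms if tick_ms % e == 0]

epochs = [2048, 4096, 6144]
period = clock_period(epochs)            # the node wakes every 2048 ms
start = next_epoch_start(5000, 4096)     # a query arriving at t=5000 starts at 8192
due = queries_due(12288, epochs)         # all three epochs coincide at 12288
```

Aligning start times to multiples of the epoch duration is what lets queries with the same or compatible epochs share a single sample and transmission.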


19. In-Network Optimization Algorithm

  • Sharing over space: after the sample rate has been set at each node, data is retrieved periodically and transmitted out of the network to the base station. During query result collection, optimization heuristics are used to aggressively share data over space.
  • Each sensor node dynamically selects a route (parent) that is aware of the query space (unlike the TinyDB network, which uses link quality); meanwhile, it tries to take advantage of the broadcast nature of the radio channel to satisfy multiple queries with one message.
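One simple way to make parent selection query-aware is to prefer the upper-level neighbor that already has results for the most queries this node must report. The sketch below is an assumed heuristic written for illustration, not the algorithm from the slides; the function name, the data layout, and the tie-breaking rule are all hypothetical.

```python
def choose_parent(my_queries, candidates):
    """Pick the upper-level neighbor that shares the most queries with this node.

    my_queries: set of query ids this node has results for.
    candidates: dict mapping parent id to the set of query ids that parent
        already has results for (hypothetical bookkeeping).
    Ties are broken by the smallest parent id to keep the choice deterministic.
    """
    return max(sorted(candidates),
               key=lambda p: len(my_queries & candidates[p]))

# A node answering two queries prefers the neighbor that answers both.
parent = choose_parent({"qi", "qj"}, {"C": set(), "D": {"qi", "qj"}})
```

Routing through a neighbor that is already awake for the same queries lets uninvolved neighbors stay asleep.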

20. Query Propagation Phase

  • Queries are flooded throughout the network from the base station. The exact set of sensors that have data for a query is not known a priori to the base station, and this set can vary with time.
  • Every sensor decides where to propagate the query based on its local information about its neighbors.
  • When a query is propagated from node x at level i to level i + 1, node x checks whether it has the data the query retrieves and piggybacks this information on the query.


21. Query Propagation Phase

  • Meanwhile, the DAG is formed by adding an edge from every node to each of its upper-level neighbors (if the network is dense, not all neighbors need to be maintained, only those that also have query results to transmit).
  • If the data at node x does not satisfy any query, x switches into sleep mode and wakes up after a predefined time.
  • When it wakes up, if it finds that its current data satisfies a query, it sends a one-hop broadcast message so that its lower-level neighbors can consider it as an option for relaying their data.
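The level assignment and DAG formation during flooding can be sketched with a breadth-first traversal. The topology, node names, and function names below are made up for illustration, and the sketch ignores sleeping nodes and the dense-network pruning mentioned above.

```python
from collections import deque

def build_levels(neighbors, base):
    """Flood from the base station; a node's level is its hop count."""
    level = {base: 0}
    queue = deque([base])
    while queue:
        x = queue.popleft()
        for y in neighbors[x]:
            if y not in level:
                level[y] = level[x] + 1
                queue.append(y)
    return level

def build_dag(neighbors, level):
    """Edge from every node to each neighbor one level closer to the base."""
    return {x: sorted(y for y in neighbors[x] if level.get(y) == level[x] - 1)
            for x in level}

# Hypothetical symmetric radio neighborhood.
neighbors = {
    "base": ["n1", "n2"],
    "n1": ["base", "n3", "n4"],
    "n2": ["base", "n4"],
    "n3": ["n1"],
    "n4": ["n1", "n2"],
}
level = build_levels(neighbors, "base")
dag = build_dag(neighbors, level)  # n4 keeps two candidate parents: n1 and n2
```

Unlike a routing tree, the DAG keeps every upper-level neighbor as a candidate parent, which is what allows a node to switch relays later.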

22. Result Collection Phase

  • Epoch-based mechanism: each epoch (sampling period) is divided into time intervals. The number of intervals reflects the depth of the routing tree.
  • Aggregation results are reported at the end of each sampling period.
  • When a node broadcasts a query, it specifies the time interval within which it expects to hear the results from its children.
  • During its scheduled interval, each node:
      • listens for packets from its children and receives them (gray)
      • computes a new partial state record by combining its own data with the partial state records from its children (black)
      • sends the result up the tree to its parent (white)
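The listen, combine, and send steps can be sketched for an AVERAGE aggregate, whose partial state record is a (sum, count) pair. The tree, readings, and function names are hypothetical, and the interval rule (deeper nodes transmit in earlier intervals) is an assumption about how the schedule is laid out.

```python
def merge(psr_a, psr_b):
    """Combine two partial state records for AVERAGE: (sum, count) pairs."""
    return (psr_a[0] + psr_b[0], psr_a[1] + psr_b[1])

def collect(node, children, reading):
    """Partial state record a node sends to its parent after hearing its children."""
    psr = (reading[node], 1)  # the node's own data
    for child in children.get(node, []):
        psr = merge(psr, collect(child, children, reading))
    return psr

def send_interval(depth, max_depth):
    """Assumed schedule: the deepest nodes transmit in interval 0."""
    return max_depth - depth

# Hypothetical routing tree and readings.
children = {"root": ["a", "b"], "a": ["c"]}
reading = {"root": 10, "a": 20, "b": 30, "c": 40}
total, count = collect("root", children, reading)
average = total / count
```

Each node forwards one fixed-size record regardless of how many descendants it has, which is where in-network aggregation saves radio messages.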

24. Example to Explain the DAG In-Network Algorithm

  • 8 nodes are involved, and 12 (messages for $q_i$) + 8 (messages for $q_j$) = 20 radio messages are transmitted.
  • Using the DAG, G chooses D instead of C as the relay for both $q_i$ and $q_j$, and hence nodes C and A can be instructed to sleep.
  • 6 nodes are involved, and 4 (messages for $q_j$) + 8 (messages for $q_i$) = 12 radio messages are transmitted.

  • Nodes D, E, F, G, and H are queried by $q_i$; nodes D, G, and H are queried by $q_j$.

25. My Proposal for Multi Query Optimization

  • Suppose that for a system with n queries we choose n distinct root stations (assuming the number of sensor nodes exceeds the number of concurrent queries).
  • Queries are flooded throughout the network from each of the root nodes, which are connected directly to the base station.
  • The epoch duration is divided into n equal intervals, and every processing node continuously transmits the sensed data relevant to each query in every interval of the epoch.

26. My Proposal for Multi Query Optimization

  • The twist in the algorithm lies in processing the next query in a scheduled round-robin fashion when a node is in sleep mode, as per the previous discussion.
  • At the end of the epoch, the results of all concurrent queries are output simultaneously.
  • Apply a correlation algorithm to reduce the amount of transmitted data.
  • Problem: an overloaded node, i.e., a single node that acts as a parent for 2 different queries in the same epoch time. Under normal circumstances a collision occurs.
  • Solution: apply an exponential back-off algorithm to the contending queries.
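The proposed remedy can be sketched with binary exponential back-off, the common form of the technique. The slot model, the function names, and the cap on the exponent are assumptions for illustration; the slides do not specify these details.

```python
import random

def backoff_slots(attempt, max_exponent=10, rng=random):
    """Wait a random number of slots in [0, 2**k - 1], with k = min(attempt, cap)."""
    k = min(attempt, max_exponent)
    return rng.randrange(2 ** k)

def schedule_contending(queries, rng=random, max_attempts=16):
    """Redraw back-off slots, doubling the window each attempt, until the
    contending queries land in distinct slots at the shared parent."""
    for attempt in range(1, max_attempts + 1):
        slots = {q: backoff_slots(attempt, rng=rng) for q in queries}
        if len(set(slots.values())) == len(queries):
            return slots
    raise RuntimeError("still colliding after max_attempts")

# Two queries contending for the same parent in the same epoch interval.
slots = schedule_contending(["query_1", "query_2"], rng=random.Random(0))
```

Doubling the window after each collision makes a repeated collision increasingly unlikely, at the price of added delay within the epoch.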

27. Data Flow View

  • Query 1: select light sample period 2 ms
  • Query 2: select temp sample period 2 ms
  • Query 3: select humidity sample period 2 ms
  • Query 4: select pressure sample period 2 ms

[Figure labels: Child for Query 1; Is Root for Query 4; Parent of all leaf children]

28. Experimental Evaluations

  • Number of concurrent correlated queries vs. average transmission time
  • For a fixed number of multiple queries, I would study the relation of:
    • average transmission time to number of nodes
    • benefit ratio vs. average number of synthetic queries
    • communication cost vs. computation cost

29. Conclusion

  • After studying the current technology for optimizing sensor network queries, I proposed an architecture that could serve future sensor networks.
  • Thank you
  • Questions?