Quasar Group Quality Aware Sensor Infrastructure (QUASAR) Project** Team Faculty: Sharad Mehrotra,...
Date posted: 21-Dec-2015
Quasar Group
Quality Aware Sensor Infrastructure (QUASAR) Project**
Team
Faculty: Sharad Mehrotra, Nalini Venkatasubramanian
Postdoc: Dmitri Kalashnikov
Students: Qi Han, Iosif Lazaridis, Xingbo Yu
**Supported in part by a collaborative NSF ITR grant entitled “real-time data capture, analysis, and querying of dynamic spatio-temporal events”, in collaboration with UCLA, U. Maryland, and U. Chicago

Ubiquitous Sensor Environments
[Figure: sensor network applications: Battlefield Monitoring, Habitat Monitoring, Earthquake Monitoring, Oceanographic Current Monitoring, Medical Condition Monitoring, Traffic Congestion Detection, Target Tracking & Detection, Intrusion Detection, Video Surveillance]
• Generational advances to computing infrastructure
  – sensors will be everywhere
• Continuous monitoring and recording of the physical world and its phenomena
  – limitless possibilities
• New challenges
  – limited bandwidth & energy
  – highly dynamic systems
• System architectures are due for an overhaul
  – at all levels of the system: networks, OS, middleware, databases, applications

Taxonomy of Applications (1)
• Data access needs of applications
  – Historical data
    • Analysis to better understand the physical world
  – Current data
    • Monitoring and control to optimize the processes that drive the physical world
  – Future data
    • Forecasting trends in data for decision making

Taxonomy of Applications (2)
• Predictability of data access
  – Fixed
    • Data access needs of applications known a priori
  – Unpredictable (ad hoc)
    • Data access needs of applications not known at any instance of time
  – Predictable (continuous)
    • Data access needs of applications can be predicted for some time in the future with high probability

Application Landscape
[Figure: application landscape. One axis: predictability of data access (no knowledge, some knowledge, full knowledge). Other axis: temporal property of data accessed (the past, the present, the future).]
Example queries placed in the landscape:
• Each evening at 8pm, predict the temperature for the next 5 days
• Notify me immediately when there is a forest fire
• Every month, calculate the average humidity in California for the last 30 days
• Did the temperature rise above 40°C in the last year?
• Is Mr. Doe’s newly proposed weather model accurate for 1996-2000?
• How much snow is there in Aspen?
• I’m going surfing on Sep. 30! Will it be windy?
• Visualize current humidity with Mrs. Doe’s new interpolation scheme
• Predict noise levels around the airport if runway 2 becomes operational

Sensor Data Management Infrastructure
• A data collection and management middleware infrastructure that
  – provides seamless access to data dispersed across a hierarchy of sensors, servers, and archives
  – supports multiple concurrent applications of diverse types
  – adapts to changing application needs
• Fundamental issues:
  – Where to store data?
    • Do not store, at the producers, at the servers
  – Where to compute?
    • At the client, server, data producers

Existing DBMS technologies…
• Traditional data management
  – client-server architecture
  – efficient approaches to data storage & querying
  – query shipping versus data shipping
  – data changes with explicit update
• Limitations
  – Sensors generate continuously changing data
    • Producers must be considered as “first class” entities
  – Does not exploit the storage, processing, and communication capabilities of sensors
[Figure: client, server, and data producers exchanging data/query requests and data/query results]

Stream Data Management
• Data streams through the server but is not stored
• Continuous queries are evaluated against streaming data
• Deals with problems due to dynamic data on the server side
• But
  – Does not conserve sensor resources (e.g., power)
  – Does not exploit the storage and processing capabilities of sensors
  – Geared towards continuous monitoring and not archival applications
[Figure: data streams and continuous queries feed a stream processing engine, which keeps a synopsis in memory and returns an (approximate) answer]

Quasar Architecture
• Hierarchical architecture
  – Data flows from producers to server to clients periodically
  – Queries flow the other way:
    • If the client cache does not suffice, then the query is routed to the appropriate server
    • If the server cache does not suffice, then access current data at the producer
  – This is a logical architecture
    • Producers could also be clients
    • A server may be a base station or a (more) powerful sensor node
    • Servers might themselves be hierarchically organized
    • The hierarchy might evolve over time
[Figure: client with client cache, server with server cache and archive, producer with its cache; arrows labeled QUERY FLOW and DATA FLOW run in opposite directions]

Quasar: Observations & Approach
• Applications can tolerate errors in sensor data
  – Applications may not require exact answers:
    • small errors in location during tracking, or an error in the answer to a query, may be OK
  – Data cannot be precise due to measurement errors, transmission delays, etc.
• Communication is the dominant cost
  – limited wireless bandwidth, source of major energy drain
• Quasar approach
  – Exploit application error tolerance to reduce communication between producer and server and/or to conserve energy
  – Two approaches
    • Minimize resource usage given quality constraints
    • Maximize quality given resource constraints

Quasar Issues …
• Mapping application quality requirements to data quality requirements
  – Examples:
    • Target tracking: quality of track --> accuracy of data
    • Aggregation queries: accuracy of results --> accuracy of data
  – Strategy should adapt to expected application load
• Quality-based data collection
  – Minimize sensor resource consumption while guaranteeing required data quality
• Quality-cognizant query processing
  – Imprecise data representation
  – Optimal execution of queries over imprecise data

Quasar Progress …
• Mapping application quality requirements to data quality requirements
  – Target tracking using acoustic sensors [MW ‘03]
  – Spatial range queries [DEXA ‘03]
• Quality-based data collection
  – General framework [DS Online ‘03]
  – To support monitoring queries over current data [Qi+03]
  – For sensor data archival [ICDE ‘03]
  – With real-time constraints [RTSS ‘03]
  – With support for in-network aggregation [Yu+03]
• Quality-cognizant query processing
  – Aggregation queries [Sigmod ‘01]
  – Selection queries [ICDE ‘04]

Quality Aware Data Collection Problem
• Let P = <p[1], p[2], …, p[n]> be a sequence of environmental measurements (time series) generated by the producer, where n = now
• Let S = <s[1], s[2], …, s[n]> be the server-side representation of the sequence
• A within-ε quality data collection protocol guarantees that, for all i, error(p[i], s[i]) < ε
• ε is derived from the application quality tolerance
[Figure: sensor time series …, p[n], p[n-1], …, p[1]]

Simple Data Collection Protocol
• Sensor logic (at time step n)
    Let p’ = last value sent to server
    if error(p[n], p’) > ε or on timeout
        send p[n] to server (the sensor switches its radio on, if need be)
• Server logic (at time step n)
    If new update p[n] received at step n
        s[n] = p[n]
    Else
        s[n] = last update sent by sensor
  – Guarantees that the maximum error at the server is less than or equal to ε
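The sensor and server logic of the simple protocol can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: error is taken to be the absolute difference, the timeout branch is omitted, delivery is lossless, and the names `simple_collection` and `eps` are illustrative.

```python
def simple_collection(readings, eps):
    """Simulate the simple protocol; return (server_view, messages_sent)."""
    server_view = []
    last_sent = None  # p': last value sent to the server
    messages = 0
    for p in readings:
        # Sensor logic: send only when the reading drifts more than eps
        # away from the last value the server has seen.
        if last_sent is None or abs(p - last_sent) > eps:
            last_sent = p
            messages += 1
        # Server logic: s[n] is the most recent update received.
        server_view.append(last_sent)
    return server_view, messages


readings = [20.0, 20.3, 20.9, 21.4, 21.5, 19.9]
server_view, messages = simple_collection(readings, eps=1.0)
worst_error = max(abs(p - s) for p, s in zip(readings, server_view))
```

In this run only three of the six readings are transmitted, and the server-side error never exceeds the bound.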

Exploiting Prediction Models
• Producer and server agree upon a prediction model (M, θ)
• Let spred[i] be the predicted value at time i based on (M, θ)
• Sensor logic (at time step n)
    if error(p[n], spred[n]) > ε
        send p[n] to server
• Server logic (at time step n)
    If new update p[n] received at step n
        s[n] = p[n]
    Else
        s[n] = spred[n] based on model (M, θ)
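To illustrate how a shared model reduces traffic, here is a hedged sketch in which the agreed model is a straight line with a fixed slope, re-anchored at the last transmitted value. The linear model, the absolute-difference error, and the names `predictive_collection`, `slope`, and `eps` are assumptions made for illustration; the slides do not fix a particular model.

```python
def predictive_collection(readings, eps, slope):
    """Simulate prediction-based collection; return (server_view, messages_sent)."""
    server_view = []
    base_value, base_step = None, None  # last transmitted value and its time step
    messages = 0
    for n, p in enumerate(readings):
        # Both sides compute the same prediction from the shared model.
        if base_value is None:
            predicted = None
        else:
            predicted = base_value + slope * (n - base_step)
        if predicted is None or abs(p - predicted) > eps:
            # Sensor logic: the prediction is too far off, so send p[n].
            base_value, base_step = p, n
            messages += 1
            server_view.append(p)
        else:
            # Server logic: no update arrived, so use the predicted value.
            server_view.append(predicted)
    return server_view, messages


readings = [10.0, 11.1, 11.9, 13.0, 14.2, 18.0, 19.0]
with_model, sent_with_model = predictive_collection(readings, eps=0.5, slope=1.0)
no_model, sent_no_model = predictive_collection(readings, eps=0.5, slope=0.0)
```

With `slope=0.0` the sketch degenerates to the simple protocol, which makes the saving from a well-matched model easy to compare on the same data.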

Challenges in Prediction
• Simple versus complex models?
  – Complex and more accurate models require more parameters (that will need to be transmitted)
  – Goal is to minimize cost, not necessarily to obtain the best prediction
• How is a model M generated?
  – static: one out of a fixed set of models
  – dynamic: dynamically learn a model from data
• When should a model M or its parameters be changed?
  – immediately on model violation: too aggressive, since a violation may be a temporary phenomenon
  – never changed: too conservative, since data rarely follows a single model

Challenges in Prediction (cont.)
• Who updates the model?
• Server
  – long-haul prediction models possible, since server maintains history
  – might not predict recent behavior well since server does not know exact S sequence; server has only samples
  – extra communication to inform the producer
• Producer
  – better knowledge of recent history
  – long-haul models not feasible since producer does not have history
  – producers share computation load
• Both
  – server looks for new models, sensor performs parameter fitting given existing models

Answering Queries
• If the query quality tolerance is satisfied at the server (tolerance is more than ε)
  – Answer query at the server
• Else
  – Probe the sensor
  – Sensor guaranteed to respond within a bounded time
• Approach guarantees quality tolerance of queries
[Figure: queries Q1, …, Qm with answers A1, …, Am arrive at the server, which holds an imprecise data representation [li, ui] for each sensor si; the sensor time series …, p[n], p[n-1], …, p[1] reaches the server through sensor-initiated updates and through probes and probe results]
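The answer-or-probe decision can be sketched as follows. This is an illustrative sketch only: it assumes the server keeps an interval per sensor and that a query's tolerance is met when the interval width does not exceed it; `answer_query` and `probe_sensor` are hypothetical names, and the probed value is made up for the example.

```python
def answer_query(interval, tolerance, probe):
    """Return (answer_interval, probed) for one sensor."""
    low, high = interval
    if high - low <= tolerance:
        # The cached interval is precise enough: answer at the server.
        return (low, high), False
    # Otherwise pay for a probe and answer with the exact value.
    exact = probe()
    return (exact, exact), True


probe_count = []

def probe_sensor():
    """Hypothetical probe: records the call and returns a made-up reading."""
    probe_count.append(1)
    return 21.7


loose = answer_query((20.0, 24.0), tolerance=5.0, probe=probe_sensor)
tight = answer_query((20.0, 24.0), tolerance=1.0, probe=probe_sensor)
```

Only the query with the tight tolerance triggers a probe; the loose query is answered from the server-side interval.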

The Challenge …
• How should sensor state be managed to minimize energy consumption while maintaining data at the required quality?
  – Sensor state: error precision, power states
• Power consumption of sensors

  Sensor state | Radio mode | Power consumption (mW)
  active       | Tx         | 14.88
  listening    | Rx         | 12.50
  listening    | idle       | 12.36
  sleeping     | off        | 0.016

Sensor State Models
Active-Listening-Sleeping Model (ALS):
[Figure: state diagram with states sleeping, listening, and active]
• active to listening: Ta after processing the last sensor-initiated update or probe
• listening to active: upon first sensor-initiated update or probe
• listening to sleeping: after Tl without traffic
• leaving sleeping: upon first sensor-initiated update, or after Ts
Other Models:
• Always-Active (AA) [Ta is infinite]
• Active-Listening (AL) [Tl is infinite]
• Active-Sleeping (AS) [Tl is 0]

Issues in Energy Efficient Data Collection
• Issues
  – How to maintain the precision range r for each sensor
    • A larger r increases the possibility of expensive probes
    • A smaller r wastes communication due to sensor-initiated updates
  – When to transition between sensor states (i.e., how to set Ta, Tl, Ts)
    • Powering down might not be optimal if we have to power up immediately
    • Powering down may increase query response time
• Objective
  – Set values for Ta, Tl, Ts, and r that minimize energy cost
  – normalized energy cost = energy consumed at each state + state transition energy
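The cost expression above (energy spent in each state plus state transition energy) can be written out as a short calculation. This sketch uses the per-state power figures from the power consumption table (with the idle figure for listening); the per-transition energy of 0.5 mJ and the time profiles are invented for illustration, and no normalization is applied.

```python
# Power per state in mW, taken from the power consumption table.
POWER_MW = {"active": 14.88, "listening": 12.36, "sleeping": 0.016}


def energy_cost_mj(seconds_in_state, transitions, transition_cost_mj):
    """Energy in mJ: time spent in each state plus state transition energy."""
    # mW multiplied by seconds gives mJ.
    state_energy = sum(POWER_MW[s] * t for s, t in seconds_in_state.items())
    return state_energy + transitions * transition_cost_mj


# Assumed profiles over a 100-second window (illustrative numbers only).
always_active = energy_cost_mj({"active": 100.0}, transitions=0,
                               transition_cost_mj=0.5)
mostly_asleep = energy_cost_mj({"active": 10.0, "sleeping": 90.0},
                               transitions=20, transition_cost_mj=0.5)
```

Even with twenty transitions charged, the mostly-asleep profile costs far less than staying active, which is why the choice of Ta, Tl, and Ts matters.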

Our Approaches to Energy Efficient Sensor Data Collection
• We solve the energy optimization problem by solving two sub-problems
  – Optimize energy consumption by adjusting the range size under the assumption that the state transition is fixed
    • I.e., Ta, Tl, and Ts have been optimally set
  – Optimize energy consumption by adapting sensor states while assuming that the precision range for the sensor is fixed

Range Size Adjustment for the AA/AL Model
• The optimal precision range r that minimizes E occurs at a specific probability ratio
  – The optimal range can be realized by maintaining this probability ratio
  – Can be done at the sensor
• Assuming that ρ is the ratio of sensor-initiated update probability to probe probability:
    for sensor-initiated update:
        with probability min{ρ, 1}, set r’ = r(1+α);
    for probe:
        with probability min{1/ρ, 1}, set r’ = r/(1+α);
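The randomized adjustment rule can be sketched as below. The parameter names `rho` (the probability ratio) and `alpha` (the growth factor) are illustrative stand-ins for the slide's symbols, `r` is the range size, and the event sequence is made up; treat this as a sketch of the rule as stated, not the authors' code.

```python
import random


def adjust_range(r, event, rho, alpha, rng=random):
    """Return the new range size after one sensor-initiated update or probe."""
    if event == "update":
        # A sensor-initiated update suggests the range is too tight: widen it.
        if rng.random() < min(rho, 1.0):
            return r * (1.0 + alpha)
    elif event == "probe":
        # A probe suggests the range is too loose: narrow it.
        if rng.random() < min(1.0 / rho, 1.0):
            return r / (1.0 + alpha)
    else:
        raise ValueError(f"unknown event: {event}")
    return r


rng = random.Random(0)  # seeded so the run is repeatable
r = 10.0
for event in ["update", "update", "probe", "update", "probe"]:
    r = adjust_range(r, event, rho=0.5, alpha=0.2, rng=rng)
```

When `rho` equals 1 both adjustments fire every time, so an update followed by a probe returns the range to its starting size.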

Range Size Adjustment for the AS/ALS Model
• Sensor side
  – Keep track of the number of state transitions over the last k updates
  – Piggyback the probability of state transitions with the kth update
• Server side
  – Keep track of the number of sensor-initiated updates and probes among the last k updates
  – Upon receiving the kth update from the sensor
    • Compute the optimal precision range r
    • Inform the sensor about the new r

Adaptive Range Setting for the AS/ALS Model
Server-side Algorithm:
while (1) {
    if (received an update) {
        i++;
        if (source-initiated update) Nsu++;
        if (consumer-initiated update) Ncu++;
        if (i == k) {
            Psu = Nsu/T; Pcu = Ncu/T;
            compute K1 and K2 for current sliding window;
            compute r;
            send to sensor: r;
            i = 0;
        }
    }
}
Sensor-side Algorithm:
while (1) {
    if (transition from active to sleeping) Nas++;
    if (transition from sleeping to active) Nsa++;
    if (received an update) {
        i++;
        l’ = e’ – r/2; u’ = e’ + r/2;
        send to server: (l’, u’);
        if (i == k) {
            Pas = Nas/T; Psa = Nsa/T;
            compute the state-transition statistics for current window;
            send them to server; i = 0;
        }
    }
}

Adaptive State Management
• Consider the AS model for derivation of the optimal Ta to minimize energy consumption
  – Assuming f(t) is the probability of receiving a request at time instant t, the expected energy consumption for a single silent period is denoted E
  – E is minimized when Ta = 0 if requests are uniformly distributed in the interval [0, Ta+Ts]
• In practice, learn f(t) at runtime and select Ta adaptively
  – Choose a window size w in advance
  – Keep track of the last w silent period lengths and summarize this information in a histogram
  – Periodically use the histogram to generate a new Ta

Adaptive State Management (Cont)
• ci: the number of silent periods for bin i among the last w silent periods
• Estimate f(t) by the distribution which generates a silent period of length ti with probability ci/w
• Ta is chosen to be the value tm that minimizes the energy consumption
[Figure: histogram of silent period lengths with bins 0, 1, 2, …, n-1, bin boundaries t0, t1, t2, t3, …, tn-1, tn = Ta+Ts, and counts c0, c1, c2, …, cn-1]
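The histogram-based choice of Ta can be sketched as follows. The energy expression on the slide is not in the transcript, so the cost model here is an assumption: a silent period that ends before Ta is spent entirely active, while a longer one is active for Ta, asleep for the remainder, and then pays a fixed wake-up cost. The function names, bin edges, and wake-up costs are illustrative; the power figures are the active and sleeping entries of the power consumption table.

```python
P_ACTIVE_MW = 14.88  # active (Tx) power from the power table
P_SLEEP_MW = 0.016   # sleeping power from the power table


def build_histogram(silent_periods, edges):
    """Count periods per bin; bin i covers (edges[i], edges[i + 1]]."""
    counts = [0] * (len(edges) - 1)
    for length in silent_periods:
        for i in range(len(counts)):
            if length <= edges[i + 1]:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # longer than the last edge: clamp into last bin
    return counts


def expected_energy(ta, edges, counts, wakeup_cost_mj):
    """Expected energy (mJ) of one silent period under the ASSUMED cost model."""
    w = sum(counts)
    total = 0.0
    for i, c in enumerate(counts):
        length = edges[i + 1]  # each bin is represented by its upper edge
        if length <= ta:
            energy = P_ACTIVE_MW * length
        else:
            energy = (P_ACTIVE_MW * ta + P_SLEEP_MW * (length - ta)
                      + wakeup_cost_mj)
        total += (c / w) * energy  # weight by empirical probability c_i / w
    return total


def choose_ta(silent_periods, edges, wakeup_cost_mj):
    """Pick the bin edge t_m that minimizes the expected energy."""
    counts = build_histogram(silent_periods, edges)
    return min(edges,
               key=lambda ta: expected_energy(ta, edges, counts, wakeup_cost_mj))


edges = [0, 1, 2, 4, 8]                  # t0 .. tn, in seconds (assumed)
recent = [0.5, 0.8, 0.9, 0.7, 0.6, 7.5]  # last w = 6 silent period lengths
ta_costly_wakeup = choose_ta(recent, edges, wakeup_cost_mj=50.0)
ta_free_wakeup = choose_ta(recent, edges, wakeup_cost_mj=0.0)
```

Under this assumed model, an expensive wake-up pushes Ta up to cover the many short silent periods, while a free wake-up makes sleeping immediately the cheapest choice.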

Performance Study
• Simulation environments
• Modeling sensors
  – Power consumption parameters: Berkeley motes
  – Sensor values:
    • drawn uniformly from the range [-150, 150]
    • perform a random walk in one dimension: every second, the value either increases or decreases by an amount sampled uniformly from [0.5, 1.5]
• Modeling queries
  – Query arrival times at the server are Poisson distributed
    • mean inter-arrival time = 2 seconds
  – Each query is accompanied by an accuracy constraint A
    • A = uniform(Aavg(1 - Avar), Aavg(1 + Avar))
    • Aavg = 20 (average accuracy constraint)
    • Avar = 1 (accuracy constraint variation)

System Performance Comparison
[Figure: Query Response Time Comparison. Average query response time (us) for AA, AL, AS, ALS]
[Figure: Sensor Energy Consumption Comparison. Normalized sensor energy consumption (uJ) for AA, AL, AS, ALS]

Impact of Ta adaptation on System Performance
[Figure: Impact of Ta Selection on Query Response Time. Average query response time (us) for static Ta(0) and adaptive Ta]
[Figure: Impact of Ta Selection on Sensor Energy Consumption. Normalized sensor energy consumption (uJ) for static Ta(0) and adaptive Ta]

Impact of Range Size Adaptation on System Performance
[Figure: Impact of Range Size Adjustment on Query Response Time. Average query response time (ms) for fixed(0), average accuracy constraint, adaptive adjustment, fixed(large)]
[Figure: Impact of Range Size Adjustment on Sensor Energy Consumption. Normalized sensor energy consumption (uJ) for fixed(0), average accuracy constraint, adaptive adjustment, fixed(large)]

Fusing Energy Efficient Data Collection and In-network Aggregation
• Issues
  – Hierarchical precision range adjustment
  – Cluster forming and dynamic maintenance
[Figure: sensor clusters connected to access points]

Quasar Progress …
• Mapping application quality requirements to data quality requirements
  – Target tracking using acoustic sensors [MW ‘03]
  – Spatial range queries [DEXA ‘03]
• Quality-based data collection
  – General framework [DS Online ‘03]
  – To support monitoring queries over current data [Qi+03]
  – For sensor data archival [ICDE ‘03]
  – With real-time constraints [RTSS ‘03]
  – With support for in-network aggregation [Yu+03]
• Quality-cognizant query processing
  – Aggregation queries [Sigmod ‘01]
  – Selection queries [ICDE ‘04]

Problem Definition
• There is a collection T of imprecise objects
  – E.g., the intervals { [1,3], [2,5], [4,9] } represent the precise values {2, 3, 5}
• The query is: “Retrieve objects from T which satisfy a predicate”
  – The query specifies quality requirements
  – The system must return an approximate result that meets the quality requirements at minimum overall cost

Impact of Data Imprecision
• Objects are classified as:
  – a is a NO object
  – b, f are MAYBE objects
  – c, d, e are YES objects
• The exact set is E = {b, c, d, e}
[Figure: imprecise objects a, b, c, d, e, f shown against the selection range; the precise object behind an imprecise object o can be retrieved with a probe]
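The YES / NO / MAYBE classification can be sketched for the case of a range selection over interval-valued objects. This is an assumption-laden illustration: the predicate is taken to be "value lies in a query range", and the interval endpoints and the query range are invented so that the labels match the slide's example.

```python
def classify(interval, query_range):
    """Classify an imprecise object against a range selection predicate."""
    low, high = interval
    q_low, q_high = query_range
    if high < q_low or low > q_high:
        return "NO"      # every possible value fails the predicate
    if q_low <= low and high <= q_high:
        return "YES"     # every possible value satisfies the predicate
    return "MAYBE"       # only a probe can settle it


# Invented intervals chosen to reproduce the slide's classification.
objects = {"a": (0, 1), "b": (1, 3), "c": (3, 4),
           "d": (4, 6), "e": (6, 7), "f": (7, 10)}
query_range = (2, 8)
labels = {name: classify(iv, query_range) for name, iv in objects.items()}
```

A MAYBE object such as b may still belong to the exact set E once probed, which is why E cannot be read off the labels alone.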

Defining Quality
• Measures the accuracy of an approximate answer A
• Set-based quality
  – Precision: p = |A ∩ E| / |A|
  – Recall: r = |A ∩ E| / |E|
• Value-based quality
  – Laxity of an object is l(o). E.g., l([2,3]) = 3 - 2 = 1
  – Laxity of A is lmax = max over x ∈ A of l(x)
• Query specifies lower bounds pq, rq on precision and recall, and an upper bound lqmax on laxity
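The three quality measures can be computed directly from their definitions. In this sketch the exact set E is the one from the preceding example, while the answer set and the interval endpoints are invented for illustration; the functions assume non-empty A and E.

```python
def precision(answer, exact):
    """p = |A ∩ E| / |A| (assumes a non-empty answer)."""
    return len(answer & exact) / len(answer)


def recall(answer, exact):
    """r = |A ∩ E| / |E| (assumes a non-empty exact set)."""
    return len(answer & exact) / len(exact)


def max_laxity(answer, intervals):
    """lmax = largest interval width among the objects in the answer."""
    return max(intervals[o][1] - intervals[o][0] for o in answer)


exact = {"b", "c", "d", "e"}      # exact set E
answer = {"c", "d", "f"}          # an assumed approximate answer A
intervals = {"c": (3, 4), "d": (4, 6), "f": (7, 10)}

p = precision(answer, exact)
r = recall(answer, exact)
l_max = max_laxity(answer, intervals)
```

Forwarding the MAYBE object f unprobed is what costs this answer both precision and laxity.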

Evaluating QaQ Selection Operator
• Read each object through either a linear scan (currently assumed) or an index scan [SIGMOD ‘01]
• Each object read is classified as YES, NO, or MAYBE
• YES and MAYBE objects can each be handled in one of three ways:
  – Probe
  – Forward
  – Ignore
• Another possibility is to store the object and deal with it later
  – Might be good under certain situations based on available memory at the server

State of QaQ Selection in the middle of execution
• In the beginning: the total input T
• At some point of operator evaluation, objects are classified as Yes (Y), No (N), and Maybe (M), with M = Ms ∪ Mns (seen and not seen)
• Answer set A contains some seen YES and MAYBE objects: (Y ∩ A) ∪ (Ms ∩ A) = A

Answer Quality Bounds
• Guaranteed precision, guaranteed recall, and guaranteed laxity at any stage of the execution
  – Precision: p ≥ pG = |Y ∩ A| / |A|
  – Recall: r ≥ rG = |Y ∩ A| / (|Y| + |Mns| + |Ms - A|)
  – Laxity: lmax = max over x ∈ A of l(x)
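The guaranteed bounds are simple functions of set sizes tracked during execution. The sketch below takes those sizes as plain counts; the function and parameter names are illustrative, and the numbers in the example are invented.

```python
def guaranteed_precision(yes_in_answer, answer_size):
    """pG = |Y ∩ A| / |A|: worst case, no MAYBE object in A qualifies."""
    return yes_in_answer / answer_size


def guaranteed_recall(yes_in_answer, yes_total, maybe_not_seen,
                      maybe_seen_not_in_answer):
    """rG = |Y ∩ A| / (|Y| + |Mns| + |Ms - A|).

    Worst case: every MAYBE object outside the answer turns out to qualify.
    """
    return yes_in_answer / (yes_total + maybe_not_seen
                            + maybe_seen_not_in_answer)


# Assumed mid-execution state: 40 objects forwarded, 30 of them YES.
p_g = guaranteed_precision(yes_in_answer=30, answer_size=40)
r_g = guaranteed_recall(yes_in_answer=30, yes_total=30,
                        maybe_not_seen=20, maybe_seen_not_in_answer=10)
```

Both values are lower bounds, so the operator can stop as soon as they reach the query's requirements without ever knowing the exact set E.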

The Decision Problem
• How should the QaQ selection operator decide
  – When to probe
  – When to forward
  – When to ignore
• Objective:
  – Meet query quality requirement
  – Minimize cost

Cost Model: Combined Data Access & Probe Cost
[Equation: total cost as a combination of Read, Probe, and Write costs]

Impact of Probe, Forward, Ignore actions to quality
• + increase, - decrease, = remains the same

Constraints on the Decision
• Some decisions are fixed -- we have no choice!
• Objects with l(o) greater than the query tolerance lqmax must not be forwarded
• The precision guarantee pG must never be less than the query tolerance pq
  – Otherwise a pq violation might result if no new YES objects are seen
• If |A ∩ Y| / (|Y| + |Ms - A|) is less than the query tolerance rq, you can’t ignore an object
  – Ignoring it might lead to an rq violation if no new YES objects are seen
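
The “decision” Plane
• s(o): probability of a MAYBE object satisfying the selection
[Figure: the decision plane. Horizontal axis: s(o), divided into No (s(o)=0), Maybe (0<s(o)<1), and Yes (s(o)=1). Vertical axis: laxity l(o), with threshold lqmax. The plane is split into regions 1 to 7 by lqmax and the thresholds s3 and s5. Region actions shown: Ignore, Probe, Forward, “Forward with probability pfm or ignore”, and “Probe with probability ppy or ignore”.]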

The Optimization Problem
• Free parameters: ppy, s3, s5, pfm
• Estimate:
  – Number of YES/MAYBE/NO objects
  – Number of YES/MAYBE objects exceeding the lqmax threshold
  – Distribution of s(o)
• Minimize cost W in the parameter space (ppy, s3, s5, pfm) subject to Precision, Recall, and Laxity guarantees

How it works
• Get selectivity estimates of the input set T
• Solve the 4-parameter optimization problem and obtain optimal values for ppy, s3, s5, pfm
• Read one object at a time and handle it according to the “decision plane”, instantiated with ppy, s3, s5, pfm
• Finish when quality requirements are met

Performance Study
• Size of input |T| = 10,000
• Laxity ranges in [0,100]
• Probe cost = 100x read/write unit cost
• We vary:
  – Precision, Recall, Laxity requirement
  – Query selectivity
  – Input uncertainty (ratio of YES/MAYBE objects)
• Costs are normalized by dividing by |T|

Competing Algorithms
• We devised two simple heuristics:
  – STINGY avoids probes: it ignores MAYBE objects and objects exceeding the lqmax threshold
    • STINGY is conservative, but sometimes it is forced to probe to meet the quality guarantees
  – GREEDY forwards all MAYBE objects and probes all objects that exceed the lqmax threshold
    • GREEDY tries to produce the result quickly by not ignoring objects, but sometimes it uses too many probes and forwards too many objects

Varying Laxity
• Input has 20% YES, 20% MAYBE objects
• 90% Precision and 50% Recall are requested
• As the laxity requirement becomes looser, the cost is reduced since imprecise objects can be forwarded without a probe

Varying Precision
• Input has 20% YES, 20% MAYBE objects
• 50% Recall and laxity=50 are requested
• Cost increases as the Precision requirement increases, as objects can’t be forwarded unprobed

Varying Recall
• Input has 20% YES, 20% MAYBE objects
• 90% Precision and laxity=50 are requested
• Cost increases as the Recall requirement increases
• When the Recall requirement is low, only part of the input needs to be read
• As the Recall requirement tends to 100%, all the input must be read and no objects can be ignored

Varying Selectivity
• Input has 20% YES, 20% MAYBE objects
• 90% Precision, 50% Recall, and laxity=50 are requested
• Cost increases as selectivity increases, since more objects need to be output

Varying Input Uncertainty
• Input has 20% YES, 20% MAYBE objects
• 90% Precision, 50% Recall, and laxity=50 are requested
• When MAYBE objects are few, no probe cost needs to be paid: the few MAYBE objects can be ignored
• When MAYBE objects are many, they cannot be ignored (Recall might be violated) or forwarded (Precision might be violated). Hence, they are probed, increasing the cost

Quasar Future Work
• Mapping application quality to data quality
  – Other notions of quality (probabilistic, spatial and temporal resolution)
• Quality aware data collection
  – Incorporating new notions of quality
  – Fault tolerance
  – Co-optimizing data collection and network routing
• Quality aware query processing
  – More general class of SQL queries

Plug-in (for UCI-ICS)….
• Newly established school with the following departments
  – Computer Science, Informatics, Statistics
• 50 plus faculty currently, many open positions
• Many new developments
  – Lots of new funding
  – Cal-IT2 building well under way
  – New ICS building just getting started
• Endowed chair search
  – Approx. 2 million from anonymous donor

Database Research @ UCI
• Folks: C. Li, S. Mehrotra, P. Smyth, G. Tsudik, N. Venkatasubramanian
• Core Technologies
  – Indexing, query processing, transactions, distributed systems, grids
• Service Model
  – Exploring the privacy, performance, and algorithmic challenges in providing databases as an internet service
• Customizable search and data analysis
  – Customizing/personalizing search, flexible similarity retrieval over structured, semi-structured and unstructured data
• Event-entity data management
  – Extracting, representing, querying and analyzing a web of entities and events from multimodal data
• Data Cleansing
  – Entity resolution, event resolution
• Information Integration
  – Middleware for querying multiple heterogeneous databases
• Peer to Peer Systems
  – Search, resource discovery, data integration
• Data Mining
  – Discovering pattern/user models from digital traces, customizing systems based on models
• Sensor Databases
  – Management of data in highly dynamic, resource constrained environments

Urban Crisis Response Center @ Cal-IT2
• NSF Large Information Technology Research Grant
  – $12.5 million over five years
• Research collaboration led by UCI and UCSD
  – UCI: Sharad Mehrotra (PI); Co-PIs: C. Butts, N. Venkatasubramanian, R. Eguchi (ImageCat), M. Winslett (Univ. of Illinois)
  – UCSD: Ramesh Rao (PI); Co-PIs: B. Rao, M. Trivedi
• Research Team

  Organization | Researchers | Research Strengths
  UCI, Info. & Comp. Sci. | Li, Mark, Mehrotra, Smyth, Tsudik, Venkatasubramanian | Data management, mining, machine learning, distributed systems, networking, security
  UCI, Civil & Env. Eng. | Shinozuka, Feng | Damage assessment, optimization, sensors
  UCI, Inst. for Trans. Studies | Recker | Simulation, transportation, traffic management
  UCI, Sociology | Butts | Social networking and organizational behavior
  UCSD, Cal-(IT)2 | Chokalingam, Jaffarian | Notification services, wireless system development and deployment
  UCSD, Elec. & Comp. Eng. | M. Trivedi, R. Rao, B. Rao | Speech recognition, video processing, networking
  UCSD, Center for Wireless Comm. | B. Rao, R. Rao | Mobile computing, wireless technologies
  Univ. of Illinois | Winslett | Trust management
  Univ. of Colorado | Tierney | Social and organizational behavior in disasters
  Brigham Young Univ. | Kent | Trust management
  Univ. of Maryland | Chang | Use of sensors to detect damage to critical infrastructure
  ImageCat, Inc. | Eguchi, Kehrlein, Huyck, Cho, Adams | Decision support, remote sensing, transportation, damage simulation

• Government Partners:
  – City of LA, County of LA, City of Irvine, City of San Diego, State of California
www.ucrec.net -- coming soon!
Now hiring postdocs, researchers, programmers, students

Multidisciplinary Research Agenda of UCREC
• Information Technology
  – right information
  – right person
  – right time
• Social and Organizational Science
  – The right context
    • the distinctive nature of dynamic virtual organizations
    • their information needs
    • the social and cultural aspects of information sharing across organizations and individuals