Adaptive Stream Filters for Entity-based Queries with Non-value Tolerance
VLDB 2005

Reynold Cheng (Speaker), The Hong Kong Polytechnic University
Ben Kao, Alan Kwan, The University of Hong Kong
Sunil Prabhakar, Yicheng Tu, Purdue University

Cheng, Kao, Prabhakar, Kwan, Tu: Adaptive Stream Filters for Entity-based Queries with Non-Value Tolerance


Data Streams and Applications

Data Stream Management Systems (DSMS)
– Sensor networks, location-based applications
– STREAM [ABB03], STEAM [HAFME03], AURORA [ACC03], CACQ [MSH02]

Stream applications
– Telecom call records
– Network security [BO03]
– Habitat monitoring [MPS02]
– Structural health monitoring

Continuous Queries


DSMS Model

[Figure: several streams arrive over the network at a central processor, whose query processing unit evaluates a user's continuous query and refreshes the result if needed. Annotations: streams are massive and fast; memory, CPU, and network bandwidth are limited; results have a real-time response-time requirement.]


Trading Accuracy for Query Timeliness

A user may accept an answer with a carefully controlled error tolerance
– wide-area resource accounting
– load-balancing in replicated servers

The system exploits error tolerance to reduce communication and computation costs


Value-based Tolerance

Often assumed in literature [OJW03, JCW04]

Maximum error is a numerical value ε specified by the user

MAX query: return the id of the sensor with the highest temperature

Guarantee: the returned sensor's temperature is at most ε lower than that of the true answer
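As a minimal illustration of the value-based guarantee for the MAX query, the hypothetical helper below checks whether a returned sensor id lies within ε of the true maximum. The function name and the sample readings are assumptions made for illustration only.

```python
def satisfies_value_tolerance(temps: dict[str, float], returned_id: str, eps: float) -> bool:
    """Value-based MAX guarantee: the returned reading is at most eps below the true maximum."""
    true_max = max(temps.values())
    return temps[returned_id] >= true_max - eps


# Hypothetical readings (degrees) from three sensors.
temps = {"s1": 30.0, "s2": 28.5, "s3": 25.0}
ok_s2 = satisfies_value_tolerance(temps, "s2", eps=2.0)  # 28.5 >= 28.0
ok_s3 = satisfies_value_tolerance(temps, "s3", eps=2.0)  # 25.0 <  28.0
```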


Is Selecting ε Easy?

Location-based application: a user inquires about his closest neighbor
– Should the tolerance be 0.1, 1, or 100 meters?

Sensor network collects humidity, temperature, UV-index, wind speed
– Does the user know the range of error for each type?

Multi-dimensional data streams (e.g., location)

Multimedia data streams (e.g., CCTV images)


Is Selecting ε for MAX Query Easy?

Suppose a user accepts an object that ranks 2nd or above.
– If ε is too small… tolerance is wasted
– If ε is too large… the error is unacceptable
– The ideal ε…
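To see why no single ε fits, the sketch below (our own illustration, not taken from the talk) computes the interval of ε values that admits exactly the objects ranked 1st to (r+1)th under the value-based guarantee, assuming distinct readings: ε must be at least v1 − v(r+1), or the tolerance is wasted, and below v1 − v(r+2), or the error is unacceptable. The function name is hypothetical.

```python
def ideal_epsilon_range(values: list[float], r: int = 1) -> tuple[float, float]:
    """Interval [low, high) of eps admitting exactly ranks 1..r+1 for a MAX query."""
    ordered = sorted(values, reverse=True)
    if len(ordered) < r + 2:
        raise ValueError("need at least r + 2 values")
    top = ordered[0]
    low = top - ordered[r]       # smaller eps excludes the (r+1)th value: tolerance wasted
    high = top - ordered[r + 1]  # eps >= high admits the (r+2)th value: error unacceptable
    return low, high


# The interval moves as stream values change, so a fixed eps cannot stay ideal.
before = ideal_epsilon_range([30.0, 28.0, 25.0, 20.0])  # (2.0, 5.0)
after = ideal_epsilon_range([30.0, 29.5, 29.0, 20.0])   # (0.5, 1.0)
```

For example, ε = 2.0 is ideal for the first snapshot but admits the 3rd-ranked reading in the second.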


Rank-based Tolerance

Express error tolerance as a rank

Error tolerance = no. of positions the returned sensor could rank below the highest one

More intuitive and easier to specify

[Figure: example with rank-based tolerance = 1]
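For comparison, the rank-based guarantee can be checked without any numerical threshold. A minimal sketch with hypothetical names, where r is the rank-based tolerance and tied readings share the better rank:

```python
def satisfies_rank_tolerance(temps: dict[str, float], returned_id: str, r: int) -> bool:
    """Rank-based MAX guarantee: the returned sensor ranks at most r positions below the top."""
    # Rank 1 is the highest reading; count readings strictly higher than the returned one.
    rank = 1 + sum(1 for v in temps.values() if v > temps[returned_id])
    return rank - 1 <= r


temps = {"s1": 30.0, "s2": 28.5, "s3": 25.0}
ok_s2 = satisfies_rank_tolerance(temps, "s2", r=1)  # rank 2
ok_s3 = satisfies_rank_tolerance(temps, "s3", r=1)  # rank 3
```

Unlike ε, the same r = 1 keeps its meaning however far apart the readings drift.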


Non-Value Tolerance

Rank-based tolerance is a non-value tolerance
– numerical value not used

Fraction-based Tolerance
– False Positive F+(t): % of returned answers that are incorrect at time t
– False Negative F−(t): % of correct answers not returned at time t
– F+(t) ≤ ε+; F−(t) ≤ ε−


Entity-based Queries

Return sets of object ids, not numerical values [CKP03]

Rank-based queries: order of stream values decides the final answer
– e.g., top-k query, k-nearest-neighbor query

Non-rank-based queries: order of stream values is not important
– e.g., range query

Non-value tolerance matches entity-based queries!


Continuous Query Classification

Approximate Continuous Queries
– Value-based tolerance: non-rank-based queries, rank-based queries
– Non-value tolerance
   – Rank-based tolerance: rank-based query (kNN query)
   – Fraction-based tolerance: rank-based query (kNN query), non-rank-based query (range query)


Adaptive Filter [OJW03]: Initialization Phase

[Figure: from the user-defined tolerance, the constraint assignment unit computes filter bounds [l1,u1], [l2,u2], [l3,u3] and installs them on data streams 1, 2, and 3; the query processing unit returns an approximate answer.]

Answer tolerance is met as long as no update is generated


Adaptive Filter: Maintenance Phase

[Figure: data stream 2 sends an update because its value v2 violates its filter bound (v2 > u2 or v2 < l2). The tolerance is violated, which triggers the maintenance phase: the constraint assignment unit requests value v3 from data stream 3 and sends a new filter bound [l2,u2], and the query processing unit replaces its approximate answer with a corrected approximate answer.]
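The stream-side behaviour in the two adaptive-filter slides can be sketched as follows, assuming the [OJW03]-style rule that a stream sends its value only when it leaves its bound (v < l or v > u). The class name and the new bound installed below are hypothetical; how the constraint assignment unit actually picks new bounds depends on the query and its tolerance.

```python
from dataclasses import dataclass


@dataclass
class StreamFilter:
    """Stream-side filter bound [lower, upper] (hypothetical sketch)."""
    lower: float
    upper: float
    messages_sent: int = 0

    def observe(self, value: float) -> bool:
        """Return True, and count one update message, if value falls outside the bound."""
        if value < self.lower or value > self.upper:
            self.messages_sent += 1
            return True
        return False  # value stays inside: suppressed, no message

    def install(self, lower: float, upper: float) -> None:
        """Install a new bound sent by the constraint assignment unit."""
        self.lower, self.upper = lower, upper


f = StreamFilter(20.0, 25.0)
sent = [v for v in [21.0, 24.5, 26.0, 22.0] if f.observe(v)]  # only 26.0 is sent
f.install(24.0, 28.0)  # hypothetical new bound after the maintenance phase
```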


Contributions

Apply filter bounds to rank-based / non-rank-based queries, subject to rank-based / fraction-based tolerance, to reduce message costs

Correctness proofs, cost analysis and experimental evaluation of each protocol


Filter Bound Protocols

[Figure: the continuous query classification, annotated with protocol names]
– RTP: rank-based tolerance, rank-based query (kNN query)
– FT-RP: fraction-based tolerance, rank-based query (kNN query)
– FT-NRP: fraction-based tolerance, non-rank-based query (range query)
– ZT-RP, ZT-NRP: zero tolerance, rank-based / non-rank-based queries


Non-Rank-based Queries

Example: 1D Range Query, range = [10, 30]

Ordered values:
S6 = 2, S5 = 6, S2 = 11, S7 = 14, S4 = 23, S8 = 25, S1 = 34, S3 = 41

Answer set: {S2, S7, S4, S8}
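The example can be reproduced in a few lines. The result is a set of stream ids rather than values, which is what makes the range query entity-based. The function name is hypothetical.

```python
values = {"S6": 2, "S5": 6, "S2": 11, "S7": 14, "S4": 23, "S8": 25, "S1": 34, "S3": 41}


def range_query(values: dict[str, float], lo: float, hi: float) -> set[str]:
    """Return the ids of streams whose current value lies in [lo, hi]."""
    return {s for s, v in values.items() if lo <= v <= hi}


answer = range_query(values, 10, 30)  # {"S2", "S7", "S4", "S8"}
```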


Fraction-based Tolerance

[Figure: streams S6, S5, S2, S7, S4, S8, S1, S3 in value order against the range of Q = [l, u]; two updates are shown, one producing a false positive and one producing a false negative.]


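Fraction-based Tolerance

A(t): answer actually returned at time t
E+(t): no. of returned answers that are incorrect at time t
E−(t): no. of correct answers not returned at time t
Size of the true answer at time t = |A(t)| − E+(t) + E−(t)

$$F^+(t) = \frac{E^+(t)}{|A(t)|} \qquad F^-(t) = \frac{E^-(t)}{|A(t)| - E^+(t) + E^-(t)}$$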
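The two ratios can be computed directly from A(t) and the true answer. A minimal sketch, assuming both are available as Python sets of stream ids (a server never knows the true answer in practice; this only checks the definitions). We also assume, as our own convention, F+ = 0 when nothing is returned and F− = 0 when the true answer is empty.

```python
def fraction_errors(returned: set[str], true_answer: set[str]) -> tuple[float, float]:
    """Return (F+(t), F-(t)) for one snapshot."""
    e_pos = len(returned - true_answer)   # E+(t): returned but incorrect
    e_neg = len(true_answer - returned)   # E-(t): correct but not returned
    f_pos = e_pos / len(returned) if returned else 0.0
    true_size = len(returned) - e_pos + e_neg  # |A(t)| - E+(t) + E-(t) == |true answer|
    f_neg = e_neg / true_size if true_size else 0.0
    return f_pos, f_neg


returned = {"S2", "S7", "S4", "S8"}      # A(t)
true_answer = {"S2", "S4", "S8", "S1"}   # S7 has left the range, S1 has entered
f_pos, f_neg = fraction_errors(returned, true_answer)  # (0.25, 0.25)
```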


Initialization Phase

– Given ε+ and ε−
1. Collect current stream values
2. For streams satisfying the range query
   – Calculate the no. of streams (Emax+) that can be false positives
   – Assign false +ve filters [−∞, +∞] to Emax+ streams
   – Assign [l, u] to the remaining ones
3. For streams failing the range query
   – Calculate the no. of streams (Emax−) that can be false negatives
   – Assign false −ve filters [+∞, +∞] to Emax− streams
   – Assign [l, u] to the remaining ones
– Tolerance is satisfied if no new updates are received: at any time t without an update, F+(t) ≤ ε+ and F−(t) ≤ ε−
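The slide does not say how Emax+ and Emax− are computed. One way, derived here from the formulas for F+(t) and F−(t) rather than taken from the paper, is Emax+ = ⌊ε+·|A|⌋ and, in the worst case E+ = Emax+, Emax− = ⌊ε−(|A| − Emax+)/(1 − ε−)⌋. The sketch below applies it. Giving the special filters to the first ids in sorted order is an arbitrary choice of ours, and we read a filter as reporting whenever a stream's value crosses its boundary, so [−∞, +∞] and [+∞, +∞] never fire.

```python
import math


def initialize_filters(values, lo, hi, eps_pos, eps_neg):
    """Assign filters for a range query [lo, hi] under fraction-based tolerance (sketch)."""
    if not (0 <= eps_pos <= 1 and 0 <= eps_neg < 1):
        raise ValueError("need 0 <= eps_pos <= 1 and 0 <= eps_neg < 1")
    inside = sorted(s for s, v in values.items() if lo <= v <= hi)       # A(t0)
    outside = sorted(s for s, v in values.items() if not lo <= v <= hi)
    # F+ = E+/|A| <= eps+  =>  E+ <= eps+ * |A|  (1e-9 guards float round-off)
    e_pos_max = math.floor(eps_pos * len(inside) + 1e-9)
    # Worst case F- = E-/(|A| - E+ + E-) with E+ = e_pos_max, solved for E-.
    e_neg_max = math.floor(eps_neg * (len(inside) - e_pos_max) / (1 - eps_neg) + 1e-9)
    filters = {}
    for i, s in enumerate(inside):
        # [-inf, +inf]: the value never leaves it, so the stream never reports.
        filters[s] = (-math.inf, math.inf) if i < e_pos_max else (lo, hi)
    for i, s in enumerate(outside):
        # [+inf, +inf]: a finite value never enters it, so the stream never reports.
        filters[s] = (math.inf, math.inf) if i < e_neg_max else (lo, hi)
    return set(inside), filters, e_pos_max, e_neg_max


values = {"S6": 2, "S5": 6, "S2": 11, "S7": 14, "S4": 23, "S8": 25, "S1": 34, "S3": 41}
answer, filters, e_pos_max, e_neg_max = initialize_filters(values, 10, 30, 0.25, 0.25)
```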


Maintenance Phase: Good Update

[Figure: streams S6, S5, S2, S7, S4, S8, S1, S3 in value order; S7, holding filter [l, u], moves into the range of Q = [l, u] between time t0 and time tc.]

Insert S7 into A(tc): F+ and F− drop
– F+(tc) < F+(t0) ≤ ε+
– F−(tc) < F−(t0) ≤ ε−
Tolerance is met


Maintenance Phase: Bad Update

1. Remove Si from A(tc)
2. F+(tc) ≤ ε+ and F−(tc) ≤ ε− may no longer hold
3. Quality of the answer becomes worse
4. Procedure Fix to maintain tolerance

[Figure: S7, holding filter [l, u], leaves the range of Q = [l, u] between time t0 and time tc.]


Fix: Consulting False Positive Filter

Select stream S4 ∈ A(tc) with a [−∞, +∞] filter
Request S4 for its updated value V4
If V4 ∈ [l, u]:
– install an [l, u] filter on S4
– prove that F+(tc) ≤ ε+ and F−(tc) ≤ ε− are satisfied
If V4 ∉ [l, u], consult a false −ve filter
Worst case: 5 messages

[Figure: streams S6, S5, S2, S7, S4, S8, S1, S3 in value order; S4 holds filter [−∞, +∞]; range of Q = [l, u].]
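The good-update, bad-update, and Fix steps can be approximated by a generic repair loop. This is not the paper's Fix procedure, which the slide bounds at 5 messages in the worst case: the hypothetical loop below resolves uncertain streams, false-positive filters first and then false-negative ones, until the worst-case F+ and F− are back within ε+ and ε−, and it may consult more streams. The kinds "fp", "fn", and "range" stand for [−∞, +∞], [+∞, +∞], and [l, u]; a dictionary of current values stands in for requesting a stream's value.

```python
def worst_case_fractions(answer: set[str], kinds: dict[str, str]) -> tuple[float, float]:
    """Bounds on F+ and F- if every 'fp' stream left the range and every 'fn' stream entered it."""
    n_fp = sum(1 for s in answer if kinds[s] == "fp")
    n_fn = sum(1 for s, k in kinds.items() if k == "fn" and s not in answer)
    f_pos = n_fp / len(answer) if answer else 0.0
    denom = len(answer) - n_fp + n_fn          # |A| - E+ + E- in that worst case
    f_neg = n_fn / denom if denom else 0.0
    return f_pos, f_neg


def on_crossing(stream, value, lo, hi, answer, kinds, current, eps_pos, eps_neg):
    """Handle an update from a stream with an [lo, hi] filter; return the streams consulted."""
    if lo <= value <= hi:
        answer.add(stream)                     # good update: both bounds can only drop
        return []
    answer.discard(stream)                     # bad update: bounds may now be exceeded
    consulted = []
    while True:
        f_pos, f_neg = worst_case_fractions(answer, kinds)
        if f_pos <= eps_pos and f_neg <= eps_neg:
            return consulted
        # Resolve one uncertain stream, false-positive filters first.
        # (If none were left, both bounds would be 0, so this list is non-empty.)
        candidates = sorted(s for s in answer if kinds[s] == "fp") or sorted(
            s for s, k in kinds.items() if k == "fn")
        target = candidates[0]
        consulted.append(target)
        kinds[target] = "range"                # install an [lo, hi] filter
        if lo <= current[target] <= hi:        # stands in for requesting its value
            answer.add(target)
        else:
            answer.discard(target)


current = {"S6": 2, "S5": 6, "S2": 11, "S7": 14, "S4": 23, "S8": 25, "S1": 34, "S3": 41}
answer = {"S2", "S7", "S4", "S8"}
kinds = {s: "range" for s in current}
kinds["S4"] = "fp"   # S4 holds [-inf, +inf], as on the slide
kinds["S6"] = "fn"   # S6 holds [+inf, +inf] (hypothetical choice)

current["S7"] = 35   # S7 leaves [10, 30]: a bad update
consulted = on_crossing("S7", current["S7"], 10, 30, answer, kinds, current, 0.25, 0.25)
```

Here consulting S4 alone restores both bounds: S4 is confirmed inside the range and now holds an [l, u] filter.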


Filter Bound Protocols for Rank-based Queries

k-NN query is a representative of NN, Min, Max

Fraction-based tolerance / k-NN query
– View a k-NN query as a range query, by using the kth nearest neighbor as the "range"
– Adapt fraction-based tolerance / range query

Rank-based tolerance r / k-NN query
– Maintain knowledge about the (k+r)th and (k+r+1)st items
– Filter bound is defined by the average of the (k+r)th and (k+r+1)st items
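A one-dimensional sketch of the rank-based tolerance bound for a k-NN query, assuming stream values are points on a line and distance is |v − q| (location streams would use a multi-dimensional distance). Our reading of the slide: with the bound halfway between the (k+r)th and (k+r+1)st distances, exactly k+r streams lie inside it, so as long as no stream crosses it, any k streams returned from inside rank no worse than k+r. Names are hypothetical.

```python
def rank_tolerance_bound(points: dict[str, float], q: float, k: int, r: int):
    """Return (bound, answer): a distance bound for rank tolerance r and k returned ids."""
    if len(points) < k + r + 1:
        raise ValueError("need at least k + r + 1 streams")
    # Sort stream ids by distance to q (ties broken by id for determinism).
    by_dist = sorted(points, key=lambda s: (abs(points[s] - q), s))
    dist = [abs(points[s] - q) for s in by_dist]
    # dist[k + r - 1] is the (k+r)th nearest distance, dist[k + r] the (k+r+1)st.
    bound = (dist[k + r - 1] + dist[k + r]) / 2
    return bound, by_dist[:k]


points = {"S6": 2, "S5": 6, "S2": 11, "S7": 14, "S4": 23, "S8": 25, "S1": 34, "S3": 41}
bound, answer = rank_tolerance_bound(points, q=20, k=2, r=1)  # 7.5, ["S4", "S8"]
```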


Experiments

Compare
– No filter is used at all
– Filter protocols with zero tolerance
– Our tolerance-based protocols

Measure the total no. of messages required for executing a continuous query


Experimental Setup

Real Data
– 30 days of wide-area traces of TCP connections based on TCP trace [ITA20]

Synthetic Data
– Generated by CSIM 18
– Data value: uniform distribution
– Fluctuation of updates: normal distribution
– Interarrival time of updates: exponential distribution
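A rough stand-in for the synthetic workload, using Python's `random` module (`uniform`, `gauss`, `expovariate`) in place of CSIM 18; all parameter values are hypothetical and values are not clamped. It also counts, for one stream and one range query, the messages sent with no filter versus a zero-tolerance [l, u] filter that reports only boundary crossings. It does not reproduce the paper's results.

```python
import random


def synthetic_stream(rng: random.Random, n_updates: int, lo: float = 0.0, hi: float = 100.0,
                     sigma: float = 5.0, mean_gap: float = 1.0):
    """Return (initial value, [(time, value), ...]) for one synthetic stream."""
    v0 = rng.uniform(lo, hi)                    # initial value ~ Uniform[lo, hi]
    value, t, updates = v0, 0.0, []
    for _ in range(n_updates):
        t += rng.expovariate(1.0 / mean_gap)    # exponential interarrival time
        value += rng.gauss(0.0, sigma)          # normally distributed fluctuation
        updates.append((t, value))
    return v0, updates


def count_messages(v0, updates, q_lo, q_hi):
    """Messages for one stream: with no filter vs. a zero-tolerance [q_lo, q_hi] filter."""
    no_filter = len(updates)                    # every update is forwarded
    inside, with_filter = q_lo <= v0 <= q_hi, 0
    for _, v in updates:
        now_inside = q_lo <= v <= q_hi
        if now_inside != inside:                # report only when the value crosses the range
            with_filter += 1
            inside = now_inside
    return no_filter, with_filter


rng = random.Random(42)
v0, updates = synthetic_stream(rng, 1000)
no_filter, with_filter = count_messages(v0, updates, 40.0, 60.0)
```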


Fraction-based Tolerance for Range Query with Real Data


Fraction-based Tolerance for Range Query with Synthetic Data


Conclusions

Value-based tolerance can be difficult to specify for continuous queries in stream systems

Rank-based and fraction-based tolerance
– Applied to rank-based and non-rank-based queries

Filter bound protocols translate non-value tolerance to filter bounds

Experiments illustrate protocol effectiveness

Please contact Reynold Cheng for details


Issues of Running Out of Filters

If all false positive and false negative filters run out, the system degrades to one in which no tolerance is exploited

To improve performance, the initialization phase may be executed again

Experiments over long-running queries


Long-Running Queries


False +ve / −ve Filters Selection Heuristic

[Plot: Number of Messages (y-axis, 23K to 32K) against x values from 0 to 0.5, for the Random and Boundary selection heuristics.]