Secure and Highly-Available Aggregation Queries via Set Sampling


Haifeng Yu, National University of Singapore


Secure Aggregation Queries in Sensor Networks

Multi-hop sensor network with a trusted base station, in the presence of malicious (Byzantine) sensors

Goal: Count the # of sensors sensing smoke (i.e., satisfying a certain predicate)

Sum, Avg, and other aggregates are similar – see paper

Type-1 attack: Malicious sensors report fake readings

If the # of malicious sensors is small, damage is limited

Not the focus of our work



Secure Aggregation Queries in Sensor Networks

Type-2 attack: Malicious sensors (indirectly) corrupt the readings of other sensors – much larger damage

E.g., in tree-based aggregation

Focus of most research on secure aggregation – our focus too

[Figure: tree-based aggregation of partial counts toward the base station, with one sensor marked malicious]


State-of-Art and Our Goal

Active area in recent years (e.g. [Chan et al.’06], [Frikken et al.’08], [Roy et al.’06], [Nath et al.’09])

All these approaches focus on detection (i.e., safety only)

Will detect if the result is corrupted

But will not produce a correct result when under attack

Our Goal:

Detecting attacks → Tolerating attacks

Safety only → Safety + Liveness

System made harmless → System made useful


Our Approach to Tolerating Attacks

Previous approaches: Fix the security holes in tree-based aggregation

Dilemma in in-network processing

Our novel approach: Use sampling

With MACs on each sample, security comes almost automatically

[Figure: the tree-based aggregation example, with partial counts flowing toward the base station]


Our Approach to Tolerating Attacks (continued)

[Figure: a sampled sensor floods its sample result (with a MAC) through the network; forwarding sensors cannot modify the result]

Challenge with sampling: Potentially large overhead


Background: Estimate Count via Sampling

n sensors, b sensors sensing smoke (called black sensors)

Goal: Output an $(\epsilon, \delta)$ approximation b’ such that: $\Pr[\,|b' - b| \le \epsilon b\,] \ge 1 - \delta$

E.g.: Sample 10 sensors and 5 are black, so b’ = 0.5n

Classic result: # sensors needed to sample is $\frac{n}{b} \cdot \frac{1}{\epsilon^2} \log\frac{1}{\delta}$ (up to constant factors)

(Prohibitively) expensive for small b
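A minimal sketch of the naive estimator described on this slide, assuming sensors are sampled uniformly with replacement. The function names are illustrative and not taken from the paper.

```python
import random

def scale_estimate(hits, num_samples, n):
    """Scale the sampled black fraction up to a count estimate b'."""
    return n * hits / num_samples

def naive_sample_count(is_black, num_samples, rng):
    """Sample sensors uniformly (with replacement) and estimate the count."""
    n = len(is_black)
    hits = sum(is_black[rng.randrange(n)] for _ in range(num_samples))
    return scale_estimate(hits, num_samples, n)

# The slide's example: 5 black sensors among 10 sampled gives b' = 0.5n.
n = 10_000
example = scale_estimate(5, 10, n)

# Small b: with b = 5 and 300 samples, most runs encounter no black sensor,
# so the estimate is usually 0 and otherwise a large overshoot.
rng = random.Random(0)
is_black = [i < 5 for i in range(n)]
small_b_estimate = naive_sample_count(is_black, 300, rng)
```

With b = 5 the chance that 300 samples miss every black sensor is about (1 - 5/10000)^300 ≈ 0.86, which is why the sample count must grow like n/b.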


Reduce the Overhead via Set Sampling

Challenges with small b: Need many samples to encounter black sensors

Set sampling: Sample a set of sensors together

Binary result will tell whether any sensor in the set is black (but not how many)

Efficient implementation in sensor networks – later

Should be easier to hit sets containing black sensors

How effective will this be? (How many sets do we need to sample to estimate count?)
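The binary nature of a set sample can be sketched as follows. `sample_set` models only the answer the base station sees, and `hit_probability` assumes, for illustration, that a set is formed by independent uniform picks; neither is the paper's construction.

```python
def sample_set(sensor_set, black_sensors):
    """Binary set sample: True iff at least one sensor in the set is black."""
    return any(s in black_sensors for s in sensor_set)

def hit_probability(n, b, set_size):
    """Chance that `set_size` independent uniform picks include a black sensor."""
    return 1.0 - (1.0 - b / n) ** set_size

# The answer says "some sensor is black" but not how many.
answer = sample_set({"A", "B", "C", "D"}, {"A", "C"})

# With n = 10,000 and b = 10, a single sensor is black with probability 0.001,
# while a set of 1,000 picks contains a black sensor more often than not.
single = hit_probability(10_000, 10, 1)
large_set = hit_probability(10_000, 10, 1_000)
```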


Our Results

Novel algorithm for estimating count using set sampling

Defines randomized and inter-related sets, and samples them adaptively

# sets needed to sample: an $O(\cdot)$ bound in terms of $\log n$, $\log\log n$ and $\frac{1}{\epsilon^2}$ (exact expression in paper)

Previously without set sampling: $\frac{n}{b} \cdot \frac{1}{\epsilon^2} \log\frac{1}{\delta}$

# of samples reduced from polynomial to polylogarithmic

(can be further reduced – see paper)


Our Results

Per-sensor msg complexity: $O(\cdot)$ bounds in terms of $\log n$, $\log\log n$ and $\frac{1}{\epsilon^2} \log\frac{1}{\delta}$ (exact expressions in paper)

Comparable to some detection-only protocols [Roy et al.’06]

Similar msg sizes

See paper for time complexity

See paper for other aggregates (sum, avg)

Set sampling + novel algorithms using set sampling → Enables secure aggregation queries despite adversarial interference


Outline of This Talk

Background, goal, and summary of results

Simple implementation of set sampling in sensor networks

Main technical results: Novel algorithm for estimating count via set sampling


Implementing Set Sampling – Non-Secure Version

Example: sample the set {A, B, C, D}

Request flooded from the base station: O(log n) bits

We use only O(n) (instead of O(2^n)) random sets → O(log n) bits to name a set

Reply: Single bit

Flood back from all black sensors in the set (e.g., A and C)

Each sensor only forwards the first message received

Base station sees binary answer

Multiple samples can be taken in one flooding

Our algorithm takes samples in O(log n) sequential stages → Only O(log n) rounds of flooding

Goal: O(1) per-sensor msg complexity for sampling a set
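The reply flooding can be simulated with a breadth-first traversal. This is a rough sketch: the line topology, the `neighbors` adjacency map and the function name are assumptions made for the example, and radio collisions are ignored.

```python
from collections import deque

def flood_reply(neighbors, sources, base):
    """Flood a one-bit reply from `sources`; each sensor forwards only the
    first copy it hears. Returns (base_heard, sends), where sends[v] counts
    how many times sensor v transmitted."""
    sends = {v: 0 for v in neighbors}
    seen = set(sources)
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        if v == base:
            continue  # the base station consumes the bit and does not forward
        sends[v] += 1  # one local broadcast per sensor
        for u in neighbors[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return base in seen, sends

# Hypothetical line topology: base - A - B - C - D, with A and C black.
neighbors = {
    "base": ["A"],
    "A": ["base", "B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
heard, sends = flood_reply(neighbors, ["A", "C"], "base")
silent, no_sends = flood_reply(neighbors, [], "base")
```

Because every sensor transmits at most once, the per-sensor cost of one set sample stays constant no matter how many black sensors reply.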


Implementing Set Sampling – Secure Design

Each set = Some distinct symmetric key K

Preload K onto all sensors in the set

Each sensor should only be in a small number of sets – O(log n) in our protocol

Request: name of K, nonce

Reply: MAC_K(nonce)

Only sensors holding K can generate it

DoS attacks possible

Can be avoided with improved design – see paper
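The request/reply exchange can be sketched with Python's standard `hmac` module. HMAC-SHA256 truncated to 8 bytes is an assumption chosen for illustration; the slides do not name a specific MAC algorithm, and the key values below are placeholders.

```python
import hashlib
import hmac
import secrets

MAC_LEN = 8  # assumed reply size in bytes

def make_reply(key, nonce):
    """Reply of a black sensor holding `key`: MAC_K(nonce), truncated."""
    return hmac.new(key, nonce, hashlib.sha256).digest()[:MAC_LEN]

def verify_reply(key, nonce, reply):
    """Base station check: accept only a MAC computed with the set's key."""
    return hmac.compare_digest(make_reply(key, nonce), reply)

set_key = b"k" * 16       # placeholder key preloaded onto the sensors in the set
other_key = b"x" * 16     # placeholder key that a sensor outside the set might hold
nonce = secrets.token_bytes(8)  # fresh per request, so old replies cannot be replayed

genuine = make_reply(set_key, nonce)
forged = make_reply(other_key, nonce)
accepted = verify_reply(set_key, nonce, genuine)
rejected = not verify_reply(set_key, nonce, forged)
```

A sensor that does not hold K can drop or delay the reply but cannot fabricate one, which is the sense in which the sample result cannot be modified.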


Outline of This Talk

Background, goal, and summary of results

Implement set sampling in sensor networks

Main technical meat: Novel algorithm for estimating count via set sampling

For now assume all sensors are honest

Security follows from the clean security guarantees of sampling, though some minor modifications are needed – see paper


Random Sets on the Sampling Tree

Basic approach: Construct (related) randomized sets of different sizes and adaptively sample them

Base station internally creates a sampling tree

A complete binary tree with 4n leaves

Each tree node = A distinct symmetric key = Some set of sensors

Sampling tree is an internal data structure and not network topology

[Figure: sampling tree with keys K1 (root); K2, K3; K4 to K7; K8 to K15 (leaves). Sensor A sits at leaf K10, sensor B at leaf K12.]

K1, K2, K5, K10 loaded onto the sensor A

K1, K3, K6, K12 loaded onto the sensor B

Each sensor is associated with a uniformly random leaf (independently)

Each tree node corresponds to a set containing all the sensors in its subtree
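The key assignment in the figure follows heap-style numbering: key K has children 2K and 2K+1, so a sensor's keys are the nodes on the path from its leaf to the root. A minimal sketch, assuming the number of leaves is a power of two; the function names are illustrative.

```python
import random

def path_keys(leaf_key):
    """Key indices on the path from the root (K1) down to `leaf_key`."""
    keys = []
    k = leaf_key
    while k >= 1:
        keys.append(k)
        k //= 2  # parent in heap numbering
    return keys[::-1]

def assign_leaves(sensors, num_leaves, rng):
    """Give each sensor an independent, uniformly random leaf.
    Leaves are numbered num_leaves .. 2 * num_leaves - 1."""
    return {s: rng.randrange(num_leaves, 2 * num_leaves) for s in sensors}

# Reproduces the figure: A at leaf K10, B at leaf K12.
keys_for_a = path_keys(10)
keys_for_b = path_keys(12)

# Random placement for n = 2 sensors on 4n = 8 leaves.
leaves = assign_leaves(["A", "B"], 8, random.Random(1))
```

Each sensor holds one key per tree level, which matches the O(log n) sets per sensor mentioned for the secure design.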


Properties of the Sampling Tree

A sensor is black if it satisfies the predicate

A key is black iff the corresponding set contains a black sensor

$f_i$: fraction of black keys at level i

[Figure: the sampling tree annotated per level: $f_0 = 1$, $f_1 = 1$, $f_2 = 0.5$, $f_3 = 0.25$]

$f_i$ is monotonic as we go down the tree

Decreases by a factor of at most 2 per level

At the top (assuming at least one black sensor): $f_i = 1$

At the bottom (4n leaves!): $f_i \le 1/4$

Lemma: There exists a level $\ell$ with $f_\ell \in [\frac{1}{4}, \frac{1}{2}]$
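The per-level fractions in the figure can be recomputed directly, assuming heap-numbered keys and that both sensors A (leaf K10) and B (leaf K12) are black. Scanning for the first level whose fraction is at most 1/2 illustrates why the lemma holds; this is not the search procedure used by the algorithm.

```python
def black_fraction_per_level(black_leaves, depth):
    """f_i for i = 0..depth: fraction of keys on level i whose subtree
    contains at least one black leaf."""
    fractions = []
    for level in range(depth + 1):
        shift = depth - level
        black_keys = {leaf >> shift for leaf in black_leaves}  # ancestors on this level
        fractions.append(len(black_keys) / 2 ** level)
    return fractions

def first_level_at_most_half(fractions):
    """First level with f <= 1/2. The level above has f > 1/2 and f drops by
    at most a factor of 2 per level, so this level also has f > 1/4."""
    for level, f in enumerate(fractions):
        if f <= 0.5:
            return level
    return None

fractions = black_fraction_per_level({10, 12}, depth=3)
good_level = first_level_at_most_half(fractions)
```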


Why Level $\ell$ Helps

$f_\ell$ not too small → Efficient estimation of $f_\ell$ via naïve sampling:

$O\left(\frac{1}{\epsilon^2} \log\frac{1}{\delta}\right)$ samples on level $\ell$ yield an $(\epsilon, \delta)$ approximation for $f_\ell$

$f_\ell$ not too large → Can potentially estimate final count directly from $f_\ell$

Chernoff-type occupancy tail bound for balls into bins

See paper for details
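To see why a fraction bounded away from zero makes naive sampling cheap, one can plug a lower bound on the fraction into a textbook multiplicative Chernoff bound, Pr[|X − μ| ≥ εμ] ≤ 2·exp(−ε²μ/3). The constant 3 and the helper below come from that textbook bound and are an assumption here, not the constants used in the paper.

```python
import math

def samples_needed(eps, delta, f_min=0.25):
    """Number of key samples m such that the empirical black fraction is
    within a (1 +/- eps) factor of the true fraction f >= f_min with
    probability at least 1 - delta (requires 0 < eps <= 1)."""
    if not (0 < eps <= 1 and 0 < delta < 1 and 0 < f_min <= 1):
        raise ValueError("need 0 < eps <= 1, 0 < delta < 1, 0 < f_min <= 1")
    # mu = m * f >= m * f_min, so 2 * exp(-eps^2 * m * f_min / 3) <= delta suffices.
    return math.ceil(3.0 * math.log(2.0 / delta) / (eps ** 2 * f_min))

loose = samples_needed(0.2, 0.05)
tight = samples_needed(0.1, 0.05)
```

The result depends only on ε, δ and the lower bound on the fraction, not on n or b, in contrast to the n/b factor of sampling individual sensors.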


Additional Issues: Too Few Keys on Level $\ell$

Challenge: To estimate final count based on $f_\ell$, the number of keys on level $\ell$ needs to be large enough

If not, need to track down to lower levels

Need to leverage other interesting properties of the sampling tree

See paper


Additional Issues: Finding Level $\ell$

Binary search on the O(log n) levels

On each level i examined, sample a small number of random keys to roughly estimate $f_i$

Extremely efficient

Challenges:

The binary search operates on estimated values (with error, and possibly not monotonic)

When $f_i$ is small, the estimation only has an error guarantee on one side

See paper
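A simplified sketch of the binary search over levels, assuming heap-numbered keys and an oracle `is_black_key` that stands in for one set sample. It treats the rough estimates as if they were monotonic, which is exactly the simplification the listed challenges warn about; the paper's handling of estimation error is not reproduced here.

```python
import random

def rough_estimate(level, is_black_key, num_probes, rng):
    """Estimate the black fraction on `level` from a few random keys."""
    first = 2 ** level  # heap index of the first key on this level
    hits = sum(is_black_key(first + rng.randrange(2 ** level))
               for _ in range(num_probes))
    return hits / num_probes

def find_level(depth, is_black_key, num_probes, rng, threshold=0.5):
    """Binary search for the first level whose estimated fraction is <= threshold."""
    lo, hi = 0, depth
    while lo < hi:
        mid = (lo + hi) // 2
        if rough_estimate(mid, is_black_key, num_probes, rng) <= threshold:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Example tree of depth 3 with black sensors at leaves K10 and K12.
depth = 3
black_leaves = {10, 12}

def is_black_key(key):
    level = key.bit_length() - 1
    return any(leaf >> (depth - level) == key for leaf in black_leaves)

level = find_level(depth, is_black_key, num_probes=50, rng=random.Random(0))
```

In this example the true fraction on level 2 is exactly 0.5, so the noisy estimate may land on either side of the threshold and the search returns level 2 or level 3; both have a true fraction between 1/4 and 1/2.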


Example Numerical Results

n = 10,000 and count result (b) ranging from 0 to 10,000

Overhead: 5-15 sequential stages of sampling

Total 250-300 samples

Avg approximation error: (1±0.08)

Hard to get better accuracy even in trusted environments ([Nath et al.’09])

Naive sampling: 300 samples give the same accuracy only when b > 2,000


Conclusions

Making aggregation queries secure is critical for many sensor network applications

Contribution: Detecting attacks → Tolerating attacks (Safety only → Safety + Liveness)

Our approach: Abandon in-network processing and use sampling

Use novel set sampling to reduce the overhead

Polynomial overhead → Logarithmic overhead


Related Work to Set Sampling

Decision tree complexity for threshold-t functions (i.e., whether $b \ge t$) [Ben-Asher and Newman’95] [Aspnes’09]

Most results are for error-free deterministic protocols

Large lower bound: $\Omega(t)$ (implying $\Omega(b)$ for count)

No prior results for general Monte Carlo randomized algorithms


Tolerating Attacks is Difficult

Example: Byzantine consensus

Detection substantially easier than tolerance

The $n \ge 3f + 1$ lower bound only applies to tolerance and not detection

Pinpointing / revoking malicious sensors is hard

E.g., due to lack of public-key authentication

Active research area by itself


System Model

Multi-hop sensor network with trusted base station

Performance metric: Time complexity – see paper

Performance metric: Per-sensor msg complexity

Max number of msgs sent/received by a single sensor (captures load balance)

msg size is either 8 bytes (size of a MAC) or log(n) bits

Collision ignored – as in all prior work

Or one can apply existing algorithms…



Request size: We use at most O(n) (random) sets → O(log(n)) bits to name a set

Request flooding – every sensor sends/receives one msg

Reply: Single bit

[Figure: example network with sensors A, B, C, D] B, C, D satisfy the predicate, A does not

Reply flooding – Only the first reply is forwarded

This is why set sampling is designed to be binary

(The overhead of sampling a set needs to be properly controlled – will discuss later.)


Translating $f_\ell$ to b

We now have a good estimation for $f_\ell$

Need to produce a good estimation for b

Let the number of keys on level $\ell$ be n

Throw b balls into n bins

The fraction of occupied bins has the same distribution as $f_\ell$

This distribution is highly concentrated near its mean (Chernoff-type occupancy tail bound), assuming:

$f_\ell$ not too close to 1

n not too small
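One way to turn an occupied fraction back into a ball count is to invert the expected occupancy of balls into bins, 1 − (1 − 1/bins)^b. This moment-matching inversion is an illustrative assumption rather than the paper's estimator; `bins` stands for the number of keys on the chosen level.

```python
import math

def expected_occupied_fraction(b, bins):
    """Expected fraction of non-empty bins after throwing b balls uniformly."""
    return 1.0 - (1.0 - 1.0 / bins) ** b

def estimate_balls(fraction, bins):
    """Invert the expectation: solve 1 - (1 - 1/bins)^b = fraction for b."""
    if bins < 2 or not 0.0 <= fraction < 1.0:
        raise ValueError("need bins >= 2 and 0 <= fraction < 1")
    return math.log(1.0 - fraction) / math.log(1.0 - 1.0 / bins)

# Round trip on exact expectations: 300 black sensors over 1024 keys.
f = expected_occupied_fraction(300, 1024)
b_hat = estimate_balls(f, 1024)
```

The inversion becomes unstable as the fraction approaches 1, because log(1 − fraction) diverges, which matches the requirement that the fraction not be too close to 1.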


Summary of Techniques to Achieve the Results

Define randomized sets based on a complete binary tree

Interesting relationships among the sets

Sample the sets adaptively

Leverages Chernoff-type occupancy tail bounds for balls-into-bins
