Secure and Highly-Available Aggregation Queries via Set Sampling
Haifeng Yu, National University of Singapore

Secure Aggregation Queries in Sensor Networks
Multi-hop sensor network with a trusted base station, in the presence of malicious (Byzantine) sensors
Goal: Count the # of sensors sensing smoke (i.e., satisfying a certain predicate)
  Sum, Avg, and other aggregates are similar – see paper
Type-1 attack: Malicious sensors report fake readings
  If the # of malicious sensors is small, damage is limited
  Not the focus of our work
Type-2 attack: Malicious sensors (indirectly) corrupt the readings of other sensors – much larger damage
  E.g., in tree-based aggregation
  Focus of most research on secure aggregation – our focus too
[Figure: aggregation tree rooted at the base station, with partial counts on the nodes and one sensor marked malicious]

State of the Art and Our Goal
Active area in recent years (e.g., [Chan et al.’06], [Frikken et al.’08], [Roy et al.’06], [Nath et al.’09])
All these approaches focus on detection (i.e., safety only)
  Will detect if the result is corrupted
  But will not produce a correct result when under attack
Our Goal:
  Detecting attacks → Tolerating attacks
  Safety only → Safety + Liveness
  System made harmless → System made useful

Our Approach to Tolerating Attacks
Previous approaches: Fix the security holes in tree-based aggregation
  Dilemma in in-network processing
Our novel approach: Use sampling
  With MACs on each sample, security comes almost automatically
  The sampled sensor floods the sample result (with a MAC)
  Malicious sensors cannot modify the result
Challenge with sampling: Potentially large overhead
[Figure: sensor network in which the sampled sensor floods its MAC-protected result toward the base station]

Background: Estimate Count via Sampling
n sensors, b sensors sensing smoke (called black sensors)
Goal: Output an $(\epsilon, \delta)$ approximation b’ such that:
  $\Pr[\,|b' - b| \le \epsilon b\,] \ge 1 - \delta$
E.g.: Sample 10 sensors and 5 are black
  b’ = 0.5n
Classic result: # sensors needed to sample is
  $\frac{n}{b} \cdot \frac{1}{\epsilon^2} \log\frac{1}{\delta}$
  (Prohibitively) expensive for small b
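
The naive estimator described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_count_naive`, the list-of-booleans representation of the sensors, and sampling with replacement are assumptions of the sketch, not details from the talk.

```python
import random

def estimate_count_naive(is_black, num_samples, rng):
    """Estimate the number of black sensors by uniform sampling.

    is_black: list of booleans, one per sensor (hypothetical representation).
    num_samples: how many sensors to sample (with replacement).
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    n = len(is_black)
    hits = sum(is_black[rng.randrange(n)] for _ in range(num_samples))
    # Scale the sampled black fraction up to the whole network.
    return hits / num_samples * n

n = 1000
rng = random.Random(0)
dense = [i < 500 for i in range(n)]   # b = 500
sparse = [i < 2 for i in range(n)]    # b = 2
est_dense = estimate_count_naive(dense, 300, rng)
est_sparse = estimate_count_naive(sparse, 300, rng)
print(est_dense, est_sparse)
```

With 300 samples the estimate is always a multiple of n/300 (about 3.3 here), so for b = 2 it cannot be accurate in relative terms. This matches the n/b factor in the classic bound.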

Reduce the Overhead via Set Sampling
Challenges with small b:
  Need many samples to encounter black sensors
Set sampling: Sample a set of sensors together
  Binary result will tell whether any sensor in the set is black (but not how many)
  Efficient implementation in sensor networks – later
  Should be easier to hit sets containing black sensors
How effective will this be?
  (How many sets do we need to sample to estimate count?)
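
To make the binary nature of a set sample concrete, here is a small Python sketch. It assumes uniformly random sets of a fixed size, which is a simplification chosen for illustration; the talk's actual sets are defined later via the sampling tree. The names `set_sample` and `hit_rate` are hypothetical.

```python
import random

def set_sample(is_black, sensor_set):
    """Binary set sample: True iff at least one sensor in the set is black."""
    return any(is_black[s] for s in sensor_set)

def hit_rate(is_black, set_size, trials, rng):
    """Fraction of random sets of the given size that contain a black sensor."""
    n = len(is_black)
    hits = 0
    for _ in range(trials):
        sensor_set = rng.sample(range(n), set_size)
        hits += set_sample(is_black, sensor_set)
    return hits / trials

n = 1000
is_black = [i < 2 for i in range(n)]  # only b = 2 black sensors
rng = random.Random(0)
for size in (1, 10, 100, 500):
    print(size, hit_rate(is_black, size, 2000, rng))
```

Larger sets are hit more often, which is the intuition behind "easier to hit sets containing black sensors". The price is that a single answer no longer says how many black sensors were in the set.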

Our Results
Novel algorithm for estimating count using set sampling
  Defines randomized and inter-related sets, and samples them adaptively
# sets needed to sample: [formula garbled in extraction; polylogarithmic in n, see paper]
  Previously, without set sampling: $\frac{n}{b} \cdot \frac{1}{\epsilon^2} \log\frac{1}{\delta}$
  # of samples reduced from polynomial to polylogarithmic (can be further reduced – see paper)
Per-sensor msg complexity: [formula garbled in extraction; see paper]
  Comparable to some detection-only protocols [Roy et al.’06]
  Similar msg sizes
See paper for time complexity
See paper for other aggregates (sum, avg)
Set sampling + novel algorithms using set sampling
  Enables secure aggregation queries despite adversarial interference

Outline of This Talk
Background, goal, and summary of results
Simple implementation of set sampling in sensor networks
Main technical results: Novel algorithm for estimating count via set sampling

Implementing Set Sampling – Non-Secure Version
Example: sample the set {A, B, C, D}
Request flooded from the base station: O(log n) bits
  We use only O(n) (instead of O(2^n)) random sets, so O(log n) bits suffice to name a set
Reply: Single bit
  Flooded back from all black sensors in the set (e.g., A and C)
  Each sensor only forwards the first message received
  Base station sees a binary answer
Multiple samples can be taken in one flooding
  Our algorithm takes samples in O(log n) sequential stages, so only O(log n) rounds of flooding
Goal: O(1) per-sensor msg complexity for sampling a set
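
The reply flood can be simulated with a breadth-first traversal, as in the Python sketch below. The topology, the node names, and the modelling of one forward as one local broadcast are all assumptions made for illustration; the function `flood_reply` is hypothetical and not part of the talk's protocol specification.

```python
from collections import deque

def flood_reply(neighbors, base, sampled_set, is_black):
    """Simulate the single-bit reply flood for one set sample.

    neighbors: dict mapping each node to its adjacent nodes
               (hypothetical connected topology, base station included).
    Returns (answer_seen_by_base, broadcasts_sent_per_node).
    """
    sent = {node: 0 for node in neighbors}
    # Black sensors in the sampled set start the flood.
    sources = [s for s in sampled_set if is_black.get(s, False)]
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node == base:
            continue                  # the base station does not forward
        sent[node] += 1               # forward only the first message received
        for nb in neighbors[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return base in seen, sent

neighbors = {
    "base": ["A"],
    "A": ["base", "B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
is_black = {"A": True, "B": False, "C": True, "D": False}
answer, sent = flood_reply(neighbors, "base", ["A", "B", "C", "D"], is_black)
print(answer, sent)
```

Because later copies of the reply are dropped, every sensor broadcasts at most once per sample, no matter how many black sensors reply.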

Implementing Set Sampling – Secure Design
Each set = Some distinct symmetric key K
  Preload K onto all sensors in the set
  Each sensor should only be in a small number of sets – O(log n) in our protocol
Request: name of K, nonce
Reply: MAC_K(nonce)
  Only sensors holding K can generate it
DoS attacks possible
  Can be avoided with improved design – see paper
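
The request/reply exchange can be sketched with Python's standard `hmac` module. HMAC-SHA256 is an assumption of this sketch (the talk only says "MAC" and elsewhere mentions an 8-byte MAC), and the function names and key sizes are hypothetical.

```python
import hashlib
import hmac
import os

def make_reply(key, nonce):
    """Reply from a black sensor holding `key`: MAC_K(nonce)."""
    return hmac.new(key, nonce, hashlib.sha256).digest()

def verify_reply(key, nonce, reply):
    """Base station check; compare_digest avoids timing side channels."""
    expected = hmac.new(key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, reply)

set_key = os.urandom(16)     # hypothetical key K preloaded onto the set
other_key = os.urandom(16)   # a key for some other set
nonce = os.urandom(8)        # fresh for every request

reply = make_reply(set_key, nonce)
print(verify_reply(set_key, nonce, reply))                         # genuine reply
print(verify_reply(set_key, nonce, make_reply(other_key, nonce)))  # wrong key
```

The fresh nonce is what stops an attacker from replaying a reply recorded during an earlier sample of the same set.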

Outline of This Talk
Background, goal, and summary of results
Implement set sampling in sensor networks
Main technical meat: Novel algorithm for estimating count via set sampling
  For now assume all sensors are honest
  Security follows from the clean security guarantees of sampling, though some minor modifications needed – see paper

Random Sets on the Sampling Tree
Basic approach: Construct (related) randomized sets of different sizes and adaptively sample them
Base station internally creates a sampling tree
  A complete binary tree with 4n leaves
  Each tree node = A distinct symmetric key = Some set of sensors
  Sampling tree is an internal data structure and not the network topology
Each sensor is associated with a uniformly random leaf (independently)
Each tree node corresponds to a set containing all the sensors in its subtree
[Figure: sampling tree with keys K1 (root); K2, K3; K4 to K7; K8 to K15 (leaves). K1, K2, K5, K10 are loaded onto sensor A; K1, K3, K6, K12 are loaded onto sensor B.]
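
The key assignment in the figure follows from walking from a sensor's leaf up to the root. A minimal Python sketch, assuming heap-style node numbering (root 1, children of node k are 2k and 2k+1) and a power-of-two number of leaves; the function names are hypothetical.

```python
import random

def keys_for_leaf(leaf):
    """Key indices on the path from a leaf up to the root (K1)."""
    path = []
    node = leaf
    while node >= 1:
        path.append(node)
        node //= 2          # move to the parent
    return sorted(path)

def assign_random_leaf(num_leaves, rng):
    """Pick a uniformly random leaf; leaves are numbered num_leaves..2*num_leaves-1."""
    return num_leaves + rng.randrange(num_leaves)

print(keys_for_leaf(10))  # sensor A in the figure: [1, 2, 5, 10]
print(keys_for_leaf(12))  # sensor B in the figure: [1, 3, 6, 12]
```

Each sensor holds one key per tree level, which is where the O(log n) keys per sensor comes from.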

Properties of the Sampling Tree
A sensor is black if it satisfies the predicate
A key is black iff the corresponding set contains a black sensor
$f_i$: fraction of black keys at level i
[Figure: example sampling tree with $f_0 = 1$, $f_1 = 1$, $f_2 = 0.5$, $f_3 = 0.25$]

$f_i$ is monotonic as we go down the tree
  Decreases by a factor of at most 2 per level
  At the top (assuming at least one black sensor): $f_i = 1$
  At the bottom (4n leaves!): $f_i \le \frac{1}{4}$
Lemma: There exists a level $\ell$ with $f_\ell \in \left[\frac{1}{4}, \frac{1}{2}\right]$
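
The lemma can be checked on the example tree with a short Python sketch. It assumes heap-style node numbering and that black sensors sit at leaves 10 and 12, which reproduces the fractions shown in the figure; the function names are hypothetical. The first level with $f_i \le 1/2$ also has $f_i > 1/4$, because the level above it has a fraction above 1/2 and the fraction can at most halve per level.

```python
def black_fractions(black_leaves, depth):
    """Return [f_0, ..., f_depth] for a complete binary tree of the given depth.

    black_leaves: heap-numbered leaves (2**depth .. 2**(depth+1) - 1)
                  that hold at least one black sensor.
    """
    fractions = [0.0] * (depth + 1)
    black = set(black_leaves)
    for level in range(depth, -1, -1):
        fractions[level] = len(black) / 2 ** level
        # A parent key is black iff at least one of its children is black.
        black = {node // 2 for node in black}
    return fractions

def lemma_level(fractions):
    """First level (from the top) whose black fraction is at most 1/2."""
    for level, f in enumerate(fractions):
        if f <= 0.5:
            return level
    return None

f = black_fractions({10, 12}, depth=3)
print(f)               # [1.0, 1.0, 0.5, 0.25]
print(lemma_level(f))  # 2
```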

Why Level $\ell$ Helps (the level from the lemma)
$f_\ell$ not too small → Efficient estimation of $f_\ell$ via naïve sampling:
  $O\left(\frac{1}{\epsilon^2} \log\frac{1}{\delta}\right)$ samples on level $\ell$ yield an $(\epsilon, \delta)$ approximation for $f_\ell$
$f_\ell$ not too large → Can potentially estimate final count directly from $f_\ell$
  Chernoff-type occupancy tail bound for balls into bins
  See paper for details
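
To see why a fraction bounded away from zero makes naive sampling cheap, the sketch below computes a sufficient sample size from the textbook multiplicative Chernoff bound. The constants come from that generic bound and are an assumption of this sketch, not the paper's analysis; `samples_needed` is a hypothetical helper.

```python
import math

def samples_needed(epsilon, delta, f_lower_bound):
    """Samples sufficient for an (epsilon, delta) estimate of a fraction f.

    Uses Pr[|X - mu| >= epsilon * mu] <= 2 * exp(-epsilon**2 * mu / 3)
    for 0 < epsilon <= 1, with mu = m * f >= m * f_lower_bound.
    """
    return math.ceil(3 * math.log(2 / delta) / (epsilon ** 2 * f_lower_bound))

print(samples_needed(0.1, 0.05, 0.25))       # fraction at least 1/4: no dependence on n
print(samples_needed(0.1, 0.05, 2 / 10000))  # fraction b/n with b = 2, n = 10,000
```

The first call depends only on epsilon and delta, while the second grows with n/b, mirroring the contrast between sampling keys on the chosen level and sampling individual sensors.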

Additional Issues: Too Few Keys on Level $\ell$
Challenge: To estimate final count based on $f_\ell$, the number of keys on level $\ell$ needs to be large enough
If not, need to track down to lower levels
  Need to leverage other interesting properties on the sampling tree
  See paper

Additional Issues: Finding Level $\ell$
Binary search on the O(log(n)) levels
  On each level i examined, sample a small number of random keys to roughly estimate $f_i$
  Extremely efficient
Challenges:
  The binary search operates on estimated values (with error and may not be monotonic)
  When $f_i$ is small, the estimation only has error guarantee on one side
  See paper
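
The idealized version of this search, ignoring the estimation-error challenges listed on the slide, looks like the Python sketch below. It assumes an exact, non-increasing oracle for $f_i$; the `estimate_f` callable and the function name `find_level` are hypothetical, and the real protocol replaces the oracle with sampled estimates.

```python
def find_level(estimate_f, depth):
    """Binary search for the first level whose fraction is at most 1/2.

    estimate_f: callable level -> f_level (hypothetical interface).
    depth: index of the bottom level; assumes estimate_f(depth) <= 1/2.
    Returns (level, number_of_levels_probed).
    """
    lo, hi = 0, depth            # invariant: the answer lies in [lo, hi]
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if estimate_f(mid) <= 0.5:
            hi = mid             # mid qualifies; look for an earlier level
        else:
            lo = mid + 1         # still more than half black; go further down
    return lo, probes

exact = [1.0, 1.0, 0.5, 0.25]    # the f_i values from the example tree
level, probes = find_level(lambda i: exact[i], depth=3)
print(level, probes)             # 2 2
```

With O(log n) levels the search probes only O(log log n) of them, and each probe needs just a rough estimate.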

Example Numerical Results
n = 10,000 and count result (b) ranges from 0 to 10,000
Overhead: 5-15 sequential stages of sampling
  Total 250-300 samples
Avg approximation error: (1±0.08)
  Hard to get better accuracy even in trusted environments ([Nath et al.’09])…
Naive sampling: 300 samples give the same accuracy only when b > 2,000

Conclusions
Making aggregation queries secure is critical for many sensor network applications
Contribution:
  Detecting attacks → Tolerating attacks
  Safety only → Safety + Liveness
Our approach: Abandon in-network processing and use sampling
  Use novel set sampling to reduce the overhead
  Polynomial overhead → Logarithmic overhead

Related Work to Set Sampling
Decision tree complexity for threshold-t functions (i.e., whether $b \ge t$) [Ben-Asher and Newman’95] [Aspnes’09]
  Most results are for error-free deterministic protocols
  Large lower bound: $\Omega(t)$ (implying $\Omega(b)$ for count)
No prior results for general Monte Carlo randomized algorithms

Tolerating Attacks is Difficult
Example: Byzantine consensus
  Detection substantially easier than tolerance
  $n \ge 3f + 1$ lower bound only applies to tolerance and not detection
Pinpointing / revoking malicious sensors is hard
  E.g., due to lack of public-key authentication
  Active research area by itself

System Model
Multi-hop sensor network with trusted base station
Performance metric: Time complexity – see paper
Performance metric: Per-sensor msg complexity
  Max number of msgs sent/received by a single sensor (captures load balance)
  msg size is either 8 bytes (size of a MAC) or log(n) bits
Collision ignored – as in all prior work
  Or one can apply existing algorithms…

Implementing Set Sampling – Non-Secure Version
Goal: O(1) per-sensor msg complexity for sampling a set
Request size: We use at most O(n) (random) sets, so O(log(n)) bits suffice to name a set
Request flooding – every sensor sends/receives one msg
Reply: Single bit
Reply flooding – only the first reply is forwarded
  This is why set sampling is designed to be binary
[Figure: sensors A, B, C, D; B, C, and D satisfy the predicate, A does not]
(The overhead of sampling a set needs to be properly controlled – will discuss later.)

Translating $f_\ell$ to b
We now have a good estimation for $f_\ell$
  Need to produce a good estimation for b
Let the number of keys on level $\ell$ be n
Throw b balls into n bins
  The fraction of occupied bins has the same distribution as $f_\ell$
This distribution is highly concentrated near its mean (Chernoff-type occupancy tail bound), assuming
  $f_\ell$ not too close to 1
  n not too small
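
One simple way to turn an occupied-bin fraction back into a count is to invert its expectation, as in the Python sketch below. Inverting the mean is an assumption of this sketch and the paper's estimator may differ. The sketch writes the number of bins as `m` to avoid clashing with the number of sensors, and both function names are hypothetical.

```python
import math

def expected_occupied_fraction(b, m):
    """Expected fraction of occupied bins after throwing b balls into m bins."""
    return 1 - (1 - 1 / m) ** b

def count_from_fraction(f, m):
    """Invert the expectation above to turn an observed fraction f into a count."""
    if f >= 1:
        raise ValueError("all bins occupied: f carries no information about b")
    return math.log(1 - f) / math.log(1 - 1 / m)

m = 1024
for b in (100, 300, 600):
    f = expected_occupied_fraction(b, m)
    print(b, round(f, 3), round(count_from_fraction(f, m), 1))
```

The inversion degrades as f approaches 1, since many different counts then give nearly the same fraction. This is one way to read the requirement that the fraction not be too close to 1.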

Summary of Techniques to Achieve the Results
Define randomized sets based on a complete binary tree
  Interesting relationships among the sets
Sample the sets adaptively
Leverages Chernoff-type occupancy tail bounds for balls-into-bins