Computer Science
Towards Secure Dataflow Processing in Open Distributed Systems
Juan Du, Wei Wei, Xiaohui (Helen) Gu, Ting Yu
Outline
– Introduction
– Design and Algorithms
– Experimental Evaluation
– Related Work
– Conclusion
Dataflow Processing in Distributed System
[Figure: a composer builds a dataflow for the user out of data processing components Si (S1, S2, S3, S4, S6, S7, S9, S12) offered by component providers, each implementing one of the functions f1 to f5. ADUs di flow through the selected components: … di, …; … f1(di), …; … f2(f1(di)), …; … f3(f2(f1(di))), …]
Run in Open Distributed Systems
Dataflow Processing Applications
– Network traffic monitoring
– Sensor data analysis
– Audio/video surveillance
– Scientific data processing
Advantages in Open Distributed Systems
– Highly scalable and available infrastructures
– No need to maintain hardware and software
Challenges in Open Distributed Systems
– Component providers come from different security domains
– Not all data processing components are trustworthy
ADU Attack
[Figure: the composed dataflow with malicious components marked. Stream labels along the path: … d2, d1; … f1(d2), f1(d1); … f2(f1(d1)); … f2(f1(d1)), d0. Legend: Si = data processing component, di = ADU, arrows = dataflow.]
Dataflow Topology Attack
[Figure: the composed dataflow with malicious components marked. Stream labels: … f1(d2), …; … f3(f2(f1(d2))), …; … f3(f5(f2(f1(d2)))), …, where an extra function f5 appears in the path. Legend: Si = data processing component, di = ADU, arrows = dataflow.]
Function Integrity Attack
[Figure: the composed dataflow with malicious components marked. Stream labels: … f1(d2), …; … f0(f1(d2)), …, where a malicious component applies a function f0 to the stream. Legend: Si = data processing component, di = ADU, arrows = dataflow.]
System Design
Attack Models
– ADU attack
– Dataflow topology attack
– Function integrity attack
Assumptions
– Third-party component providers could be malicious
– Composers and users are trusted
– PKI is deployed in advance
Goals
– Provide integrity and confidentiality for dataflow processing applications
– Focus on discussing integrity issues
Provenance-based ADU Protection
“Receipt” packet
– ADU dropping attack
– s2 may claim it did not receive d
– s1 may claim it sent d when it did not
[Figure: S1 sends ADU d to S2, and S2 returns a receipt [sqn, session_Id, hash(d)]sign_s2. The composer C also appears in the figure.]
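The receipt exchange can be sketched in Python as follows. The slide gives only the receipt fields [sqn, session_Id, hash(d)] and the fact that s2 signs them; the rest is an assumption made for illustration: SHA-256 as the hash, the function names, and an HMAC under a hypothetical per-component key standing in for s2's PKI signature (an HMAC does not provide the non-repudiation a real signature would).

```python
import hashlib
import hmac

def make_receipt(sqn: int, session_id: str, d: bytes, s2_key: bytes) -> dict:
    """s2 acknowledges ADU d by signing (sqn, session_id, hash(d))."""
    digest = hashlib.sha256(d).hexdigest()
    body = f"{sqn}|{session_id}|{digest}".encode()
    # Stand-in for sign_s2: a real deployment would use s2's private key.
    tag = hmac.new(s2_key, body, hashlib.sha256).hexdigest()
    return {"sqn": sqn, "session_id": session_id, "hash": digest, "sig": tag}

def verify_receipt(receipt: dict, d: bytes, s2_key: bytes) -> bool:
    """Check that the receipt is authentic and refers to exactly this ADU."""
    body = f"{receipt['sqn']}|{receipt['session_id']}|{receipt['hash']}".encode()
    expected = hmac.new(s2_key, body, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, receipt["sig"])
            and receipt["hash"] == hashlib.sha256(d).hexdigest())

S2_KEY = b"hypothetical-key-for-s2"  # assumption: illustration only
receipt = make_receipt(7, "session-42", b"payload of d", S2_KEY)
print(verify_receipt(receipt, b"payload of d", S2_KEY))     # receipt matches d
print(verify_receipt(receipt, b"a different ADU", S2_KEY))  # receipt does not match
```

A valid receipt held by s1 counters s2's claim that d never arrived, and a missing receipt counters s1's claim that it sent d.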
Provenance-based ADU Protection
Provenance evidence
– Cached or carry-on evidence
– Consistency verification between different components
[Figure: C sends d to S1, which applies f1 and outputs f1(d) with evidence [[h(d), h(f1(d))]sign_s1]key_c. S2 applies f2 and outputs f2(f1(d)) to C, together with S1's evidence and its own, [[h(f1(d)), h(f2(f1(d)))]sign_s2]key_c. Each evidence record pairs the hash of a component's input with the hash of its output.]
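The consistency verification between components can be illustrated with a short Python sketch. It assumes SHA-256 for h and leaves out the signatures (sign_s1, sign_s2) and the encryption under key_c shown on the slide; the function names and the stand-in functions f1 and f2 are hypothetical.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_evidence(component: str, data_in: bytes, data_out: bytes) -> dict:
    """Evidence a component emits: hashes of its input and its output.
    Signing and encryption under key_c are omitted in this sketch."""
    return {"component": component, "h_in": h(data_in), "h_out": h(data_out)}

def chain_is_consistent(original: bytes, final: bytes, evidence: list) -> bool:
    """Composer-side check: each hop's input hash must equal the previous
    hop's output hash, starting at h(original) and ending at h(final)."""
    expected = h(original)
    for ev in evidence:
        if ev["h_in"] != expected:
            return False
        expected = ev["h_out"]
    return expected == h(final)

def f1(d: bytes) -> bytes:   # hypothetical stand-in for f1
    return d.upper()

def f2(d: bytes) -> bytes:   # hypothetical stand-in for f2
    return d[::-1]

d = b"adu"
out1 = f1(d)
out2 = f2(out1)
evidence = [make_evidence("s1", d, out1), make_evidence("s2", out1, out2)]
print(chain_is_consistent(d, out2, evidence))
```

If S2 claimed a different input than the output S1 recorded, the two hashes would disagree and the chain check would fail.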
Dataflow Topology Protection
Cascading topology encryption
– No component can change the dataflow topology
– Each component knows only its previous hop and next hop
[Figure: dataflow C → s1 → s2 → s3 → C with functions f1, f2, f3. Labels: hop identifiers [s1] [s2] [s3] [C], each with sig_c; keys key_s1, key_s2, key_s3.]
Dataflow Topology Protection
Cascading topology encryption
– No component can change the dataflow topology
– Each component knows only its previous hop and next hop
– Onion routing [Goldschlag, et al., 1999]
[Figure: dataflow C → s1 → s2 → s3 → C with functions f1, f2, f3 and keys key_s1, key_s2, key_s3. Labels along the path, shrinking hop by hop:
[s1]sig_c [s2]sig_c [s3]sig_c [C]sig_c, key_s3, key_s2
[s2]sig_c [s3]sig_c [C]sig_c, key_s3
[s3]sig_c [C]sig_c]
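The layered route can be sketched in Python in the spirit of onion routing. This is a simplified illustration under stated assumptions, not the scheme from the slides: the composer's signatures (sig_c) are left out, the per-component keys are hypothetical shared secrets, and `toy_crypt` is a hash-based XOR stream used only so the example runs with the standard library. It is not secure and stands in for real encryption under key_s1, key_s2, key_s3.

```python
import hashlib
import json

def _keystream(key: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def toy_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with a hash-derived keystream; the same call encrypts and decrypts.
    Toy stand-in for real encryption, for illustration only."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def build_onion(path: list, keys: dict) -> bytes:
    """Composer side: wrap the route from the last hop inward to the first.
    The hop after the last component is the composer, written "C"."""
    inner = b""
    next_hops = path[1:] + ["C"]
    for comp, nxt in reversed(list(zip(path, next_hops))):
        layer = json.dumps({"next": nxt, "inner": inner.hex()}).encode()
        inner = toy_crypt(keys[comp], layer)
    return inner

def peel(comp_key: bytes, onion: bytes):
    """Component side: remove one layer, learning only the next hop."""
    layer = json.loads(toy_crypt(comp_key, onion))
    return layer["next"], bytes.fromhex(layer["inner"])

keys = {name: f"hypothetical-key-{name}".encode() for name in ("s1", "s2", "s3")}
onion = build_onion(["s1", "s2", "s3"], keys)
hop, route = "s1", ["s1"]
while hop != "C":
    hop, onion = peel(keys[hop], onion)
    route.append(hop)
print(route)  # ['s1', 's2', 's3', 'C']
```

Each component can read only its own layer, so it learns the next hop (and sees the previous hop as the sender) without seeing the rest of the topology.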
Function Integrity Attestation
Randomized data attestation
– Achieve scalable function integrity attack detection
  • Duplicate a random subset of ADUs
  • Send duplicates to selected functionally equivalent components
  • Check result consistency
– Continuously perform randomized data attestation
[Figure: composer C sends ADUs d1, d2, d3 through components s1 to s8, which provide f1 and f2. d1 is processed once, giving f1(d1) and f2(f1(d1)). d2 and d3 are duplicated as d2' and d3' and processed by other components; C then checks f2(f1(d2)) == f2(f1(d2')) and f2(f1(d3)) == f2(f1(d3')).]
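The duplicate-and-compare steps above can be sketched as a small Python simulation. The function names, the integer ADUs, and the `honest` and `malicious` components are hypothetical; the only element taken from the slides is the procedure itself: duplicate a random subset of ADUs with probability p_u, send each duplicate to a functionally equivalent component, and compare the results.

```python
import random

def attest_stream(adus, primary, replicas, p_u, rng):
    """Process each ADU on `primary`; with probability p_u also send a
    duplicate to one randomly chosen replica and compare the two results.
    Returns the ADUs whose results disagreed."""
    mismatches = []
    for d in adus:
        result = primary(d)
        if rng.random() < p_u:
            checker = rng.choice(replicas)
            if checker(d) != result:
                mismatches.append(d)
    return mismatches

def honest(d):      # hypothetical composite function f2(f1(d))
    return (d + 1) * 2

def malicious(d):   # hypothetical component returning a wrong result
    return (d + 1) * 2 + 1

rng = random.Random(0)
adus = list(range(100))
print(len(attest_stream(adus, honest, [honest, honest], 0.2, rng)))  # 0
flagged = attest_stream(adus, malicious, [honest, honest], 0.2, rng)
print(len(flagged))  # number of duplicated ADUs that exposed the mismatch
```

A mismatch shows only that the two results differ, not which component produced the wrong one.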
Implementation and Experimental Setup
Implementation
– Implement a prototype of the secure dataflow processing system
– Follow the design of IBM System S
Experiment setup
– Conduct experiments on PlanetLab
– Use about 200 hosts
– One host represents one component provider
– Composer deployed on a pre-defined PlanetLab host
Evaluation
Overhead caused by basic protection schemes
Randomized data attestation
– Overhead
  • In terms of dataflow processing delay
  • (time of dn getting out - time of d1 getting in) / n
– Detection probability
  • Non-collusion
  • Collusion
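As a worked instance of the delay metric, the Python helper below computes (time of dn getting out - time of d1 getting in) / n. The function name and the timestamps are hypothetical, chosen only to show the arithmetic.

```python
def avg_processing_delay(t_first_in: float, t_last_out: float, n: int) -> float:
    """(time d_n leaves the dataflow - time d_1 enters it) / n, in seconds."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (t_last_out - t_first_in) / n

# Hypothetical timestamps: 300 ADUs, first enters at t = 0 s, last leaves at t = 33 s.
print(avg_processing_delay(0.0, 33.0, 300))
```

With these made-up numbers the average works out to 0.11 s per ADU.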
Overhead of Basic Protection Schemes
• The overhead is about 10-15% for both secure dataflow schemes
Overhead of Randomized Data Attestation
• # of redundant components k = 5
• data size = 1 KB
• data rate = 10 ADUs/sec
• duration = 30 s
• Avg dataflow processing delay increases with the number of redundant components used
• Due to sub-optimal dataflow topology
Detection Probability
• Detection probability increases with the duplication probability p_u and the number of redundant components used
• Detection is harder in collusion scenarios than in non-collusion scenarios
Related Work
Distributed dataflow processing
– Focuses on resource and performance management issues
– Assumes that data processing components are trustworthy
Trust management in distributed systems
– Distributed messaging systems [Haeberlen, et al., SOSP 2007]
– Pub-sub overlay [Srivatsa, et al., CCS 2005]
– None of them addresses secure and scalable dataflow processing in open distributed systems
Byzantine fault-tolerance
– In wide area networks [Amir, et al., DSN 2006]
– No trusted party
Conclusion
Finished Work
– The first attempt to address the integrity of dataflow processing application delivery on open distributed systems
– Identify and classify major security attacks
– Propose a set of effective protection schemes
Future Work
– Non-linear dataflow topology
– Integrity attestation on stateful functions
– Further identify malicious components
Thank you
Questions?