DuDE: A Distributed Computing System using a Decentralized P2P Environment
The 4th International Workshop on Architectures, Services and Applications for the Next Generation Internet (WASA-NGI-IV)
Bonn, Germany, October 4th, 2011
J. Skodzik, P. Danielis, V. Altmann, J. Rohrbeck, D. Timmermann
University of Rostock, Germany
Institute of Applied Microelectronics and Computer Engineering
T. Bahls, D. Duchow
Nokia Siemens Networks
Broadband Access Division
Greifswald, Germany
Outline
• Introduction & Motivation
• DuDE in General
• The DuDE Algorithm in Detail
• Test Scenario and Evaluation
• Summary and Future Work
[Figure: Internet users and data traffic (AMS-IX), 2007 to 2011. Series: number of Internet users, data traffic. Axes: millions of users; average monthly data traffic [1000 TB].]
• The number of Internet users and the data traffic are increasing
• Internet Service Providers (ISPs) want to ensure:
  • Quality of Service (QoS)
  • The detection of bottlenecks
  • The detection of attacks
• How can ISPs ensure these goals?
Statistics generated from existing log data
[Diagram: situation today. Users reach the Internet through an access node (AN).]
Does an AN have enough resources?
Does it provide sufficient statistics at all?
Introduction & Motivation
[Diagram: users reach the Internet through an AN. Resource utilization of the AN: CPU 60 %, MEM 60 %.]
[Diagram: users reach the Internet through ANs 1 to 4.]
Simple support of new statistics types
Simultaneous computation of multiple statistics
Processing of increasing log data volumes
Supported:
• Short term statistics (STS) for single ANs
  • Processor utilization
  • RAM utilization
  • Drops
  • Number of packets
Not supported:
• Long term statistics (LTS)
• Creation of global statistics
[Diagram: a user reaches the Internet through ANs 1 to 4.]
• One AN does not have enough hardware resources
  Usage of multiple ANs to compute statistics
• Efficient resource sharing with high resilience and scalability
  Utilization of P2P technology
DuDE: Exploitation of already available resources
No extra costs for additional equipment
DuDE in General
[Diagram: ANs 1 to 4 form a logical P2P ring of nodes Node1 to Node4, each with an ID.]
[Diagram: ANs 1 to 4 with log data and data chunks.]
Log data (some hundreds of KB)
Data chunk (ca. 100 KB)
• Objective: high log data availability of 99.999 %
• Simple replication wastes memory resources
Reed-Solomon Codes
• Split log data of each AN into m data chunks
• Encoding: Add k interleaved coding chunks, giving n = m + k chunks
• Decoding: Restore log data from any m of the n chunks
[Diagram: the log data is split into m log data chunks; encoding adds k coding chunks; decoding restores the log data from the n = m + k data/coding chunks despite erasures.]
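The "any m of n" property can be illustrated with a small sketch. This is not the coding used in DuDE: it is a Reed-Solomon-style erasure code built from polynomial interpolation over the prime field GF(257), chosen for brevity, and it ignores interleaving. The names P, encode, and decode are hypothetical.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points` (Lagrange form, mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, m: int, k: int):
    """Split data into m data chunks and append k coding chunks (n = m + k).

    Chunks are lists of ints because coding symbols may take the value 256.
    """
    size = -(-len(data) // m)  # ceiling division
    padded = data.ljust(size * m, b"\0")
    chunks = [list(padded[i * size:(i + 1) * size]) for i in range(m)]
    for x in range(m, m + k):
        chunks.append([
            _interpolate([(i, chunks[i][pos]) for i in range(m)], x)
            for pos in range(size)
        ])
    return chunks


def decode(available: dict, m: int, length: int) -> bytes:
    """Restore the data from any m chunks, given as {chunk index: chunk}."""
    if len(available) < m:
        raise ValueError("need at least m chunks")
    picked = sorted(available.items())[:m]
    size = len(picked[0][1])
    out = bytearray()
    for i in range(m):  # re-evaluate the polynomial at the data positions
        for pos in range(size):
            out.append(_interpolate([(x, c[pos]) for x, c in picked], i))
    return bytes(out[:length])
```

With m = 3 and k = 2, any two of the five chunks may be lost: decode({0: c[0], 3: c[3], 4: c[4]}, 3, len(data)) returns the original data, where c is the list returned by encode.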
How to apply our application to P2P?
[Diagram: the administrator (Admin.) hands a job to the ring of ANs 1 to 4; tasks are passed between ANs.]
• Job = collection of STS and/or LTS tasks
• Task = part of a job, e.g., a request for "CPU" statistics
• Jobscheduler (JS): Reception and monitoring of a job
• Taskwatcher (TW): Reception and processing of a task
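As a rough illustration of the terms above, a job and its tasks could be modeled as follows. The class and field names (StatKind, Task, Job, metric, kind) are hypothetical and not taken from the DuDE implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class StatKind(Enum):
    STS = "short term statistics"
    LTS = "long term statistics"


@dataclass(frozen=True)
class Task:
    """One part of a job: a request for a single statistic."""
    metric: str
    kind: StatKind


@dataclass
class Job:
    """A collection of STS and/or LTS tasks."""
    tasks: list = field(default_factory=list)


# Example job mixing one STS task and two LTS tasks
job = Job([
    Task("processor utilization", StatKind.STS),
    Task("RAM utilization", StatKind.LTS),
    Task("drops", StatKind.LTS),
])
```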
Which steps are necessary to compute statistics?
The DuDE Algorithm in Detail
Stage 1: Resource collection
[Diagram: resource utilization values of the ANs (60 %, 50 %, 30 %, 10 %) are collected for the administrator's job. Legend: job, task, log data, global statistics.]
Stage 2: Jobscheduler determination
Stage 3: Resource re-collection
Stage 4: Task assignment
Request for processor utilization STS
Request for RAM utilization LTS
Request for drops LTS
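Stages 1 to 4 can be sketched as below. The slides do not state the selection criteria, so two assumptions are made here: the jobscheduler is the node with the lowest reported utilization, and tasks go to the remaining nodes in order of increasing utilization. The function names and the mapping of the utilization values to particular ANs are hypothetical.

```python
def pick_jobscheduler(utilization):
    """Return the node with the lowest utilization (assumed criterion)."""
    return min(utilization, key=utilization.get)


def assign_tasks(tasks, utilization, jobscheduler):
    """Map each task to a taskwatcher, least utilized nodes first (assumed policy)."""
    workers = sorted(
        (node for node in utilization if node != jobscheduler),
        key=utilization.get,
    )
    if not workers:
        raise ValueError("no taskwatcher available")
    # Wrap around if there are more tasks than taskwatchers
    return {task: workers[i % len(workers)] for i, task in enumerate(tasks)}


# Stage 1: collected utilization in percent (example values)
utilization = {"AN 1": 60, "AN 2": 50, "AN 3": 30, "AN 4": 10}
# Stage 2: determine the jobscheduler
jobscheduler = pick_jobscheduler(utilization)
# Stage 3 would re-collect utilization here; the same values are reused
# Stage 4: assign the tasks of the job
tasks = ["processor utilization STS", "RAM utilization LTS", "drops LTS"]
assignment = assign_tasks(tasks, utilization, jobscheduler)
```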
How to find all log data for global statistics computation?
Global Peer Data Discovery Algorithm
• Request for global statistics: all data needed
• Threshold value A = 2
[Diagram: P2P ring with the taskwatcher and the nodes Node1, Node2, Node3, and Node5, each with an ID.]

i   Succ?   N   N >= A   ID
1   yes     0   no       Node1
2   yes     0   no       Node2
3   yes     0   no       Node3
4   no      1   no       Node4
5   no      2   yes      Node5
Algorithm is done

i   Succ?   N   N >= A   ID
5   yes     0   no       Node5
6           1   no       Node6
7   yes     2   yes      Node7
Algorithm is done
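The first table can be read as a loop with a miss counter. The sketch below assumes that "Succ?" means the successor lookup for position i succeeded, that N counts consecutive failed lookups and resets on success, and that the search stops once N reaches the threshold A. The function and parameter names are hypothetical.

```python
def discover_peers(responding, start=1, threshold=2):
    """Walk successor positions until `threshold` consecutive lookups fail.

    Returns the discovered positions and a trace of
    (i, succ, n, n >= threshold) rows.
    """
    found = []
    trace = []
    misses = 0
    i = start
    while misses < threshold:
        has_successor = i in responding
        misses = 0 if has_successor else misses + 1
        if has_successor:
            found.append(i)
        trace.append((i, has_successor, misses, misses >= threshold))
        i += 1
    return found, trace


found, trace = discover_peers({1, 2, 3}, threshold=2)
```

For responding nodes 1 to 3 and A = 2, the trace reproduces the five rows of the first table and the search ends at i = 5.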
Stage 5: Log data collection
Stage 6: Statistics computation
Processor utilization statistics
RAM utilization statistics
Drops statistics
Stage 7: Send results and display them
[Figure: bar chart "Processor utilization of all ANs", displayed to the administrator. x-axis: AN 1 to AN 5; y-axis: processor utilization [%].]
Test Scenario and Evaluation
[Diagram: test network of PCs 1 to 7 and the administrator (Admin.), connected through a switch.]
PC configuration:
• Pentium 4 (1.5 GHz)
• 512 MB RAM
• Equivalent to AN hardware
• Parameters:
  • Number of tasks inside a job
  • Number of log data sets in the P2P network
  • Computational load for statistics computation
• Measurements:
  • Time for finishing a job
  • Memory utilization
[Figure: Finishing a Job (Load: 0 / Log Data Sets: 7). x-axis: number of tasks (1 to 6); y-axis: time [s]; series: single AN, DuDE. Annotations: linear increase of needed time; time is constant.]
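One possible reading of the two annotations is sequential versus parallel task processing. The toy model below is an assumption, not a result from the measurements: every task takes the same time, coordination overhead is ignored, and six taskwatchers are available. The function names and the task time are hypothetical.

```python
import math


def single_an_time(num_tasks, task_time):
    """One AN processes all tasks one after another."""
    return num_tasks * task_time


def dude_time(num_tasks, task_time, num_taskwatchers):
    """Taskwatchers process tasks in parallel, one round per batch of tasks."""
    rounds = math.ceil(num_tasks / num_taskwatchers)
    return rounds * task_time


# Completion times for 1 to 6 tasks with a hypothetical task time of 10 s
single = [single_an_time(n, 10.0) for n in range(1, 7)]
distributed = [dude_time(n, 10.0, 6) for n in range(1, 7)]
```

Under these assumptions the single-AN time grows linearly with the number of tasks, while the distributed time stays flat as long as there are at least as many taskwatchers as tasks.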
[Figure: Finishing a Job (Load: 10 / Log Data Sets: 7). x-axis: number of tasks (1 to 6); y-axis: time [s]; series: single AN, DuDE. Annotations: linear increase of needed time; time is constant.]
[Figure: Maximum Memory Utilization of AN (Load: 10 / Log Data Sets: 7). x-axis: number of tasks (1 to 6); y-axis: memory utilization, 0 to 70,000,000 (unit not given); series: single AN, jobscheduler, Ø taskwatcher. Annotations: linear increase of memory utilization; constant memory utilization.]
[Figure: Maximum Memory Utilization of ANs (Tasks: 6 / Load: 10). x-axis: number of log data sets (1, 3, 7); y-axis: memory utilization, 0 to 70,000,000 (unit not given); series: single AN, jobscheduler, Ø taskwatcher.]
Memory utilization increases more at the single AN than at the taskwatcher
[Figure: Maximum Memory Utilization of ANs (Log Data Sets: 7 / Tasks: 6). x-axis: computational load (0, 1, 10); y-axis: memory utilization, 0 to 70,000,000 (unit not given); series: single AN, jobscheduler, Ø taskwatcher.]
Memory utilization is constant and independent of the computational load
Summary and Future Work
• P2P-based system for distributed computing of statistics
• STS and LTS
• Statistics for a single AN and the whole network
• Global Peer Data Discovery Algorithm
• Successfully developed prototype (demo session)
• Investigation of further use cases
Thanks for your attention!
Questions?