Multi-scale Real-time Grid Monitoring with Job Stream Mining
Transcript of Multi-scale Real-time Grid Monitoring with Job Stream Mining

Multi-scale Real-time Grid Monitoring with Job Stream Mining
Xiangliang Zhang, Michele Sebag, Cecile Germain-Renaud
TAO − INRIA CNRS
Université de Paris-Sud, F-91405 Orsay Cedex, France
21 May 2009

Contents
1 Monitoring system: Grid adapted StrAP
2 Streaming Jobs
3 Monitoring Outputs
  Monitoring on short-time scale
  Clustering Quality
  Monitoring on medium-time scale
  Monitoring on large-time scale

Multi-scale Realtime Grid Monitoring System
[Figure: percentage of jobs assigned (%) to each cluster and to the outliers; each exemplar is shown as a job vector.]
[Figure: two panels of percentage of jobs (%) versus days, one for the distribution of jobs like [7 0 0 0 0 0] and one for the distribution of jobs like [0 0 0 0 0 0].]

Multi-scale Realtime Grid Monitoring System
Affinity Propagation (AP)
  A clustering method: group similar points together
StrAP (Streaming AP)
  Online clustering of streaming data based on AP

Why AP ??
Affinity Propagation (AP)
  A clustering method
  Group similar points together
  Converge by iterations of message passing -> more stable results
  No need of K (the number of clusters) -> less prior knowledge
  A real point as an exemplar to represent a cluster -> avoid meaningless averaged centers
Clustering by Passing Messages Between Data Points. B.J. Frey, D. Dueck. Science 2007

How AP works ??
[Figure sequence: step-by-step illustration of how AP proceeds.]
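
For reference, the message-passing updates of AP can be written as below. The slides do not spell these formulas out, so the notation follows the Frey and Dueck paper cited on the "Why AP" slide: s(i,k) is the similarity between points i and k, r(i,k) is the responsibility sent from i to candidate exemplar k, and a(i,k) is the availability sent back from k to i.

```latex
\begin{align*}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\} \\
a(i,k) &\leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\} \Bigr\}, \quad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0,\, r(i',k)\} \\
\text{exemplar}(i) &= \arg\max_{k} \bigl\{ a(i,k) + r(i,k) \bigr\}
\end{align*}
```

The self-similarity s(k,k), called the preference, controls how many exemplars emerge, which is why the number of clusters K does not have to be fixed in advance.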

Grid adapted StrAP
Grid adapted StrAP (Streaming AP):
  Online clustering of streaming jobs -> one scan of the stream
  Incremental update of model -> keep tracking the stream
  Detecting distribution changes in stream -> absorb new patterns
Data streaming with Affinity propagation. Xiangliang Zhang, Cyril Furtlehner, Michele Sebag. ECML 2008.

Stream clustering
[Diagram: the incoming job stream, the current model, and the reservoir.]
Does $x_t$ fit the current model ??
  if yes, update the model
  otherwise, go to reservoir

Stream clustering
[Diagram: the incoming job stream, the current model, and the reservoir filling up with jobs that do not fit.]
Has the distribution changed ?? CHANGE TEST
  if yes, rebuild the model
  otherwise, continue
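
The change test can be sketched with the Page-Hinkley test, which the deck names for change detection in its parameter list. The code below is a generic Page-Hinkley detector for an upward shift in the mean of a numeric signal, not the authors' implementation. Which signal Grid adapted StrAP feeds to the test is not stated in the slides, so the outlier indicator used here (1 when a job goes to the reservoir, 0 otherwise), the class name `PageHinkley`, and the values of `delta` and `threshold` are assumptions for illustration.

```python
class PageHinkley:
    """Generic Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated drift (assumed value)
        self.threshold = threshold  # alarm level (assumed value)
        self.reset()

    def reset(self):
        self.n = 0          # number of observations seen so far
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


# Hypothetical signal: 1 if the job was sent to the reservoir, else 0.
signal = [0] * 200 + [1] * 50
detector = PageHinkley()
alarm_at = None
for t, x in enumerate(signal):
    if detector.update(x):
        alarm_at = t   # first step at which the test fires
        break
print(alarm_at)
```

When the test fires, the model would be rebuilt and the detector reset with `detector.reset()` before the stream continues.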

Our Model
Output
  $e_i$, the exemplar (center of cluster)
  $n_i$, size of cluster
  $\Sigma_i$, average distance of points to their exemplar
  $T$, time stamp when the cluster was last visited
Parameters
  $\epsilon$, threshold for comparing each point with the model (set to around the value of $\Sigma_i$ in the initial model)
  $\Delta$, decay window (decreases the weight of old exemplars)
  Page-Hinkley parameters (change detection)
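
The per-job step (fit test, model update, reservoir) can be sketched with the model quantities listed above. This is a minimal sketch under stated assumptions, not the authors' implementation: the Euclidean distance, the running-mean update of the average distance, and the decay formula `n * delta / (delta + t - last_seen)` are choices made for illustration, as are the names `Cluster` and `process_job`.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    exemplar: tuple   # e_i, a real job vector
    n: float          # n_i, (decayed) number of jobs assigned
    sigma: float      # Sigma_i, average distance of jobs to the exemplar
    last_seen: int    # T, time step of the last assignment


def process_job(x, t, model, reservoir, eps, delta):
    """Assign job x (seen at step t) to the model, or park it in the reservoir.

    Returns True when x fits the current model, False when it is an outlier.
    The model is assumed to contain at least one cluster.
    """
    # Nearest exemplar; Euclidean distance is an assumption.
    nearest = min(model, key=lambda c: math.dist(x, c.exemplar))
    d = math.dist(x, nearest.exemplar)
    if d > eps:
        reservoir.append(x)   # does not fit: keep it for the next rebuild
        return False
    # Decay the weight of a cluster that has not been visited recently
    # (assumed form of the decay-window update).
    decayed_n = nearest.n * delta / (delta + (t - nearest.last_seen))
    # Running mean of the distance to the exemplar, then count the new job.
    nearest.sigma = (nearest.sigma * decayed_n + d) / (decayed_n + 1)
    nearest.n = decayed_n + 1
    nearest.last_seen = t
    return True


# Hypothetical two-cluster model and two incoming jobs.
model = [Cluster((0.0, 0.0), n=10, sigma=0.5, last_seen=0),
         Cluster((5.0, 5.0), n=4, sigma=0.4, last_seen=0)]
reservoir = []
fits_near = process_job((0.3, 0.4), t=1, model=model, reservoir=reservoir,
                        eps=1.0, delta=100)
fits_far = process_job((9.0, 0.0), t=2, model=model, reservoir=reservoir,
                       eps=1.0, delta=100)
print(fits_near, fits_far, reservoir)
```

The first job lies within `eps` of an exemplar and updates that cluster; the second is farther than `eps` from every exemplar and is stored in the reservoir.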

EGEE (Enabling Grids for E-sciencE)
Funded by the European Commission (contribution: 32,000,000 euro)
Started in April 2004
Grid infrastructure available to scientists 24 hours a day.
http://public.eu-egee.org/

EGEE Jobs
EGEE logs of 39 RBs during 5 months (2006-01-01 to 2006-05-31), collected by the Real Time Monitor (RTM) system (http://gridportal.hep.ph.ic.ac.uk/rtm/)
5,268,564 jobs
for each job:
  its final status (good or type of error)
  UI, RB, CE
  time stamps of every service the job went through

Job attributes
registration_Time: time for registering the job
match_Time: time to find a matching resource
upto_scheduled_transfer_Time: time for acceptance and transfer (waiting + ready time), as reported by the JobController (JC)
upto_scheduled_acceptance_Time: the same as Ready_for_Transfer_Time, but as reported by the LogMonitor (LM)
logmonitor_ce_scheduled_Time: time the job spends waiting in a queue
logmonitor_wn_Time: execution time

Pre-processing and Normalization
Pre-processing
  6 boolean attributes indicate whether the services were reached or not
Normalization
  by centering with standard deviation 1
  job $x_i$ is normalized to $x'_i = (x_i - \mu)/s$
  where $\mu$ and $s$ are the mean and standard deviation computed from a part of the stream.
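
A minimal sketch of this pre-processing follows. It assumes a job arrives as six durations in which `None` marks a service that was never reached; that encoding, the zero fill for missing durations, the function names, and the sample values are assumptions, not taken from the slides. The mean and standard deviation are estimated once from an initial part of the stream and then reused.

```python
import statistics


def add_reached_flags(durations):
    """Return the 6 durations (missing ones set to 0) plus 6 boolean flags."""
    values = [0.0 if d is None else float(d) for d in durations]
    flags = [0.0 if d is None else 1.0 for d in durations]
    return values + flags


def fit_normalizer(sample):
    """Estimate per-attribute mean and standard deviation from a sample."""
    columns = list(zip(*sample))
    mu = [statistics.fmean(col) for col in columns]
    # Constant attributes have deviation 0; use 1.0 to avoid dividing by zero.
    s = [statistics.pstdev(col) or 1.0 for col in columns]
    return mu, s


def normalize(x, mu, s):
    """Apply x' = (x - mu) / s attribute by attribute."""
    return [(xi - m) / sd for xi, m, sd in zip(x, mu, s)]


# Hypothetical jobs: the second one never reached the last two services.
raw_jobs = [
    [10, 40, 50, 120, 30, 600],
    [8, 20, 25, 30, None, None],
    [12, 45, 60, 100, 50, 900],
]
jobs = [add_reached_flags(j) for j in raw_jobs]
mu, s = fit_normalizer(jobs)   # fitted on a part of the stream
normalized = [normalize(j, mu, s) for j in jobs]
print(len(normalized[0]))
```

Each job becomes a 12-dimensional vector (6 durations and 6 flags), and every non-constant attribute has mean 0 and standard deviation 1 over the sample used for fitting.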

Load of jobs per day
[Figure: number of jobs per day (×10^4) versus days, with the days marked by weekday: Sat & Sun, Mon, Tue, Wed, Thu, Fri.]

Monitoring on a short-time scale

Real-time Monitoring: when change detected
[Figure sequence: percentage of jobs assigned (%) to each cluster and to the reservoir, with each exemplar shown as a job vector. Successive frames show the assignment of jobs between restart 1 and restart 2, restart 2 and restart 3, restart 3 and restart 4, restart 4 and restart 5, and restart 5 and restart 6.]
LogMonitor is getting clogged.

Who is responsible for the clogging ??
Distribution of Attr4/Attr3
[Figure: Attr4/Attr3 (×10^4) versus jobs (×10^6), comparing the distribution of all jobs over the 39 RBs with the distribution of jobs from the 9-th RB.]

Who is responsible for the clogging ??
Which RB ??
[Figure: correlation coefficients (0 to 0.7) for each of the RBs; the labelled RBs are gdrb04.****.ch, gdrb03.****.ch and lappgrid07.****.fr.]

Clustering Quality Assessment

Clustering Purity
$\mathrm{Purity} = 100\% \times \frac{1}{K} \sum_{i=1}^{K} \frac{|C_i^d|}{|C_i|}$
where $K$ is the number of clusters,
$|C_i|$ is the size of cluster $i$,
$|C_i^d|$ is the number of majority class items in cluster $i$.
[Figure: averaged purity of each cluster (%) and number of clusters, both plotted against restarts.]
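
The purity formula can be checked on a small example. The sketch below assumes each cluster is given as the list of class labels of its items (for instance the final status of each job); the function name `average_purity` and the toy labels are illustrative.

```python
from collections import Counter


def average_purity(clusters):
    """Purity = 100% * (sum_i |C_i^d| / |C_i|) / K over K non-empty clusters."""
    ratios = []
    for labels in clusters:
        majority_count = Counter(labels).most_common(1)[0][1]   # |C_i^d|
        ratios.append(majority_count / len(labels))             # |C_i^d| / |C_i|
    return 100.0 * sum(ratios) / len(ratios)


clusters = [
    ["good", "good", "good", "error"],    # 3 of 4 items in the majority class
    ["error", "error"],                   # 2 of 2
    ["good", "error", "error", "error"],  # 3 of 4
]
print(average_purity(clusters))
```

The three per-cluster ratios are 0.75, 1 and 0.75, so the averaged purity is 100 × 2.5 / 3, about 83.3%. Every cluster counts equally regardless of its size, which matches the averaging over K in the formula.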

Discuss
Real-time quality:
  on average 10000 jobs in 1 minute vs maximum load: 80000 per day
    Intel 2.66GHz Dual-Core PC with 2 GB memory, coding in Matlab
  on average 60000 jobs in 1 minute, coding in C/C++
Compact and live description of job patterns
  proportion of good jobs and failed jobs
  different time cost of services the jobs went through

Monitoring on a medium-time scale

Rupture steps
[Figure: number of restarts per day (0 to 12) versus days (0 to 160).]
keeps tracking the evolution of the job distribution
provides an intuitive view of the grid regime and its stability
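
The curve of restarts per day can be derived directly from the times at which the model was rebuilt. A minimal sketch, assuming restart times are available as seconds since the start of the log (the function name and the sample values are hypothetical):

```python
from collections import Counter

SECONDS_PER_DAY = 24 * 60 * 60


def restarts_per_day(restart_times, n_days):
    """Count model rebuilds in each day; days without a rebuild get 0."""
    counts = Counter(int(t // SECONDS_PER_DAY) for t in restart_times)
    return [counts.get(day, 0) for day in range(n_days)]


# Hypothetical restart times, in seconds since the first logged job.
restart_times = [3_600, 50_000, 90_000, 200_000, 201_000, 250_000]
per_day = restarts_per_day(restart_times, n_days=4)
print(per_day)
```

Days with many restarts correspond to an unstable regime, while runs of zeros indicate that the current model kept describing the incoming jobs.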

Monitoring on a large-time scale

Large-time scale Monitoring: Global view
[Figure: two panels of percentage of jobs (%) versus days, one for the distribution of jobs like [7 0 0 0 0 0] and one for the distribution of jobs like [0 0 0 0 0 0].]
Clustering the exemplars -> Super exemplars
Super clusters: clusters of exemplars
the historical behavior of these super clusters
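
Once exemplars have been grouped into super clusters, the history curves above amount to summing, day by day, the jobs of all exemplars that belong to the same super cluster. A minimal sketch under stated assumptions: the mapping `super_of` is taken as given (the slides obtain it by clustering the exemplars), and all names and counts are hypothetical.

```python
from collections import defaultdict


def daily_share(daily_counts, super_of):
    """Percentage of each day's jobs that falls in each super cluster.

    daily_counts: list (one entry per day) of {exemplar_id: number_of_jobs}
    super_of:     {exemplar_id: super_cluster_id}
    """
    history = []
    for counts in daily_counts:
        per_super = defaultdict(int)
        for exemplar_id, n_jobs in counts.items():
            per_super[super_of[exemplar_id]] += n_jobs
        total = sum(per_super.values())
        if total:
            history.append({sc: 100.0 * n / total for sc, n in per_super.items()})
        else:
            history.append({})   # no jobs that day
    return history


super_of = {"e1": "A", "e2": "A", "e3": "B"}
daily_counts = [
    {"e1": 30, "e2": 20, "e3": 50},
    {"e1": 10, "e3": 90},
]
history = daily_share(daily_counts, super_of)
print(history)
```

Plotting one super cluster's share across all days gives a curve of the kind shown in the two panels.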

Bad Super Examples: day view
[Figure: heat map of super clusters (1 to 20) against days (0 to 140), with a color scale from 0 to 90%.]
Re-check of the "early stopped error" type of errors (first row)
Date   Jan 7-13   Jan 30 - Feb 3   Mar 16-21   May 17-19
UI     A1         A1               B1          D1 and A1

Discussion and Conclusion
real-time monitoring of Grid job streams
providing multi-scale models to describe the status of the Grid
  proportion of different types of job patterns (realtime-view, day-view, week-view ....)
  rupture steps
  offline global analysis
good quality clustering is guaranteed

Future work
more comprehensive description of the jobs, e.g., related to UI and CE
interpret the model dynamics, e.g., relating the rebuild frequency to calendar or social events, in collaboration with the operation teams.

Thank you
Questions ??