Robust Network Traffic Estimation via Sparsity and Low Rank
1
Morteza Mardani and Georgios Giannakis
ECE Department, University of Minnesota
Acknowledgments: MURI (AFOSR FA9550-10-1-0567) grant
Vancouver, Canada, May 31, 2013
2
Traffic monitoring
Backbone of IP networks
Traffic anomalies: changes in origin-destination (OD) flows
Failures, transient congestion, DoS attacks, intrusions, flooding
The vision: atlas of anomalies and nominal traffic for network management
The means: leverage sparsity and low rank
Complexity control through parsimonious modeling
Robustness to anomalies
3
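Model
Graph G(N, L) with N nodes, L links, and F flows (F ≫ L)
(as) Single-path routing per OD flow: $r_{l,f} \in \{0,1\}$ indicates whether flow f traverses link l
Packet counts per link l and time slot t, with nominal OD flow traffic $z_{f,t}$ and anomaly $a_{f,t}$:
$y_{l,t} = \sum_{f=1}^{F} r_{l,f}\,(z_{f,t} + a_{f,t}) + v_{l,t}$
Matrix model across T time slots: $Y = R(Z + A) + V$, with $Y$ of size L×T and a fat routing matrix $R$ of size L×F
[Figure: network graph with flows f1 and f2 traversing link l]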
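To make the link-count model concrete, here is a toy instance with L = 2 links and F = 3 flows (so R is fat), a single time slot, and the noise term left out. The routing matrix, the traffic numbers, and the helper name `link_counts` are made up for illustration.

```python
# Toy instance of the link-count model (noise omitted): L = 2 links, F = 3 OD flows.
# Routing matrix R (L x F): R[l][f] = 1 if flow f traverses link l (single path per flow).
R = [
    [1, 1, 0],  # link 1 carries flows 1 and 2
    [0, 1, 1],  # link 2 carries flows 2 and 3
]
z = [10.0, 20.0, 30.0]  # hypothetical nominal traffic z_{f,t} at one time slot
a = [0.0, 5.0, 0.0]     # hypothetical anomaly a_{f,t}: a spike on flow 2 only


def link_counts(R, z, a):
    """Return y_l = sum_f r_{l,f} * (z_f + a_f) for every link l."""
    x = [zf + af for zf, af in zip(z, a)]
    return [sum(r_lf * x_f for r_lf, x_f in zip(row, x)) for row in R]


y = link_counts(R, z, a)
print(y)  # [35.0, 55.0]
```

Because R has more columns than rows, the two link counts cannot separate nominal traffic from anomalies: z = [10, 25, 30] with no anomaly produces exactly the same y.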
4
Low rank of traffic matrix
Z: traffic matrix has low rank, e.g., [Lakhina et al. '04]
Data: http://math.bu.edu/people/kolaczyk/datasets.html
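One quick way to check the low-rank claim on a traffic matrix is to measure how much of its energy the leading singular values carry. A minimal NumPy sketch; the helper name `energy_captured` and the synthetic rank-3 matrix (a stand-in for real OD traffic, not the dataset linked above) are assumptions.

```python
import numpy as np


def energy_captured(X, k):
    """Fraction of ||X||_F^2 carried by the k largest singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, in descending order
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))


# Hypothetical stand-in for a traffic matrix: F flows x T slots, exactly rank r.
rng = np.random.default_rng(0)
F, T, r = 50, 200, 3
Z = rng.random((F, r)) @ rng.random((r, T))  # nonnegative entries, like traffic volumes

print(energy_captured(Z, r))  # close to 1.0: r components explain essentially everything
print(energy_captured(Z, 1))  # strictly less than 1.0
```

On measured traffic the spectrum is not exactly zero past r, so one would look for a sharp drop in the singular values rather than an exact cutoff.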
5
Sparsity of anomaly matrix
A: anomaly matrix is sparse across both time and flows
[Figure: anomaly amplitude $|a_{f,t}|$ (×10^8) versus time index t and versus flow index f]
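Sparsity across time and flows can be quantified by counting nonzero anomaly entries per flow (row of A) and per time slot (column of A). A small pure-Python sketch; the helper name `sparsity_profile` and the toy matrix are hypothetical. The largest per-column count is the quantity k that the recovery theorem later in the deck bounds.

```python
def sparsity_profile(A, tol=0.0):
    """Nonzero counts per flow (row), per time slot (column), and overall fraction.

    A is an F x T anomaly matrix given as a list of lists (flows x time slots);
    entries with |a| <= tol are treated as zero.
    """
    nz = [[1 if abs(v) > tol else 0 for v in row] for row in A]
    per_flow = [sum(row) for row in nz]
    per_slot = [sum(col) for col in zip(*nz)]
    fraction = sum(per_flow) / (len(A) * len(A[0]))
    return per_flow, per_slot, fraction


# Toy anomaly matrix: 3 flows, 4 time slots, two isolated spikes.
A = [
    [0, 0, 7, 0],
    [0, 0, 0, 0],
    [0, 3, 0, 0],
]
per_flow, per_slot, fraction = sparsity_profile(A)
print(per_flow, per_slot, max(per_slot))  # [1, 0, 1] [0, 1, 1, 0] 1
```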
6
Robust tomography
Goal: find a map of nominal traffic Z and anomalies A, useful for network management tasks
Challenge: impractical to measure $z_{f,t}$ directly
Huge number of OD pairs ($\approx N^2$)
Potential anomalies
Transportation networks
Computer networks
Prior art: least-squares and Gaussian models [Cascetta '84], [Zhao et al. '06]; Poisson models [Vardi '96]; entropy minimization [Zuylen '80]
Available data: link counts Y plus a priori knowledge of Z
7
Problem statement
Recovery from link counts
Severely ill-posed: FT + FT unknowns ≫ LT link counts
Nullspace of R includes low-rank matrices
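Why the nullspace matters: because R is fat (F > L), it has a nontrivial nullspace, and any matrix n qᵀ with R n = 0 is rank one yet invisible in the link counts. A NumPy sketch, with a small random 0/1 matrix standing in for a routing matrix (sizes and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
L, F, T = 4, 8, 6                                     # fat: more flows than links
R = rng.binomial(1, 0.5, size=(L, F)).astype(float)  # 0/1 stand-in for a routing matrix

# With full_matrices=True, the rows of Vt past rank(R) <= L span the nullspace of R.
_, _, Vt = np.linalg.svd(R, full_matrices=True)
n = Vt[-1]                  # unit-norm vector with R @ n ~ 0
q = rng.standard_normal(T)
H = np.outer(n, q)          # rank-one F x T perturbation

print(np.allclose(R @ H, 0.0))   # True: R(Z + H) = RZ, so link counts cannot see H
print(np.linalg.matrix_rank(H))  # 1
print(2 * F * T, "unknowns vs", L * T, "link counts")  # 96 unknowns vs 24 link counts
```

Adding H to Z therefore changes the traffic map without changing Y, which is why extra information (here, partial flow counts) is needed.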
Data: SNMP link counts Y; partial NetFlow measurements $\mathcal{P}_\Pi(U)$
Goal: given Y and $\mathcal{P}_\Pi(U)$, find sparse A and low-rank Z
(P2)
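A sketch of how an estimator of this kind can be computed, assuming (P2) has the common convex form: minimize over Z, A the cost ½‖Y − R(Z + A)‖_F² + ½‖P_Π(U − Z − A)‖_F² + λ_*‖Z‖_* + λ_1‖A‖_1. That form, the proximal-gradient (ISTA) solver, and the function names are illustrative assumptions, not necessarily the authors' estimator or algorithm. The nuclear norm promotes low rank in Z and the ℓ1 norm promotes sparsity in A; their proximal operators are singular value thresholding and entrywise soft thresholding.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise prox of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: prox of tau * ||X||_* (nuclear norm)."""
    left, s, right_t = np.linalg.svd(X, full_matrices=False)
    return (left * np.maximum(s - tau, 0.0)) @ right_t


def objective(Z, A, Y, U_obs, mask, R, lam_nuc, lam_l1):
    """Assumed (P2)-style cost; mask is the 0/1 indicator of the sampled set Pi."""
    X = Z + A
    fit = 0.5 * np.sum((Y - R @ X) ** 2) + 0.5 * np.sum((mask * (U_obs - X)) ** 2)
    return fit + lam_nuc * np.linalg.norm(Z, "nuc") + lam_l1 * np.sum(np.abs(A))


def estimate(Y, U_obs, mask, R, lam_nuc, lam_l1, iters=300):
    """Proximal-gradient (ISTA) iterations on the assumed cost, started at zero."""
    F, T = mask.shape
    Z = np.zeros((F, T))
    A = np.zeros((F, T))
    # The smooth part depends on Z + A only; its joint gradient is Lipschitz
    # with constant 2 * (||R||_2^2 + 1), so this step guarantees descent.
    step = 1.0 / (2.0 * (np.linalg.norm(R, 2) ** 2 + 1.0))
    for _ in range(iters):
        X = Z + A
        grad = -R.T @ (Y - R @ X) - mask * (U_obs - X)  # same gradient w.r.t. Z and A
        Z = svt(Z - step * grad, step * lam_nuc)
        A = soft_threshold(A - step * grad, step * lam_l1)
    return Z, A
```

The weights λ_* and λ_1 are tuning parameters; the recovery theorem for the noise-free case only asserts that a suitable λ exists in an interval [λ_min, λ_max], so in practice they would be chosen by validation.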
8
Recovery guarantees
Noise-free model and estimator
(P3)
Theorem: Given $\{Y, \mathcal{P}_\Pi(U), R, \Pi\}$, if every column of $A_0$ has at most k nonzero entries and conditions I)-II) hold, then there exists $\lambda \in [\lambda_{\min}, \lambda_{\max}]$ for which (P3) exactly recovers $\{Z_0, A_0\}$.
9
Practical implications
Accurate estimation is possible if:
Anomalies sporadic across time and flows
Nominal traffic sufficiently low dimensional
NetFlow samples sufficiently many distinct OD flows
OD node pairs distant and routing paths sufficiently spread out
10
Exact recovery validation
Setup: L = 105, F = 210, T = 420; $R \sim \mathrm{Bernoulli}(1/2)$; $Z_0 = PQ'$ with $P, Q \sim \mathcal{N}(0, 1/\sqrt{FT})$
$a_{ij} \in \{-1, 0, 1\}$ w.p. $\{\rho/2, 1-\rho, \rho/2\}$; $\Pi_{ij} \in \{0, 1\}$ w.p. $\{1-\pi, \pi\}$
[Figure: recovery maps over rank r (1 to 19) versus percentage of nonzero entries (ρ × 100, from 0.001 to 10), for π = 0.05 and π = 0.1]
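A sketch of how one synthetic instance matching this setup could be drawn with NumPy. Two details are assumptions: the second argument of N(0, 1/√(FT)) is read as a standard deviation, and the NetFlow samples are modeled as the noise-free flow counts Z0 + A0 restricted to the sampled set Π, since the slide does not state a model for U. The function name `make_instance` is hypothetical.

```python
import numpy as np


def make_instance(L=105, F=210, T=420, r=5, rho=0.01, pi=0.1, seed=0):
    """Draw one noise-free synthetic instance following the slide's setup."""
    rng = np.random.default_rng(seed)
    R = rng.binomial(1, 0.5, size=(L, F)).astype(float)  # R ~ Bernoulli(1/2)
    sd = 1.0 / np.sqrt(F * T)                             # assumption: standard deviation
    P = rng.normal(0.0, sd, size=(F, r))
    Q = rng.normal(0.0, sd, size=(T, r))
    Z0 = P @ Q.T                                          # rank-r nominal traffic
    A0 = rng.choice([-1.0, 0.0, 1.0], size=(F, T), p=[rho / 2, 1 - rho, rho / 2])
    mask = rng.binomial(1, pi, size=(F, T)).astype(float)  # Pi_ij = 1 w.p. pi
    U_obs = mask * (Z0 + A0)  # assumption: NetFlow observes Z0 + A0 on sampled entries
    Y = R @ (Z0 + A0)         # noise-free link counts
    return R, Z0, A0, mask, U_obs, Y
```

Repeating this over a grid of (r, ρ) values and checking whether an estimator returns {Z0, A0} yields recovery maps of the kind summarized on this slide.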
11
Internet2 data: real network data
Dec. 8-28, 2008; N = 11, L = 41, F = 121, T = 504
With 10% of flow counts: 45% gain for nominal traffic, 18% gain for anomalous traffic
[Figure: relative estimation error; curves labeled Total, Anomaly, and Traffic]
[Figure: estimated vs. real traffic level (×10^7) over time index t for OD flows CHIN -- LOSA, CHIN -- ATLA, CHIN -- IPLS]
[Figure: estimated vs. real anomaly amplitude (×10^8) over time index t for OD flows WASH -- STLL, WASH -- WASH, HSTN -- HSTN]
Data: http://www.cs.bu.edu/~crovella/links.html
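The slides do not define the relative estimation error; a common choice, assumed here, is the Frobenius-norm ratio ‖Ẑ − Z‖_F / ‖Z‖_F (and likewise for A). A pure-Python sketch with a hypothetical helper name:

```python
import math


def relative_error(estimate, truth):
    """||estimate - truth||_F / ||truth||_F for matrices given as lists of lists."""
    num = math.sqrt(sum((e - t) ** 2
                        for e_row, t_row in zip(estimate, truth)
                        for e, t in zip(e_row, t_row)))
    den = math.sqrt(sum(t ** 2 for t_row in truth for t in t_row))
    return num / den


print(relative_error([[3.0, 4.0]], [[3.0, 4.0]]))  # 0.0
print(relative_error([[0.0, 0.0]], [[3.0, 4.0]]))  # 1.0
```

An all-zero estimate scores 1.0, which makes the metric easy to read: values well below 1 mean the estimate captures most of the signal.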
12
Conclusions
Spatiotemporal correlation of traffic and sporadic nature of anomalies
Estimated map of nominal traffic and anomalies
Exact recovery of unknown low-rank and sparse matrices
Deterministic sufficient conditions
Angle between certain subspaces
Ongoing research
Tradeoff between OD flow and link counts
Finding simpler conditions for random ensembles
Thank You!
13
Ongoing research (satisfiability): random ensembles
Uniform sparse A
Random orthogonal model for Z
Row-orthonormal compression matrix R
Uniformly random sampling for $\mathcal{P}_\Pi(\cdot)$
How to find a fairly tight probabilistic bound for
Tradeoff between required OD flow count and link count
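For the random ensembles listed above, a row-orthonormal compression matrix R and a uniformly random sampling set Π can be drawn as follows. A minimal NumPy sketch; QR of a Gaussian matrix is one standard construction, and whether it matches the ensemble intended here is an assumption, as are the helper names.

```python
import numpy as np


def row_orthonormal(L, F, rng):
    """L x F matrix with orthonormal rows (R @ R.T = I_L), via QR of a Gaussian matrix."""
    G = rng.standard_normal((F, L))
    Q, _ = np.linalg.qr(G)  # reduced QR: Q is F x L with orthonormal columns
    return Q.T


def uniform_mask(F, T, m, rng):
    """0/1 mask with exactly m entries chosen uniformly at random without replacement."""
    mask = np.zeros(F * T)
    mask[rng.choice(F * T, size=m, replace=False)] = 1.0
    return mask.reshape(F, T)
```

Sampling exactly m entries differs slightly from the Bernoulli(π) sampling used in the simulations; the two are common interchangeable models in matrix-completion analyses.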
14
Identifiability issues
Misidentification if a matrix is both low rank and sparse
Perturbation in the nullspace
Rank preserving
Sparsity preserving
Subspaces ( )
Nullspaces
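The misidentification issue is easy to reproduce: a single anomaly spike is both sparse and rank one, so it can be moved from A into Z without changing Z + A. A NumPy sketch with hand-picked toy values:

```python
import numpy as np

F, T = 5, 6
Z0 = np.outer(np.arange(1.0, F + 1), np.ones(T))  # rank-one nominal traffic
A0 = np.zeros((F, T))
A0[1, 2] = 10.0                                   # a single anomaly spike

# The spike is rank one AND sparse, so it can be absorbed into the low-rank part:
Z1, A1 = Z0 + A0, np.zeros((F, T))

print(np.allclose(Z0 + A0, Z1 + A1))  # True: both pairs explain the same data
print(np.linalg.matrix_rank(Z1))      # 2: still low rank
print(np.count_nonzero(A1))           # 0: trivially sparse
```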
15
Incoherence measures
Lemma [Local identifiability]: Given and , is unique if and only if
Incoherence parameter
Non-spiky singular values
Intersection between nullspaces
Conditions C1) and C2)
[Figure: subspaces S1 and S2 separated by angle θ = cos⁻¹(μ)]
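The angle θ = cos⁻¹(μ) between two subspaces can be computed from principal angles. A minimal sketch, assuming μ is the cosine of the smallest principal angle between subspaces given by basis matrices; the bases that define S1 and S2 in the lemma are not spelled out here, so the inputs below are toy assumptions, as is the helper name.

```python
import numpy as np


def incoherence(B1, B2):
    """Cosine of the smallest principal angle between span(B1) and span(B2)."""
    Q1, _ = np.linalg.qr(B1)  # orthonormal basis for span(B1)
    Q2, _ = np.linalg.qr(B2)  # orthonormal basis for span(B2)
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)  # cosines of the principal angles
    return float(min(s.max(), 1.0))  # clip round-off above 1


mu = incoherence(np.array([[1.0], [0.0]]), np.array([[1.0], [1.0]]))
theta = np.degrees(np.arccos(mu))
print(round(mu, 4), round(theta, 1))  # 0.7071 45.0
```

μ = 1 (θ = 0) means the two subspaces share a direction, the nontrivial-intersection case; μ = 0 means they are orthogonal.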