Stefan Wallin Ph D Presentation : Rethinking Network Management

Rethinking Network Management : Models, Data-Mining and Self-Learning Stefan Wallin The Thesis

Stefan Wallin's Ph D Presentation, 23 Feb 2013. http://pure.ltu.se/portal/sv/publications/rethinking-network-management-solutions%28524ec0f6-7cb3-45bd-b350-72a21f0b7c6e%29.html

Transcript of Stefan Wallin Ph D Presentation : Rethinking Network Management

Page 1

Rethinking Network Management: Models, Data-Mining and Self-Learning

Stefan Wallin

The Thesis

Page 2

What is Network Management?

Alarms
Service Status
Trouble-shoot
Configure Service
Configure Device
Control workflow with trouble-tickets

Page 3

What is Network Management?

Problems?

Alarm Monitoring

Service Management
- Monitor
- Configure

Page 4

Main Thesis

Use domain-specific languages to specify alarm and service models
- Explicit knowledge
- Text-based representation

Use data-mining and self-learning to capture “hard-to-model” things
- Tacit knowledge

Page 5

Research Structure

[Figure: research structure. Service Models (Service Type, Service Type Component, Device Type; Status Calculation, Configuration Changes; Constraints), Alarm Models (Alarm Type, Causality; Constraints), and Data-Mining / Self-Learning.]

Page 6

Problems and Contributions

Defined a Domain-Specific Language, BASS, for specifying alarm models
- Model Quality
- Automatic Correlation

Data-Mining and Self-Learning to assign alarm severity levels

Domain-Specific Languages for Service Management
- Defined SALmon for monitoring
- Test of IETF YANG for Service Configuration

[Figure: research structure with Service Models, Alarm Models, and Data-Mining / Self-Learning.]

Page 7

Attacking the Problems

[Figure labels: Me; Challenges; Solutions; Validations; Service Providers; Equipment Vendors; Journals; Conferences.]

Computer Science specialists from
• LTU
• Data Ductus
• Tail-f
• YALTS

Page 8

Publication Overview

Conferences/Workshops
- IFIP ManWeek
- IEEE IM
- IEEE NOMS
- Usenix LISA
- IEEE AINA TeNAS
- IEEE SOSE

Journals
- IEEE IT Professional
- Springer Journal of Network and Systems Management
- John Wiley & Sons International Journal of Network Management
- Inderscience International Journal of Business Intelligence and Data-Mining
- Springer Telecommunications Systems

Page 9

Contents

The Alarm Problem

Alarm Solutions
- BASS
- Alarm prioritization

The Service Management Problem

Service Management Solutions
- Monitoring with SALmon
- Configuring with IETF YANG

Problems? – Input from Service Providers

Conclusions and Future Work

Acknowledgements

Page 10

Problems?

Input from Service Providers

Page 11

Coming Changes

20 Operators

Page 12

Research Efforts

20 Operators

Page 13

The Alarm Problem

Page 14

Alarm Chain

[Figure: alarm chain between the Managed System and the Management System. Labels: Resource States; Alarms; Alarm Notifications; Estimated Alarms; Estimated Resource States.]

Alarm notification contents: Alarm Type, Resource, Severity, Raise / Clear, Text
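The notification fields named on this slide (alarm type, resource, severity, raise/clear, text) can be pictured as a small record type, and the "estimated alarms" on the management side as a replay of those records. The following is a minimal sketch only: the field names, the simplified severity scale (loosely following X.733) and the function `estimate_alarm_states` are assumptions made for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Assumed, simplified scale; the slide only says "Severity".
    CLEARED = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass(frozen=True)
class AlarmNotification:
    alarm_type: str      # e.g. "link-down" (hypothetical value)
    resource: str        # the managed object the alarm refers to
    severity: Severity
    is_raise: bool       # True = raise, False = clear
    text: str            # free-form text from the device


def estimate_alarm_states(notifications):
    """Replay notifications in order to estimate which alarms are active."""
    active = {}
    for n in notifications:
        key = (n.resource, n.alarm_type)
        if n.is_raise:
            active[key] = n          # latest raise wins
        else:
            active.pop(key, None)    # a clear removes the alarm, if present
    return active
```

The estimate is only as good as the notifications: a lost clear leaves a stale alarm in `active`, which is one reason the management system's view can differ from the real resource state.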

Page 15

The Alarm Problem

Most network elements […] do not have the notion of an alarm state. Devices emit event notifications whenever an implementor thought this is a good idea.

[Around] 40% of the alarms are considered to be redundant as many alarms appear at the same time for one ’fault’. Many alarms are also repeated [...]. One alarm had for example appeared 65000 times in today’s browser. Correlation is hardly used even if it is supported by the systems, [current correlation level is] 1-2 % maybe.
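To make the "repeated alarms" part of this complaint concrete, here is a minimal sketch that collapses repeats and measures how much of an alarm list they account for. It assumes each alarm is reduced to a `(resource, alarm_type)` tuple, which is a simplification chosen for the example; it does not address the harder case of different alarms caused by the same fault.

```python
from collections import Counter


def collapse_repeats(alarms):
    """Collapse repeated (resource, alarm_type) pairs into one row with a count.

    `alarms` is an iterable of (resource, alarm_type) tuples; this shape is
    an assumption made for the example.
    """
    counts = Counter(alarms)
    # most_common() sorts by count, highest first
    return counts.most_common()


def repeat_ratio(alarms):
    """Share of alarms that repeat an already-seen (resource, alarm_type)."""
    alarms = list(alarms)
    if not alarms:
        return 0.0
    return 1.0 - len(set(alarms)) / len(alarms)
```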

Page 16

The Alarm Problem

Too many
- > 1 / sec
- Which ones are relevant?
- Several alarms for the same fault

Wrong severity levels

Interpreting meaning and impact

Page 17

Interpreting an Alarm

*A0628/546 /08-07-01/10 H 38/ N=0407/TYP=ICT/CAT=SI /EVENT=DAL/NCEN=AMS1 /AM=SMTA7/AGEO=S1-TR03-B06-A085-R000 /TEXAL=IND RECEPTION/COMPL.INF: /AF=URMA7/ICTQ7 AGCA=S1-TR03-B06-A085-R117/DAT=08-07-01/HRS=10-38-14 /AMET=07-020-01 /AFLR=175-011/PLS/CRC=NACT /NSAE=186/NSGE=186/NIND=14/INDI=956/NSDT=0
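Splitting such a string into fields is the easy part. The sketch below assumes a `KEY=VALUE` grammar with `/` separators, inferred from this single sample; that grammar and the function name `parse_raw_alarm` are assumptions, not a vendor specification. It deliberately does not try to say what `TYP`, `CAT` or `TEXAL` mean, because nothing in the alarm itself tells us.

```python
import re

# KEY=VALUE pairs separated by "/"; this grammar is inferred from the one
# sample on the slide and is an assumption, not a vendor specification.
FIELD = re.compile(r"([A-Z]+)=([^/]*)")


def parse_raw_alarm(raw):
    """Return the KEY=VALUE fields of a slash-separated alarm string."""
    return {key: value.strip() for key, value in FIELD.findall(raw)}


# A shortened excerpt of the alarm shown on the slide.
sample = "N=0407/TYP=ICT/CAT=SI /EVENT=DAL/NCEN=AMS1 /TEXAL=IND RECEPTION"
fields = parse_raw_alarm(sample)
```

After parsing, `fields` holds six string values keyed by opaque names. The syntax is recovered, but the meaning and impact of the alarm still have to come from documentation or from an expert.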

Page 18

Confusing Alarm Severity

[Figure: Original Severity from Device and Priority set by Operator.]

Page 19

Hard-to-Manage Severity Distribution

Hollifield, B., Habibi, E.: The Alarm Management Handbook

Page 20

Alarm Type Distribution

26

90%

…3500

Page 21

Alarm Monitoring

Domain-Specific Models

Modeling Alarms – Enable Automation and Increase Quality

Page 22

Research Structure

[Figure: research structure with Service Models, Alarm Models, and Data-Mining / Self-Learning.]

Page 23

Alarms Today

We have:
- Alarm interface standards: the envelope, the parameters
- Alarm documentation: informal documents for humans

What we do not have:
- Formal alarm definitions that can be used for automation
- The contents of the envelope: an “Alarm Model”

Page 24

Alarm Model

BASS
- Alarm Types
- Predicates
- Constraints
  - Information
  - Semantic

Page 25

BASS

Page 26

BASS Prototype and Validation

[Figure: BASS prototype and validation. Labels: Alarm Doc from Real Vendor; Alarm DB from Real Operator; Alarms (Uncorrelated, Correlated); .alarm; BASS; Correlation Rules; Feedback; Documentation; Graphs; Information Constraints; Semantic Constraints.]

Page 27

Semantic Constraints

173 warnings in approved and released alarm interface

Page 28

Information Constraints to Automate Correlation

Automatic identification of root-cause candidates
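One way to picture root-cause candidate identification from a causality model is sketched below. This is not BASS syntax and not the thesis's algorithm: the `causes` mapping, the alarm-type names and the function `root_cause_candidates` are hypothetical, and the rule used (an active alarm is a candidate if none of its possible causes is also active) is a simplifying assumption.

```python
def root_cause_candidates(active, causes):
    """Return active alarm types that no other active alarm type explains.

    `causes` maps an alarm type to the set of alarm types that may cause it.
    Both the mapping and the alarm-type names are hypothetical.
    """
    active = set(active)
    return {a for a in active if not (causes.get(a, set()) & active)}


causes = {
    "service-degraded": {"link-down"},
    "link-down": {"card-failure"},
}
active = ["service-degraded", "link-down", "card-failure"]
candidates = root_cause_candidates(active, causes)
```

With all three alarms active, only `card-failure` remains as a candidate. If `service-degraded` is the only active alarm, it is its own candidate, since nothing active explains it.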

Page 29

Alarm Monitoring

Data-Mining and Self-Learning

Assigning Correct Severity Levels by Learning from Experts

Page 30

Research Structure

[Figure: research structure with Service Models, Alarm Models, and Data-Mining / Self-Learning.]

Page 31

Learning Alarm Priorities

[Figure: learning alarm priorities. Labels: Alarm System; Trouble Ticket System; Analyse; Assign Prio; Neural Network Alarm Prio; Training; Suggest Priority; Priority; Databases From Real Service Provider.]

Page 32

Result

• Neural network correct in 53 %

• Original severity correct in 11 %

[Figure: Distribution of Errors for the original severity and the neural network. Axes: Magnitude of Error (Too high, Too low) and Percentage of Alarms.]
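The two accuracy figures and the error distribution on this slide are the kind of numbers an evaluation like the following would produce. This is a sketch under stated assumptions: priorities are integers on a common scale where a larger number means a higher priority, and the function name `evaluate` is made up for the example. It says nothing about how the neural network itself was built or trained.

```python
from collections import Counter


def evaluate(predicted, expert):
    """Compare predicted priorities with the priorities experts assigned.

    Returns the share of exact matches and a histogram of signed errors
    (positive = too high, negative = too low). Priorities are assumed to be
    integers on a common scale.
    """
    if len(predicted) != len(expert):
        raise ValueError("inputs must have the same length")
    errors = Counter(p - e for p, e in zip(predicted, expert))
    share_correct = errors[0] / len(expert) if expert else 0.0
    return share_correct, dict(errors)
```

Running the same function once with the device's original severity as `predicted` and once with the network's suggestion gives two directly comparable results against the operator-assigned priority.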

Page 33

The Service Management Problems

Page 34

Service Management

”Services are not currently managed well in any suite of applications and require a tremendous amount of work to maintain”

”Service models are becoming more and more important”

”Focus on service management - bringing this up to 40% from [the] current level of 5-10%”

”Managing services must be the focus of the future development, while pushing network management into a supporting role”

Page 35

Complex Structures

Interpretations and Tedious Mappings

[Figure labels: “Service Models”; Software Implementation; Configuration; Monitoring.]

Page 36

Solutions

Service Modeling and Service Status Calculation

Page 37

Research Structure

[Figure: research structure with Service Models, Alarm Models, and Data-Mining / Self-Learning.]

Page 38

My Two Tracks for Service Management

1 Model the Services
2 Express the transformations

[Figure: the two tracks, IETF YANG and SALmon, shown on the service part of the research structure (Service Type, Service Type Component, Device Type; Status Calculation, Configuration Changes).]

Page 39

Simplified Structures

Remove room for interpretations and automate mappings

[Figure labels: Models; Configuration; Monitoring.]

Page 40

SALmon Example

Broadband Forum TR-126 Triple Play QoE Requirements

Page 41

SALmon Test

• The TR-126 model could be executed

• Compact complete model

• Easy to change in one place

SLA and Service monitor UI
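The idea of an executable service model, where a service's status is a function of its components' time series and a rule is changed in one place, can be sketched as follows. This is plain Python, not SALmon syntax. The class names, the `worst_recent` rule, the threshold and the sample values are all invented for illustration, and nothing here is taken from TR-126.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Component:
    name: str
    # Time series of KPI samples, oldest first (values are made up).
    samples: List[float] = field(default_factory=list)


@dataclass
class ServiceType:
    name: str
    components: Dict[str, Component]
    # The status is a pure function of the component time series.
    status: Callable[[Dict[str, Component]], str]


def worst_recent(threshold: float, window: int):
    """Build a status function: 'degraded' if any component exceeded the
    threshold within the last `window` samples, else 'ok'."""
    def calc(components):
        for c in components.values():
            if any(v > threshold for v in c.samples[-window:]):
                return "degraded"
        return "ok"
    return calc


iptv = ServiceType(
    "iptv",
    {
        "stb": Component("stb", [0.1, 0.2, 0.9]),
        "access": Component("access", [0.1, 0.1, 0.1]),
    },
    worst_recent(threshold=0.5, window=2),
)
current = iptv.status(iptv.components)
```

Because the rule lives in one function, tightening the threshold or widening the window changes the status calculation for every service that uses it.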

Page 42

My Two Tracks for Service Management

1 Model the Services
2 Express the transformations

[Figure: the two tracks, IETF YANG and SALmon, shown on the service part of the research structure (Service Type, Service Type Component, Device Type; Status Calculation, Configuration Changes). Label: Released 2010.]

Page 43

Service Configuration and Activation

IETF
- Defined YANG as data-modeling language for managing devices
- “Replacing SNMP MIBs”

Thesis:
- YANG can be used to model services, not only devices
- Service Configuration as a YANG-to-YANG transform

Work:
- Service Modeling projects at service providers
- Service Activation product, Tail-f NCS
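The "service configuration as a transform" idea can be illustrated with plain data trees: one tree describes a service instance, and a function maps it to one configuration tree per device. The sketch below is an assumption-laden illustration in Python. The service shape (`name`, `vlan`, `endpoints`), the device shape and the function `service_to_device_config` are hypothetical stand-ins for data that YANG models would describe; no real YANG module or Tail-f NCS API is implied.

```python
def service_to_device_config(service):
    """Map one service instance (tree) to per-device configuration trees.

    The service and device structures are hypothetical stand-ins for data
    that would be described by YANG models; no real YANG module is implied.
    """
    configs = {}
    for endpoint in service["endpoints"]:
        # One configuration tree per device, created on first use.
        device = configs.setdefault(endpoint["device"], {"interfaces": {}})
        device["interfaces"][endpoint["interface"]] = {
            "description": f"service {service['name']}",
            "vlan": service["vlan"],
        }
    return configs


service = {
    "name": "vpn1",
    "vlan": 100,
    "endpoints": [
        {"device": "pe1", "interface": "ge0"},
        {"device": "pe2", "interface": "ge1"},
        {"device": "pe1", "interface": "ge2"},
    ],
}
configs = service_to_device_config(service)
```

Here the single service instance yields configuration for two devices, with `pe1` receiving two interface entries. Changing `vlan` in the service tree changes every affected device entry on the next run.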

Page 44

SALmon and YANG

Aspect | SALmon | IETF YANG | Comment
Model Structure | Object Oriented | Tree | Tree structures more suited for rendering
Purpose | Operational Data | Configuration Data and Operational Data |
Time-Series | | |
Calculations | Functional | - | YANG to YANG mapping in Java for imperative configuration; XPATH possible to express aggregation
Constraints | - | XPATH |

Page 45

Conclusions and Future Work

Page 46

Conclusions

For Research
- Closer cooperation with equipment and service providers
- Network management is in need of computer science

For Network Equipment Providers
- Provide models (in a form) that can be used for automation
- Interface quality

For Service Providers
- Model the offered services
- Knowledge management
- Overcome current practice of incomplete illustrations and free-form documents

Page 47

Future Work

SALmon features represented in YANG
- Language extensions or as models
- Time-series
- Functional calculations
- XPATH
- Database representation

Imperative activation as part of the model?

More knowledge management by using data-mining and self-learning

[Figure: research structure with Service Models, Alarm Models, and Data-Mining / Self-Learning.]

Page 48

Errata

Paper C: Says trivial approach is correct in 17 % of the cases. Should be 11 %.

Section 2 : Wrong “T”, should be:

Page 49

Thank You !

Mikael Börjesson

Jörgen Öfjell, Johan Ehnmark, Andreas Jonsson, Ulrik Forsgren, Magnus Karlsson, Leif Landén

Christer Åhlund, Johan Nordlander, Viktor Leijon, Robert Brännström, Karl Andersson, Daniel Granlund, Dan Johansson

Klacke Wikström, Håkan Millroth, Martin Björklund, Seb Strollo, Johan Bevemyr, Joakim Grebenö, Chris Williams

Equipment Vendors and Service Providers: Test Data

Nicklas Bystedt

Sidath Handurukande, EU Funded Magneto Project

Page 50

?