RENATER Network management approach

Frederic LOUI, François-Xavier ANDREU
Network Backbone Operation & Engineering

Rome, June 22nd–25th 2009

[email protected]@renater.fr

2

Housekeeping

• We value your feedback - don't forget to complete your training session evaluations

• Please silence your mobile phones

• All the slides will be made available
• Don’t hesitate to ask questions ☺

3

Round table

• Who are you?
• About me ☺
• Do you have any specific expectations regarding this training course?

4

Training course objective

• Present RENATER's network management approach
• Management approach based on several constraints (legacy, technical, organizational)
• Case study and hands-on
• MAIN GOAL is to provide you with enough material, so that YOU can start/adjust your own network management approach

5

Agenda

• A few words about RENATER
• Basic network management concepts
• Approach based on several constraints
• Data sources
• Network management tools
• Case studies
• Hands-on with tools

6

RENATER big picture

• French NREN (National Research and Education Network)
• Geographic scope:
  - 58 PoPs (at least one per region) + overseas territories PoPs
  - 800+ sites connected
• Connectivity:
  - Generally, sites are connected via MAN or regional networks
  - A few sites are directly connected (mainly universities)
• Additional services:
  - LIR: Local Internet Registry for the education & research community
  - SFINX: Service for French Internet Exchange

7

RENATER big picture

• RENATER network version: fifth iteration
• Optical networking
  - Newly “owned” dark fiber infrastructure
  - Links = n × 10 Gbps
  - Lightpaths for dedicated research projects
  - CIENA DWDM equipment
• Layer 2 switching
  - C6500, C4500, C3750 for L2
• Layer 3 routing
  - CRS-1, 12K, 7609, 7200 for L3
  - Powered by IOS and IOS-XR

8

RENATER big picture

• What network services do we provide?
• Basic IP
  - IPv4 unicast / multicast
  - IPv6 unicast / multicast
• VPN services
  - L3VPN, a.k.a. “MPLS-VPN”
  - L2VPN (VPWS/802.1q)
• Additional network services
  - IP telephony
  - SSO (Single Sign-On)
  - Anti-spam
• And in the near future…
  - 6VPE
  - MVPN
  - VPLS

9

Agenda

• A few words about RENATER
• Basic network management concepts
• Approach based on several constraints
• Data sources
• Network management tools
• Case studies
• Hands-on with tools

10

Network Management concepts

• Network management: what for?
• Monitoring
  - Ease and improve supervision of the network
  - Get diagnostic tools to help us analyse the network behaviour
• Planning
  - Optimize the architecture
  - Ensure that the network scales well in terms of capacity
  - Establish a clear trend forecast
• Security
  - Increase network security by detecting attacks proactively and quantifying their effects
• Accounting / Billing
  - Ensure a good correlation between cost, SLA and actual usage of the network
• Performance
  - Make sure that the network is ready and behaves as expected under the service level agreement
• …

11

Network Management concepts

• TYPICAL TELECOM ORGANIZATION STRUCTURE
• Network Operation Control department
  - Customer Level 1 SPOC
  - Customer Level 2 SPOC (escalates issues to experts in the engineering team)
• Planning department
  - Monitor network backbone and customer link usage
  - Drive network backbone evolution in terms of bandwidth capacity
• Provisioning / Configuration
  - Configure the SP equipment as per the change management process
  - Ensure the customer’s network “life cycle management”
• Engineering department
  - Traffic priority / congestion management
  - Flapping error conditions
• Product marketing department
  - Inspect the market share according to customer needs (business case)
  - Study new services in terms of financial cost & revenue
• Commercial department
  - Promote key products
  - Elaborate the price list
• Billing department
  - Ensure that billing is timely and accurate

12

Network Management concepts

• TYPICAL TELECOM ORGANIZATION STRUCTURE

13

Network Management concepts

• FCAPS, anybody heard about that?
• Fault
  - Proactive fault detection
  - Alarm management
• Configuration
  - Change management process
  - Configuration management
• Accounting
  - Per-customer capacity planning
  - Per-link capacity planning
• Performance
  - Traffic priority / congestion management
  - Flapping error conditions
• Security
  - Single sign-on
  - Centralized AAA policy

14

Agenda

• A few words about RENATER
• Basic network management concepts
• Approach based on several constraints
• Data sources
• Network management tools
• Case studies
• Hands-on with tools

15

Constraint

• As an NREN we are a service provider
  - A small telecom company
• The previous topics can be applied to our case
• Customers are:
  - Education & research organizations
  - Universities
  - European projects spread across several countries

16

Constraint

• RENATER organization
  - 30 people in RENATER (6 technical people dedicated to network-related matters “at large”)
  - Managing a national backbone is a huge task
  - Need for 24/7 duty coverage
  - Need for a huge network management infrastructure
  - At minimum, 15 to 20 people are needed to run such a network

17

Constraint

• Outsourced NOC
  - 10 dedicated staff
  - With 24/7 duty coverage
  - Guarantee of “skilled” staff
• Shared (“mutualized”) network management
  - NMS already deployed (pollers, SNMP trap servers)
  - Web portal, trouble ticket system
  - Etc.

18

Constraint

• But… “You are never better served than by yourself” (French proverb: “On n'est jamais mieux servi que par soi-même”)
• The NOC needs to be closely followed
  - NMS not always accurate
  - Sometimes the NOC's perspective differs from our perspective of the network
  - Staff turnover
• After all, we still need to deploy our own network management system
  - Control/check the network behaviour in detail
  - Provide detailed reports on network usage
  - Ensure a light NOC function
  - A fully automated network management suite is not required
• In-house / home-made tools

19

Agenda

• A few words about RENATER
• Basic network management concepts
• Approach based on several constraints
• Data sources
• Network management tools
• Case studies
• Hands-on with tools

20

Data sources

• Passive measurements
  - SNMP
  - NetFlow
  - (Command line interface)
  - (XML)
• Active measurements
  - Home-made beacons
  - RIPE TTM
  - Symmetricom (RENATER’s choice)

21

Passive measurements: SNMP

• MIB: tree-organized database residing inside the equipment, which can be queried through the Simple Network Management Protocol (SNMP)
• Open-source software available: MRTG, Cricket, RTG, Cacti...
• For HACKERS, C library: UCD-SNMP (now Net-SNMP)
• Different SNMP versions:
  - v1: Get, GetNext, Set
  - v2: Get, GetNext, GetBulk, Set, Inform
  - v3: added security features and administration

22

Passive measurements: SNMP

• Interface load:
  - In Mb/s or packets/s
  - Different reports available:
    - 24-hour graph
    - Weekly
    - Half-year
    - Yearly
• CPU load:
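A minimal sketch of how an interface load in Mb/s can be derived from two SNMP octet-counter readings. The function name and the sample values are illustrative assumptions, not RENATER's tooling; the counter is assumed to behave like the standard IF-MIB octet counters (32-bit `ifInOctets` or 64-bit `ifHCInOctets`) and to wrap at most once between polls.

```python
def interface_rate_mbps(prev_octets, curr_octets, interval_s, counter_bits=64):
    """Average rate in Mb/s between two octet-counter readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # A counter that wrapped once reads lower than before;
    # the modulo restores the real difference.
    delta_octets = (curr_octets - prev_octets) % modulus
    return delta_octets * 8 / interval_s / 1_000_000


# Two hypothetical readings taken 300 s apart (a 5-minute poll).
print(interface_rate_mbps(1_000_000_000, 4_750_000_000, 300))  # 100.0
```

With 32-bit counters on fast links the counter can wrap several times within one polling interval, which this sketch cannot detect; that is one reason to prefer the 64-bit counters.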

23

Passive measurements: SNMP

• IPv6 MIB
• MPLS MIB
• Limits of SNMP collection:
  - Graphs are averaged (over 5 minutes)
  - Provides only IP-level reports (IPv4 + IPv6)
  - No information above layer 4
  - Difficult to define alarms on a traffic pattern behaviour:
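To illustrate the averaging limit, here is a small sketch with made-up per-second rates: a 30-second burst at ten times the usual rate barely stands out once it is averaged over a 5-minute polling interval. The numbers are assumptions chosen for the example.

```python
def five_minute_average(per_second_mbps):
    """Mean rate over one polling interval, as a 5-minute graph would show it."""
    return sum(per_second_mbps) / len(per_second_mbps)


steady = [100.0] * 300                      # 300 s at 100 Mb/s
burst = [100.0] * 270 + [1000.0] * 30       # same, with a 30 s burst at 1 Gb/s

print(five_minute_average(steady))  # 100.0
print(five_minute_average(burst))   # 190.0
```

The burst peaks at 1000 Mb/s but the graph would show 190 Mb/s, which is why a threshold alarm on averaged SNMP data can miss short events.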

24

Passive measurements: SNMP

Denial of Service example

25

NetFlow

• Flow definition:

Flow characteristics:
- Source IP
- Destination IP
- Source port
- Destination port
- Protocol
- Type of Service
- Input interface SNMP index

+

- Number of packets of the flow
- Number of bytes of the flow
- Time: start and stop of the flow
- Outgoing interface SNMP index
- Source and destination AS
- Source and destination subnet mask
- Cumulative TCP flags

[Figure labels: NetFlow collector, user desktop, mail server]
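The flow definition above can be sketched as a small aggregation routine: packets that share the same seven characteristics are merged into one flow, and the counters are accumulated. This is an illustrative model, not a router implementation; the packet dictionaries and field names are assumptions made for the example.

```python
from collections import namedtuple
from dataclasses import dataclass

# The seven fields that identify a flow.
FlowKey = namedtuple(
    "FlowKey", "src_ip dst_ip src_port dst_port protocol tos input_if"
)


@dataclass
class FlowStats:
    packets: int = 0
    bytes: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0
    tcp_flags: int = 0  # cumulative OR of the TCP flags seen


def aggregate(packets):
    """Group packet dicts into flows keyed by FlowKey."""
    flows = {}
    for p in packets:
        key = FlowKey(p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"],
                      p["protocol"], p["tos"], p["input_if"])
        stats = flows.get(key)
        if stats is None:
            stats = flows[key] = FlowStats(first_seen=p["time"])
        stats.packets += 1
        stats.bytes += p["length"]
        stats.last_seen = p["time"]
        stats.tcp_flags |= p.get("tcp_flags", 0)
    return flows


# Hypothetical packets: two belong to the same TCP flow, one to another flow.
base = dict(src_ip="192.0.2.1", dst_ip="198.51.100.7", src_port=1164,
            dst_port=80, protocol=6, tos=0, input_if=16)
packets = [
    dict(base, time=0.00, length=40, tcp_flags=0x02),   # SYN
    dict(base, time=0.05, length=500, tcp_flags=0x10),  # ACK
    dict(base, src_port=1165, time=0.10, length=40, tcp_flags=0x02),
]
flows = aggregate(packets)
first = flows[FlowKey("192.0.2.1", "198.51.100.7", 1164, 80, 6, 0, 16)]
print(len(flows), first.packets, first.bytes, hex(first.tcp_flags))
```

Note that the reverse direction of a connection has a different key (source and destination are swapped), so one TCP session produces two flows.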

26

TCP/IP Headers

[Figure: layout of the IP and TCP headers, bits 0 to 31]

IP header: Version, HLEN, ToS, Total Length, Identification, Flags, Fragment Offset, Time to Live, Protocol, Header Checksum, Source IP Address, Destination IP Address

TCP header: Source Port Number, Destination Port Number, Sequence Number, Acknowledgement Number, Header Length, Reserved, TCP Flags, Window Size, TCP Checksum, Urgent Pointer

27

NetFlow

• Architecture example:

• Some figures:
  - NetFlow export traffic toward the collector is estimated at 60 Mb/s during the day and 3 Mb/s at night
  - 20 million flows per 5 minutes during business hours, a minimum of 2 million overnight (note that 80% of the equipment runs NetFlow in sampled mode)
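As a rough sizing aid, the figures above can be converted into rates the collector has to sustain. This is back-of-the-envelope arithmetic only; the last value assumes that the 60 Mb/s and the 20 million flows per 5 minutes occur at the same time, which the slides do not state.

```python
def flows_per_second(flows_per_interval, interval_s=300):
    """Average number of flow records per second over one interval."""
    return flows_per_interval / interval_s


def export_bits_per_flow(export_bps, flows_per_interval, interval_s=300):
    """Average export volume per flow record, in bits."""
    return export_bps * interval_s / flows_per_interval


print(round(flows_per_second(20_000_000)))           # 66667 (business hours)
print(round(flows_per_second(2_000_000)))            # 6667 (overnight)
print(export_bits_per_flow(60_000_000, 20_000_000))  # 900.0
```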

28

NetFlow

• Web server connection example:
• Backup of information of interest related to:
  - A router
  - An interface
  - An Autonomous System
  - A network prefix (e.g. traffic rate related to a customer netblock)
  - Some well-known ports

Source address   Dest. address    Router          In idx  Out idx  Src port  Dst port  Prot  Bytes  Packets  Src AS  Dst AS
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1164      80        6     821    10       0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1164      6     14456  14       65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1165      80        6     544    7        0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1165      6     6582   7        65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1166      80        6     552    7        0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1166      6     6641   7        65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1167      80        6     551    7        0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1167      6     6895   8        65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1168      80        6     755    12       0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1168      6     20203  17       65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1169      80        6     460    5        0       65037

29

NetFlow (prefix information)

The NetFlow information stored in the database is available through an HTML/PHP interface. Here are some screenshots:

Traffic for a specific IP address block can be displayed either in flows per second (example above) or in bit/s (example on the left). There are about 4000 IP address blocks in the RENATER network, all allocated to RENATER sites.

NR = nœud RENATER (RENATER node)
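A per-block traffic view like the one described here boils down to mapping each flow's addresses to an address block and summing the bytes. The sketch below shows one way to do this with Python's standard `ipaddress` module; the function name, the two /24 blocks and the tuple layout are assumptions made for illustration, not the actual RENATER database schema.

```python
import ipaddress
from collections import defaultdict


def traffic_per_prefix(flows, prefixes):
    """Sum bytes sent and received per address block.

    flows: iterable of (source address, destination address, bytes).
    prefixes: iterable of blocks in CIDR notation.
    """
    # Longest prefixes first, so the most specific block wins.
    networks = sorted((ipaddress.ip_network(p) for p in prefixes),
                      key=lambda net: net.prefixlen, reverse=True)

    def lookup(address):
        ip = ipaddress.ip_address(address)
        return next((str(net) for net in networks if ip in net), None)

    totals = defaultdict(lambda: {"sent": 0, "received": 0})
    for src, dst, nbytes in flows:
        src_block, dst_block = lookup(src), lookup(dst)
        if src_block is not None:
            totals[src_block]["sent"] += nbytes
        if dst_block is not None:
            totals[dst_block]["received"] += nbytes
    return dict(totals)


# Two flow records (one per direction) and two hypothetical site blocks.
flows = [("193.49.159.141", "194.199.8.10", 821),
         ("194.199.8.10", "193.49.159.141", 14456)]
blocks = ["193.49.159.0/24", "194.199.8.0/24"]
print(traffic_per_prefix(flows, blocks))
```

A linear scan over the blocks is fine for a sketch; with thousands of blocks and millions of flows, a prefix tree would be the more usual choice.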

30

Netflow information per customer

Volume of traffic by class of services (IPv4)

Daily and monthly IPv4 traffic volume

31

NetFlow

• Distribution of traffic by port

• Alarms based on routing issues

• Traffic matrix

• Flow capture based on specific traffic patterns or signatures at the collector level (ports, number of packets, sizes, …). A report is generated every day listing the top 40 addresses with "suspect" traffic (P2P, FTP warez). These reports are forwarded to CERT-RENATER so that security issues can be handled.

32

NetFlow

Distribution by protocol
[Figure: pie chart of protocols: ICMP 2%, TCP 84%, UDP 14%, others 0%]

Distribution by port
[Figure: distribution of flows and bytes for the traffic of the Lyon NR, per port: 0, 20, 22, 25, 53, 80, 137, 445, 4662, 5020, others; horizontal axis from 0 to 50; series: bytes, flows]

33

NetFlow

• DoS detection:
  - When a threshold is crossed:

• NetFlow report extract:

Source address   Dest. address    Router          In idx  Out idx  Src port  Dst port  Prot  Bytes  Packets  Src AS  Dst AS
194.57.222.66    163.15.163.247   193.51.177.42   5       3        1445      80        6     40     1        1715    7539
194.57.222.211   163.15.163.247   193.51.177.42   5       3        1414      63        6     40     1        1715    7539
194.57.222.170   163.15.163.247   193.51.177.42   5       3        1191      53        6     40     1        1715    7539
194.57.222.190   163.15.163.247   193.51.177.42   5       3        1232      34        6     40     1        1715    7539
194.57.222.18    163.15.163.247   193.51.177.42   5       3        1610      25        6     40     1        1715    7539
194.57.222.10    163.15.163.247   193.51.177.42   5       3        1582      23        6     40     1        1715    7539
194.57.222.139   163.15.163.247   193.51.177.42   5       3        1103      1         6     40     1        1715    7539
194.57.222.203   163.15.163.247   193.51.177.42   5       3        1590      116       6     40     1        1715    7539
194.57.222.116   163.15.163.247   193.51.177.42   5       3        1877      113       6     40     1        1715    7539
194.57.222.34    163.15.163.247   193.51.177.42   5       3        1566      98        6     40     1        1715    7539
194.57.222.166   163.15.163.247   193.51.177.42   5       3        1122      90        6     40     1        1715    7539
194.57.222.1     163.15.163.247   193.51.177.42   5       3        1975      86        6     40     1        1715    7539
194.57.222.182   163.15.163.247   193.51.177.42   5       3        1248      82        6     40     1        1715    7539
194.57.222.163   163.15.163.247   193.51.177.42   5       3        1696      70        6     40     1        1715    7539
194.57.222.6     163.15.163.247   193.51.177.42   5       3        1270      62        6     40     1        1715    7539
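The report extract shows the typical signature: many single-packet, 40-byte TCP flows from different sources toward one destination. A threshold-based detector for that pattern could look like the sketch below. The thresholds (`min_sources`, `max_bytes`) and the record layout are assumptions for illustration; the slides do not give the values RENATER uses.

```python
from collections import defaultdict


def suspected_targets(flows, min_sources=10, max_bytes=40):
    """Return {destination: number of distinct sources} for destinations
    hit by many tiny single-packet flows."""
    sources = defaultdict(set)
    for f in flows:
        if f["packets"] == 1 and f["bytes"] <= max_bytes:
            sources[f["dst"]].add(f["src"])
    return {dst: len(srcs) for dst, srcs in sources.items()
            if len(srcs) >= min_sources}


# Records modelled on the extract above, plus one ordinary web flow.
last_octets = [66, 211, 170, 190, 18, 10, 139, 203, 116, 34, 166, 1, 182, 163, 6]
flows = [{"src": f"194.57.222.{n}", "dst": "163.15.163.247",
          "bytes": 40, "packets": 1} for n in last_octets]
flows.append({"src": "193.49.159.141", "dst": "194.199.8.10",
              "bytes": 821, "packets": 10})

print(suspected_targets(flows))  # {'163.15.163.247': 15}
```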

34

NetFlow

• Additional problem:
  - The higher the traffic, the more NetFlow processing at the equipment level becomes a problem
• One answer: sampling (Sampled NetFlow)
  - Within RENATER: 10% of packets (~1/2 of the flows)
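Why does sampling 10% of the packets still catch roughly half of the flows? A simple model helps: if each packet is sampled independently with probability p, a flow of n packets is seen with probability 1 - (1 - p)^n. The sketch below evaluates this; the independence assumption is a simplification (it matches random sampling better than deterministic 1-in-N sampling).

```python
def flow_detection_probability(packets_in_flow, sampling_rate=0.10):
    """Probability that at least one packet of the flow is sampled,
    assuming each packet is sampled independently."""
    return 1 - (1 - sampling_rate) ** packets_in_flow


for n in (1, 7, 10, 50):
    print(n, round(flow_detection_probability(n), 3))
```

Single-packet flows are seen only 10% of the time, flows of about 7 packets about half of the time, and long flows almost always. This also explains why sampled data under-represents short flows such as scans.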

35

NetFlow

• New export format (v9):
  - Use of "templates"
  - New protocols taken into account:
    - IPv6
    - Multicast
    - MPLS
• NetFlow egress vs ingress
• Different sampling modes:
  - Deterministic
  - Random

36

NetFlow sampling mode comparison: "Full" and "Sampled"

                        Full          Sampled      Loss
Amount of packets       48802075      5303552      90%
Amount of bytes         19344774677   2078190250   90%
Number of flows         3416043       1522338      55%

                        Full          Sampled      Loss
Number of TCP flows     3204712       1487522      54%
Number of UDP flows     211089        34713        84%
Number of ICMP flows    241           103          58%
Number of flows         3416043       1522338      55%

37

DoS attack

The attack can be seen among all the flows that cross the RENATER backbone. The number of flows increases at 12:30, which is not the usual traffic behaviour.

Denial of service attack traces and investigation:

The attack appears to come from source IP addresses that do not belong to a RENATER site. However, the flows show this only because the source IP address has been spoofed. The source address is spoofed but the destination address is real, and this destination is located in an ISP network.

Information about such flows can be found in the tool logs:

- the destination address is always the same
- the source address is different (but in the same block)
- router IP address
- different source ports
- same destination port

These traces come from the NetFlow tool.

Source address    Dest. address    Router          In idx  Out idx  Src port  Dst port  Prot
171.24.11.213     217.172.184.27   193.51.177.35   5       2        1142      8767      6
195.110.78.31     217.172.184.27   193.51.177.35   5       2        1885      8767      6
20.117.44.79      217.172.184.27   193.51.177.35   5       2        1185      8767      6
202.140.234.35    217.172.184.27   193.51.177.35   5       2        1108      8767      6
142.242.37.16     217.172.184.27   193.51.177.35   5       2        1784      8767      6
131.128.177.4     217.172.184.27   193.51.177.35   5       2        1966      8767      6
61.30.170.221     217.172.184.27   193.51.177.35   5       2        1715      8767      6
219.218.36.159    217.172.184.27   193.51.177.35   5       2        1746      8767      6
31.129.210.2      217.172.184.27   193.51.177.35   5       2        1672      8767      6
23.124.245.196    217.172.184.27   193.51.177.35   5       2        1960      8767      6
106.250.168.39    217.172.184.27   193.51.177.35   5       2        1285      8767      6
181.25.228.4      217.172.184.27   193.51.177.35   5       2        1058      8767      6
159.225.242.122   217.172.184.27   193.51.177.35   5       2        1274      8767      6
167.166.50.239    217.172.184.27   193.51.177.35   5       2        1809      8767      6
2.104.106.121     217.172.184.27   193.51.177.35   5       2        1729      8767      6
82.210.101.233    217.172.184.27   193.51.177.35   5       2        1162      8767      6
203.20.11.217     217.172.184.27   193.51.177.35   5       2        1397      8767      6
210.153.129.121   217.172.184.27   193.51.177.35   5       2        1632      8767      6
220.206.215.31    217.172.184.27   193.51.177.35   5       2        1644      8767      6
182.20.179.145    217.172.184.27   193.51.177.35   5       2        1079      8767      6
183.48.240.70     217.172.184.27   193.51.177.35   5       2        1920      8767      6
189.192.51.92     217.172.184.27   193.51.177.35   5       2        1493      8767      6
151.16.75.43      217.172.184.27   193.51.177.35   5       2        1573      8767      6

39

DoS observation with NetFlow and SNMP

[Figure labels: Limoges POP, Sfinx POP, BACKBONE, Collect Network, ISP; SNMP graphs 1, 2 and 3; NetFlow router of previous slide]

• This attack can also be seen with an SNMP tool:
  - traffic increase: +40 Mbit/s
  - on each network interface that the flows went through
  - but we don't have a flow-level view, and so no access to the IP addresses
  - after detection, the information is sent to the RENATER Computer Emergency Response Team (CERT)

40

Active measurements

• Active measurement agenda
  - Basic concepts
  - Metrics
  - Application requirements
  - Issues related to active measurements
  - Active measurement within a high speed backbone
  - Some existing active measurement alternatives

41

Basic concept

• Various delay definitions:
  - Propagation delay
    - Time for the signal to propagate through the physical medium
    - Function of the distance and the speed of light; 0.1 to 0.2 second around the globe
  - Transmission delay
    - Packet size / link rate
  - Queuing delay
    - Number of queued packets × packet size / link rate

• Packet losses due to:
  - Transmit queue full
  - Link noise (nonexistent today except for wireless technology)
  - Re-routing taking too long (cf. case study #6)
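The three delay components can be put into numbers with a short sketch. It uses the usual definitions: propagation delay = distance / signal speed, transmission delay = packet size / link rate, and queuing delay = packets ahead in the queue × transmission delay of one packet. The signal speed in fibre (about two thirds of the speed of light) and the example values are assumptions.

```python
SPEED_IN_FIBER_M_S = 2.0e8  # assumption: roughly 2/3 of the speed of light


def propagation_delay_s(distance_m, speed_m_s=SPEED_IN_FIBER_M_S):
    """Time for the signal to travel the distance."""
    return distance_m / speed_m_s


def transmission_delay_s(packet_bytes, link_bps):
    """Time to put one packet on the wire."""
    return packet_bytes * 8 / link_bps


def queuing_delay_s(queued_packets, packet_bytes, link_bps):
    """Time spent waiting behind equally sized packets already queued."""
    return queued_packets * transmission_delay_s(packet_bytes, link_bps)


print(propagation_delay_s(40_000_000))       # once around the globe: 0.2 s
print(transmission_delay_s(1500, 10e9))      # 1500 bytes at 10 Gb/s: 1.2 µs
print(queuing_delay_s(100, 1500, 10e9))      # behind 100 such packets: 120 µs
```

On a 10 Gb/s backbone the transmission delay is negligible; propagation dominates on long paths, and queuing only matters when links are congested.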

42

Metrics (1/2)

• IETF standardization:
  - IPPM (IP Performance Metrics working group)
  - Delay, jitter, packet loss, re-ordering (out-of-sequence packets)...
• For each class of service and IP protocol type
• Focus also put on time precision
• End-to-end measurement but also segmented measurement:

43

Metrics (2/2)

• One-way delay (cf. RFC 2679):
  - Significant for real-time applications (VoIP, …)
  - Quantifies the quality perceived by the user (e.g. the quality of a TCP application depends on the arrival delay of the packets and not on the TCP acknowledgements) → more representative than the RTT
  - Can be influenced by an efficient class of service implementation that treats the forward and return paths separately
• Jitter (one-way delay variation, cf. RFC 3393):
  - Mostly induced by equipment transmit queues
  - Buffer sizes must be tuned correctly for streaming applications
  - Reflects the dynamics and stability of the network
• One-way packet loss (cf. RFC 2680):
  - Significant for all types of application
  - Mostly due to network congestion
• Reordering (cf. draft-ietf-ippm-reordering):
  - 1 out-of-order packet = packet with delay (cf. IPPM)
  - Significant for streaming applications (a packet with too much delay is dropped)
  - Mostly due to ECMP (or non-ECMP) multipath in the network, or to protocol retransmission after an error
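A simplified sketch of how these four metrics can be computed from the timestamps of a stream of test packets. It assumes synchronized clocks at both ends; the function and field names are hypothetical, delay variation is taken between consecutive received sequence numbers, and a packet counts as reordered when it arrives after a packet with a higher sequence number. The RFCs define the metrics more rigorously than this.

```python
def one_way_metrics(sent, received):
    """sent: {sequence number: send time}.
    received: list of (sequence number, receive time) in arrival order."""
    delay = {seq: t_rx - sent[seq] for seq, t_rx in received}

    # Delay variation between consecutive received sequence numbers.
    in_order = sorted(delay)
    ipdv = [delay[b] - delay[a] for a, b in zip(in_order, in_order[1:])]

    # A packet is reordered if a higher sequence number arrived before it.
    reordered, highest = 0, None
    for seq, _ in received:
        if highest is not None and seq < highest:
            reordered += 1
        else:
            highest = seq

    return {
        "delay": delay,
        "ipdv": ipdv,
        "loss_ratio": (len(sent) - len(delay)) / len(sent),
        "reordered": reordered,
    }


# Five packets sent every 10 ms; packet 4 is lost, packet 2 arrives late.
sent = {1: 0.000, 2: 0.010, 3: 0.020, 4: 0.030, 5: 0.040}
received = [(1, 0.005), (3, 0.026), (2, 0.027), (5, 0.046)]
metrics = one_way_metrics(sent, received)
print(metrics["loss_ratio"], metrics["reordered"])  # 0.2 1
```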

44

Application requirements:

Source : http://www.itu.int/osg/spu/wtpf/wtpf2001/infosession/pettitt1.pdf

45

Issues

• Precision
  - Delay ≈ a few ms → required precision ≈ 100 μs
• Synchronization:
  - NTP:
    - WAN > 1 ms
    - Instability
  - GPS:
    - 10 µs
    - Not cost-effective due to installation cost
• End-to-end measurement:
  - One beacon on each end
  - Troubleshooting issues with end-to-end PCs

[Figure: one-way delay between two directly connected stations A and B; B is the NTP server for A, and B synchronizes its clock on its local hardware clock.]

46

Issues : OS

• Measurement station: real-time and precise supervision → hardware stability ≈ 100 μs
• Common OSes are not real-time OSes → latency can reach tens of ms

Example: delay between 2 computers connected by a cable

47

Issues : OS (2)

Answers:

• Use a real-time OS (QNX, RTLinux, …) → huge development effort

• Apply a patch to improve the OS (low-latency or preempt kernel for Linux, for instance) → simple but less efficient

• Post-process the results to compensate for the imprecision due to the OS → not an easy task…

48

Issues : other criteria

• Location: ideally, a beacon in each POP so as to test all combinations of end-to-end paths
• Measurement coordination and centralization for post-processing
• Reliability: do not trigger false alarms
• Security: avoid measurement falsification, intrusion and DoS
• Representative measurements: tune the tests to match user conditions → characterize the customer traffic and apply its characteristics to the test traffic (size, DSCP, …)
• Exhaustiveness: multicast and IPv6 measurement

49

Some figures in RENATER backbone

• Delay ≈ a few ms
• Jitter ≈ ms
• Loss << 10⁻³
• Re-ordering very low

[Figure: RENATER backbone map annotated with measured values per path]

50

Active measurements @ RENATER

• Several boxes placed at strategic PoPs
• GPS synchronization
  - ~microsecond accuracy
• IPPM metrics supported

51

Active measurements @ RENATER

52

Active measurements @ RENATER

53

Active measurement dashboard

http://pasillo.renater.fr/metrologie/get_qosmetrics_results.php

54

Agenda

• A few words about RENATER
• Basic network management concepts
• Approach based on several constraints
• Data sources
• Network management tools
• Case studies
• Hands-on with tools

55

RENATER management

• Public tools
  - Network health:
    - WeatherMap
    - Active measurements
  - Looking Glass
• Private tools
  - Traffic per site (NetFlow)
  - IGP consistency

56

IP weathermap

57

IPv6 weathermap

58

PARIS weathermap

59

Overseas territories

60

IGP consistency

61

Additional Weathermap example

62

63

64

Deployment of active measurement probes on RENATER

http://pasillo.renater.fr/metrologie/get_qosmetrics_results.php

65

Looking Glass

• Get information on a router without a direct connection
• Web interface
• End users don’t need a login
• Allows users to find the causes of failures without asking the NOC or the network administrators

66

Looking Glass

67

Traffic per site

• Sites aren’t directly connected to RENATER PoPs
• NetFlow makes it possible to obtain the traffic per site
• Interaction with the information system (SAGA)
• Demo

68

Internal Web portal

69

AS aggregation study

70

Agenda

• A few words about RENATER
• Basic network management concepts
• Approach based on several constraints
• Data sources
• Network management tools
• Case studies
• Hands-on with tools

71

Case study #1

• Graph interpretation:

72

Case study #2

• Take the 5-minute value that follows the peak
• Compute the average of the two values
• You get the usual bit rate
• One polling cycle took more time, so a significant amount of traffic was counted in the previous 5-minute interval
• The next 5-minute value is in turn lower
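The correction described here can be sketched in a few lines: the peak sample and the sample that follows it are replaced by their average. The function name and the sample series are assumptions; in the example, a steady 100 Mb/s link shows 150 then 50 because one poll arrived late.

```python
def smooth_polling_spike(samples, peak_index):
    """Replace the peak and the following sample by their average."""
    if not 0 <= peak_index < len(samples) - 1:
        raise IndexError("peak_index must have a following sample")
    mean = (samples[peak_index] + samples[peak_index + 1]) / 2
    fixed = list(samples)
    fixed[peak_index] = fixed[peak_index + 1] = mean
    return fixed


# 5-minute rates in Mb/s; the third poll was delayed, inflating its value
# and deflating the next one.
rates = [100.0, 100.0, 150.0, 50.0, 100.0]
print(smooth_polling_spike(rates, 2))  # [100.0, 100.0, 100.0, 100.0, 100.0]
```

If the average of the two values is close to the usual rate, the peak is a polling artefact rather than real traffic.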

73

Case study #3 (start)

• Architecture :

4 x E1 (toward metropolitan France)

POP CAYENNE (GUYANE CENTRAL US): French overseas territory, with IP SLA probes activated

74

Case study #3

75

Case study #3 (end)

76

Case study #6 re-routing

• Delay:

• Jitter:

• Hop count:

• Packet loss:
  - 1 packet sent every 10 ms
  - 400 packets dropped → 4 s
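The outage duration follows directly from the probe rate: with one test packet every 10 ms, 400 consecutive lost packets correspond to 400 × 10 ms = 4 s without connectivity. As a tiny sketch (the function name is an assumption):

```python
def outage_duration_s(lost_packets, send_interval_ms):
    """Duration of an outage inferred from consecutive lost probes."""
    return lost_packets * send_interval_ms / 1000


print(outage_duration_s(400, 10))  # 4.0
```

The resolution of this estimate is one sending interval, so a faster probe rate gives a finer measurement of the re-routing time.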

77

Case study #7

• Rerouting
• Packet loss

Visualization with the active measurement matrix

78

Case study #8

• Link DOWN
• Rerouting at layer 3
  - Alternate path longer
  - Hop count greater
  - Delay shorter

79

Case study #9: "Buffering"

• SNMP statistics

• Active measurements:
  - 100 packets/s

80

Case study #10

• Tools reliability

81

Case study #11 (1/3)

82

Case study #11 (2/3)

83

Case study #11 (3/3)

84

[Figure labels: C & W, Paris1, Paris2, Lyon2, Montpellier, Toulouse, REMIP2000, Toulouse, Montpellier, Lyon2, Lyon1, Level3]

85

Agenda

• A few words about RENATER
• Basic network management concepts
• Approach based on several constraints
• Data sources
• Network management tools
• Case studies
• Hands-on with tools

86

Hands on tools

• yaNMP
  - Yet Another Network Management Platform
  - Provides a big picture of your network
  - Fault management and capacity planning
  - Can be coupled with any data source
  - A simple way to show what your network looks like and keep track of its evolution

87

yaNMP

• yaNMP context
  - RENATER-5 deployment
    - New platform → different configuration type
    - Different types of equipment
    - New OS → IOS, IOS-XR, IOS-XE
    - Maybe one day JUNOS? ☺
    - Link upgrades and new links → topology change
  - Help!!!!
    - How to reflect the network status on a day-to-day basis?
    - How to keep up with the pace of new deployments?
    - Using the existing weathermap is possible but the process is not so intuitive

88

yaNMP

• yaNMP internals
  - yaNMP-GUI
    - Java-based GUI
    - “Should” be multi-platform
    - Starts yaNMP in interactive mode!
  - yaNMP-DAEMON
    - Java-based
    - “Should” be multi-platform
    - Starts yaNMP in NON-interactive mode!

89

yaNMP

• yaNMP input files
  - Nodes file
    - Node identifier
    - X position
    - Y position
  - Links file
    - Half-link identifier
    - Link value
  - Links status URL
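To make the input files more concrete, here is a purely illustrative sketch of how such files could be read. The slides list only the fields, not the syntax, so the whitespace-separated layout, the `#` comments, the identifiers and both function names are assumptions and not the actual yaNMP format.

```python
def parse_nodes(text):
    """Hypothetical nodes file: '<node identifier> <x> <y>' per line."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ident, x, y = line.split()
        nodes[ident] = (int(x), int(y))
    return nodes


def parse_half_links(text):
    """Hypothetical links file: '<half-link identifier> <value>' per line."""
    links = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ident, value = line.split()
        links[ident] = float(value)
    return links


# Made-up content: two nodes and the two halves (one per direction) of a link.
nodes = parse_nodes("# id x y\nparis 320 140\nlyon 410 330\n")
links = parse_half_links("paris-lyon 42.5\nlyon-paris 17.0\n")
print(nodes, links)
```

Keeping one value per half link lets each direction of a link be coloured independently on the map.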

90

yaNMP

• Hands-on
  - Hands-on objective: depict your network
    - Provide a physical view of your network
    - Provide a logical view of your network
    - Reflect your network state
    - Make your weathermap reflect the evolution of your network
  - Context
    - Each NREN will have its own geographical map background
    - Use it to create your physical network
    - Then build your Layer 3 weathermap
  - Scenario
    - Your engineering team has deployed a new link between 2 of your existing POPs
    - Adjust your physical weathermap
    - Adjust your logical weathermap