Protection

Automatic Protection Switching
Yaakov (J) Stein, Chief Scientist, RAD Data Communications
April 2009



Page 2: Protection

Y(J)S APS Slide 2

Course Outline
• General protection switching principles
• Examples of protection mechanisms
  – SONET/SDH
  – Ethernet linear protection
  – Ethernet ring protection
  – MPLS fast reroute

Page 3: Protection

General principles

• Definition
• References
• Traffic types
• Network topologies
• Triggers
• Protection classes
• Entities
• Protection types
• Signaling

Page 4: Protection

Definition

Automatic Protection Switching (APS):
• is a functionality of carrier-grade transport networks
• is often called resilience, since it enables service to recover quickly from failures
• is required to ensure high reliability and availability

APS includes:
• detection of failures (signal fail or signal degrade) on a working channel
• switching traffic transmission to a protection channel
• selecting traffic reception from the protection channel
• (optionally) reverting to the working channel once the failure is repaired

Automatic means that APS uses (at most) control plane protocols; no management layer or manual operations are needed.

Page 5: Protection

Some useful references

• G.808.1 – generic linear protection
• G.808.2 – generic ring protection (not yet written)
• G.841 and G.842 – SDH
• G.774.3/4/9/10 – SDH protection management
• G.870 and G.873.1 – OTN
• G.8031 – Ethernet linear protection
• G.8032 – Ethernet ring protection
• G.8131 – T-MPLS APS
• Y.1720 – MPLS
• I.630 – ATM
• M.495 – analog signal protection
• G.781 – clock selection (can be used to protect synchronization)
• RFC 4090 – MPLS fast reroute

Page 6: Protection

Traffic types

In a network with APS capabilities, there are three types of traffic:
• protected traffic
  – traffic that may be rapidly switched to the protection channel
  – at any time it may be on the working channel or the protection channel
• Nonpreemptible Unprotected Traffic (NUT)
  – noncritical traffic that does not require a protection mechanism
  – not affected by the protection mechanism
  – somewhat less expensive to the customer
• extra (preemptible) traffic
  – best-effort background traffic that runs on the protection channel
  – preempted (blocked) when the protection channel is needed
  – very inexpensive to the customer

Page 7: Protection

Network topologies

APS can be defined for any topology with redundant links; e.g., for tree topologies, which have no redundant links, no protection is possible.

We will often discuss protection of individual links. However, there are two topologies that are of particular interest:
• rings
  – protection is natural for rings, although there are other reasons for using rings as well
  – rings are so important that protection for other topologies is often called linear protection
• dense meshes
  – for this topology, multiple local bypasses can be preconfigured
  – protection switching is similar to a routing change, but faster, and is often called “fast reroute” (FRR)

Page 8: Protection

Triggers

Protection switching is usually triggered by a failure, although the operator may manually force a protection switch.

A failure is declared when a fault condition persists long enough for the ability to perform the required function to be considered terminated.

Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types), and may be:
• detected by the physical layer
• indicated by signaling (e.g., AIS)
• detected by OAM mechanisms

When there is no SF or SD, the state is called No Request (NR).

Page 9: Protection

Switching time (1)

SONET/SDH protection switching takes place in under 50 ms. Regarding multiplex section shared protection rings, G.841 states:

“The following network objectives apply:
1) Switch time – In a ring with no extra traffic, all nodes in the idle state (no detected failures, no active automatic or external commands, and receiving only Idle K-bytes), and with less than 1200 km of fibre, the switch (ring and span) completion time for a failure on a single span shall be less than 50 ms. On rings under all other conditions, the switch completion time can exceed 50 ms (the specific interval is under study) to allow time to remove extra traffic, or to negotiate and accommodate coexisting APS requests.”

while for linear VC trail protection, it says:

“The following network objectives apply:
1) Switch time – The APS algorithm for LO/HO VC trail protection shall operate as fast as possible. A value of 50 ms has been proposed as a target time. Concerns have been expressed over this proposed target time when many VCs are involved. This is for further study. Protection switch completion time excludes the detection time necessary to initiate the protection switch, and the hold-off time.”

There are similar statements in other clauses as well.

Page 10: Protection

Switching time (2)

This 50 ms time has become the gold standard, and new protection schemes are expected to meet this objective.

However, studying the literature that led up to the SONET/SDH standards shows that the objective was to attain the minimum possible time for the sum of:
– persistent (i.e., non-transient) failure detection
– speed-of-light propagation
– signaling protocol time
– regaining sync alignment
and 50 ms was the minimum that was considered practical!

Many modern standards have “built in” 50 ms, and much marketing literature boasts “faster than 50 ms”.

But there is really nothing special about 50 ms:
• 50 ms gaps in voiced speech are noticeable, but not fatal if infrequent
• 50 ms of data at high rates cannot be stored and later forwarded
• timing circuits can withstand much more than 50 ms without clock
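To see how small the propagation term of that sum is, here is a minimal Python sketch computing the one-way fiber delay over the 1200 km mentioned in G.841. The group index of 1.468 is an assumed typical value for single-mode fiber, not a figure from the slides, and the "remaining" figure is simply what is left of 50 ms for detection, signaling and resynchronization.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def fiber_delay_ms(length_km: float, group_index: float = 1.468) -> float:
    """One-way propagation delay over optical fiber, in milliseconds.

    The default group index (1.468) is an assumed typical value for
    single-mode fiber; light in glass travels at roughly c / group_index.
    """
    return length_km * group_index / C_KM_PER_S * 1000.0


BUDGET_MS = 50.0
prop_ms = fiber_delay_ms(1200.0)
print(f"propagation over 1200 km: {prop_ms:.2f} ms")
print(f"left for detection, signaling and resync: {BUDGET_MS - prop_ms:.2f} ms")
```

Propagation is only about 6 ms of the budget per one-way traversal; a multi-phase signaling exchange may pay it more than once.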

Page 11: Protection

Protection classes

It is useful to distinguish two different protection classes:
• path protection (AKA trail protection, end-to-end protection)
  – when a failure is detected on the end-to-end path, we switch to an alternative end-to-end path
  – the failure is usually detected by end-to-end OAM
• local protection (AKA local restoration, SNC protection, bypass, detour)
  – we protect individual network elements, links, or groups of these
  – when such an entity fails, only that local entity is bypassed
  – the failure may be detected by link OAM or physical layer means

Page 12: Protection

APS entities (1)

The following entities are important in APS:
• working channel – channel used when no failure exists
• protection channel – channel used when a failure exists
• head-end – entity transmitting data to the working/protection channel
• tail-end – entity receiving data from the working/protection channel

Note: we will usually consider traffic to be bidirectional, so that the head-end for one direction is the tail-end for the opposite direction.

[Figure: head-end and tail-end connected by a working channel and a protection channel]

Page 13: Protection

APS entities (2)

• Bridge – function at the head-end that connects traffic (including extra traffic) to the working and protection channels
• Selector – function at the tail-end that extracts traffic (perhaps extra traffic) from the working or protection channel
• APS signaling channel – channel used to communicate between head-end and tail-end for APS purposes
• Trail termination – function responsible for failure detection, including injection and extraction of OAM

[Figure: head-end (bridge) and tail-end (selector) connected by working, protection, and signaling channels]

Page 14: Protection

Revertive operation

Reversion means returning to the working channel after the failure has been rectified.

Protection mechanisms can be revertive or nonrevertive.

Revertive mechanisms may be preferable:
• when the working channel has better performance (free BW, BER, delay)
• when there are frequent switches (easier to manage)
• when there is extra traffic

but nonrevertive also has advantages:
• only one service disruption due to protection switching
• may be simpler to implement

Page 15: Protection

Uni/bi-directional

We will usually consider bidirectional traffic, but even then failures can be uni- or bidirectional, and for unidirectional failures there can be uni- or bidirectional switches.

[Figure: unidirectional vs. bidirectional failure; unidirectional protection (protection channel in use in one direction) vs. bidirectional protection (protection channel in use in both directions)]

Page 16: Protection

Uni- / bi-directional switching

Unidirectional switching may be advantageous:
• for 1+1, it is faster and no signaling channel is needed
• no unnecessary service disruption for the direction without failure
• higher chance of protection under multiple failures
• easier to implement for local protection
• maintains extra traffic in the direction without failure

But bidirectional switching may be preferable:
• easier management, since both directions traverse the same network elements
• does not disrupt the delay balance between directions
• may simplify repair, since failed spans are unused

Page 17: Protection

Protection types

We distinguish several different protection types:
• 1+1
• 1:1
• 1:n
• m:n
• (1:1)^n

Each type has its applicability, advantages, and disadvantages, and there are trade-offs between:
• simplicity
• BW consumption
• protection switch time
• signaling requirements

Page 18: Protection

1+1 protection

Simplest and fastest form of protection, but only 50% of the actual physical capacity is used (although the rest is available for extra traffic).

The head-end bridge always sends data on both channels. The tail-end selector chooses the channel to use (based on BER, dLOS, etc.).

For unidirectional 1+1 switching there is no need for APS signaling. If non-revertive, there is no distinction between working and protection channels.

[Figure: 1+1 protection over channel A and channel B]
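A minimal Python sketch of a nonrevertive 1+1 tail-end selector follows. The defect model (a dLOS flag plus a measured BER) and the signal-degrade threshold `SD_THRESHOLD` are hypothetical. Since both channels carry identical data, the selector moves to the other channel only when it is strictly healthier, and no message to the head-end is needed.

```python
from dataclasses import dataclass

SD_THRESHOLD = 1e-6  # hypothetical signal-degrade BER threshold


@dataclass
class ChannelStatus:
    los: bool    # dLOS: loss of signal detected
    ber: float   # measured bit error ratio


def quality(ch: ChannelStatus) -> int:
    """Rank a channel: 0 = signal fail, 1 = signal degrade, 2 = OK."""
    if ch.los:
        return 0
    if ch.ber > SD_THRESHOLD:
        return 1
    return 2


def select(current: str, a: ChannelStatus, b: ChannelStatus) -> str:
    """Nonrevertive 1+1 selector over channels "A" and "B".

    Both channels carry the same data, so no signaling is needed:
    move to the other channel only if it is strictly healthier.
    """
    status = {"A": a, "B": b}
    other = "B" if current == "A" else "A"
    if quality(status[other]) > quality(status[current]):
        return other
    return current


ok = ChannelStatus(los=False, ber=1e-12)
cut = ChannelStatus(los=True, ber=0.0)
print(select("A", cut, ok))  # B: channel A lost signal
print(select("B", ok, ok))   # B: nonrevertive, stays put after repair
```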

Page 19: Protection

1:1 protection

The head-end bridge usually sends data on the working channel. When a failure is detected, it starts sending data over the protection channel, and the tail-end needs to select the protection channel.

When not in use, the protection channel can be used for extra traffic.

However, since the failure is detected by the tail-end, APS signaling is needed.

The protection channel should have OAM running to ensure its functionality.

[Figure: 1:1 protection with working channel, protection channel carrying extra traffic, and APS signaling]

Page 20: Protection

1:n protection

One protection channel is allocated for n working channels.

It can protect only one working channel at a time, but it is improbable that more than one working channel will fail simultaneously.

Only 1/(n+1) of the total capacity is reserved for protection.

[Figure: n working channels sharing one protection channel]

Page 21: Protection

m:n protection

To enable protection of more than one channel, m protection channels are allocated for n working channels (m < n).

Up to m simultaneous failures can be protected.

Less protection capacity is dedicated than for n times 1:1.

When a failure is detected, one of the m protection channels needs to be assigned and signaled.

[Figure: n working channels sharing m protection channels]

Page 22: Protection

(1:1)^n protection

This is like n times 1:1, but the n protection channels share bandwidth, so only one failed working channel can be protected.

This is different from 1:n since:
• n protection channels are preconfigured
• the n working channels need not be of the same type

The protection bandwidth must be at least that of the largest working channel.
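The capacity trade-off between the protection types can be made concrete with a short Python sketch. It assumes that all working and protection channels have the same rate (so for (1:1)^n the shared protection bandwidth is one channel's worth); that simplification is an assumption of the sketch, not part of the slides.

```python
from fractions import Fraction


def protection_fraction(kind: str, n: int = 1, m: int = 1) -> Fraction:
    """Fraction of total capacity reserved for protection.

    Assumes every working and protection channel has the same rate.
    """
    if kind in ("1+1", "1:1"):
        return Fraction(1, 2)
    if kind == "1:n":
        return Fraction(1, n + 1)
    if kind == "m:n":
        if not 0 < m < n:
            raise ValueError("m:n requires 0 < m < n")
        return Fraction(m, m + n)
    if kind == "(1:1)^n":
        # n logical protection channels share one channel's worth of bandwidth
        return Fraction(1, n + 1)
    raise ValueError(f"unknown protection type: {kind!r}")


for kind, kw in [("1+1", {}), ("1:n", {"n": 14}), ("m:n", {"m": 2, "n": 8})]:
    print(kind, kw, protection_fraction(kind, **kw))
```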

Page 23: Protection

APS algorithm

We have seen that protection switching is a tricky business, so it is not surprising that network elements that support APS run an APS algorithm.

This algorithm takes as inputs:
• configuration (protection type, revertive or not, available channels, …)
• failure indications (NR, SF, SD)
• operator commands
• APS signaling (more on that soon)
and makes switching decisions.

The algorithm maintains state information for the head-end and tail-end.

APS algorithms are detailed in standards documents.

Page 24: Protection

Priority

Not every failure event or operator command results in a protection switch. For example, in 1:n protection the protection channel may already be in use!

Conflicts are resolved by assigning priorities to events/commands.

When an event is detected or a command received, the APS algorithm will not act if an event/command of equal or higher priority is already in effect.

True failure conditions usually have higher priority than manual commands.
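The arbitration rule can be sketched in a few lines of Python. The ordering in `Request` is illustrative only, since each standard defines its own priority table; it merely respects the slide's point that true failures outrank manual commands. Clearing a condition, which real algorithms handle separately (e.g., via WTR), is not modeled.

```python
from enum import IntEnum


class Request(IntEnum):
    """Illustrative priority order (higher value wins).

    Hypothetical ordering for this sketch only; each standard defines its
    own table.
    """
    NR = 0
    WTR = 1
    MANUAL_SWITCH = 2
    SD = 3
    SF = 4
    FORCED_SWITCH = 5
    LOCKOUT = 6


def arbitrate(active: Request, new: Request) -> Request:
    """Return the request left in effect after `new` arrives.

    The algorithm does not act on `new` if a request of equal or higher
    priority is already in effect.
    """
    return new if new > active else active


print(arbitrate(Request.SF, Request.MANUAL_SWITCH).name)  # SF: manual command ignored
print(arbitrate(Request.SD, Request.SF).name)             # SF: failure preempts degrade
```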

Page 25: Protection

Timers

Even failure events with priority are not acted upon immediately, since doing so would cause unnecessary switches after transient defects.

The APS algorithm may maintain several timers, such as:
• Hold-off timer
  – the time between detection of an SF or SD event and the APS algorithm acting upon this event
  – the algorithm usually used is called “peek twice”, i.e., the condition is checked again after the timer expires
• Wait To Restore (WTR) timer
  – for revertive switching, the time between detection of the failure being cleared and the APS algorithm acting upon this event
  – also used in SDH optimized bidirectional 1+1 (nonrevertive)
• Guard timer
  – for rings, a blockout time during which APS messages are ignored (since they may be old and outdated)
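The "peek twice" hold-off check reduces to sampling the defect at detection and again when the timer expires. This Python sketch models the defect as a function of time so it can be run without real timers. The 10 ms hold-off and the defect trace are hypothetical values chosen for illustration.

```python
from typing import Callable


def peek_twice(defect_at: Callable[[float], bool], t_detect_ms: float,
               holdoff_ms: float) -> bool:
    """Hold-off 'peek twice': act on a defect detected at t_detect_ms only
    if the defect is still present when the hold-off timer expires."""
    if not defect_at(t_detect_ms):
        return False
    return defect_at(t_detect_ms + holdoff_ms)


def defect(t_ms: float) -> bool:
    # hypothetical trace: a 2 ms transient at t=100 ms, a persistent failure from t=500 ms
    return 100 <= t_ms < 102 or t_ms >= 500


print(peek_twice(defect, 100, holdoff_ms=10))  # False: transient has cleared
print(peek_twice(defect, 500, holdoff_ms=10))  # True: failure persists
```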

Page 26: Protection

APS signaling

In all types except unidirectional 1+1, APS signaling is needed. APS signaling is used to synchronize the head-end and tail-end. It is critical that the head-end and tail-end always be in the same state.

Example messages include:
• No Request (NR)
• Signal Failure (SF), sent by the tail-end to inform the head-end of a failure
• messages sent by the head-end to confirm the event’s priority and to report the particular protection channel
• Reverse Request (RR), for bidirectional switching
• Wait To Restore (WTR), sent by the tail-end after the failure is cleared
• Do Not Revert (DNR), sent by the tail-end after the failure is cleared, for nonrevertive operation

Page 27: Protection

APS signaling phases

When APS signaling is used, it needs to be as rapid as possible. Depending on the scenario it may be:
• 1-phase: tail → head (fastest)
  – tail-end informs head-end of the failure
  – both ends uniquely know the protection channel to be used
  – only for 1+1 and unidirectional (1:1)^n (including 1:1)
• 2-phase: 1) tail → head, 2) head → tail
  – tail-end informs head-end of the failure
  – head-end signals that it has switched to the protection channel
  – not for bidirectional 1:n or m:n
• 3-phase: 1) tail → head, 2) head → tail, 3) tail → head (slowest)
  – works for all protection types (including m:n)
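These rules, together with the earlier note that unidirectional 1+1 needs no signaling and the remark on the next slide that bidirectional 1:1 can also use 1-phase, can be collected into a lookup. This Python sketch reflects one reading of the slides: it assumes m:n always needs 3 phases and that bidirectional (1:1)^n with n > 1 needs 2.

```python
def min_signaling_phases(ptype: str, bidirectional: bool) -> int:
    """Minimum number of APS signaling phases, per one reading of the slides.

    "(1:1)^n" here means n > 1; plain 1:1 is listed separately.
    """
    if ptype == "1+1":
        return 1 if bidirectional else 0   # unidirectional 1+1 needs no signaling
    if ptype == "1:1":
        return 1                           # 1-phase works in both modes
    if ptype == "(1:1)^n":
        return 2 if bidirectional else 1   # assumption for the bidirectional case
    if ptype == "1:n":
        return 3 if bidirectional else 2
    if ptype == "m:n":
        return 3                           # assumption: m:n always uses 3 phases
    raise ValueError(f"unknown protection type: {ptype!r}")


for ptype in ("1+1", "1:1", "(1:1)^n", "1:n", "m:n"):
    print(ptype, min_signaling_phases(ptype, False), min_signaling_phases(ptype, True))
```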

Page 28: Protection

Examples of 1-phase

1-phase signaling is possible, for example, for unidirectional 1:1 or (1:1)^n:
1. upon detection of a failure, the tail-end sends SF to the head-end and immediately changes its selector (blind switch)
2. upon receipt, the head-end changes the bridge setting (no priority is checked)

1-phase can also be used for bidirectional 1:1:
1. upon detection of a failure, the tail-end sends SF to the head-end and immediately changes both its selector and bridge
2. upon receipt, the head-end changes its bridge and selector

Page 29: Protection

Example of 2-phase

2-phase is useful for unidirectional 1:n with priority checking:
1. upon detection of a failure, the tail-end sends SF to the head-end, but does not change its selector
2. the head-end checks priority, sends confirmation to the tail-end (with the identity of the working channel), and changes the bridge setting
3. the tail-end changes its selector

Page 30: Protection

Example of 3-phase

3-phase signaling is imperative for bidirectional 1:n:
1. upon detection of a failure, the tail-end sends SF to the head-end, but does not change its selector
2. the head-end checks priority, sends confirmation to the tail-end, changes the bridge setting, and also sends a reverse request
3. the tail-end changes its selector, checks priority, sends confirmation to the head-end, and changes its bridge setting

Page 31: Protection

For G.805 buffs

To add 1+1 trail protection to a trail, we expand a trail termination function using a special transport processing function: the protection switch.

[Figure: an unprotected trail expanded into a protected trail; the unprotected TTs report status to the protection switch]

Page 32: Protection

SONET/SDH APS

Page 33: Protection

SONET protection?

SONET/SDH networks need to be highly reliable (five nines). Down-time should be minimal (less than 50 ms), so systems must repair themselves (there is no time for manual intervention).

Upon detection of a failure (dLOS, dLOF, high BER), the network must reroute traffic (protection switching) from the working channel to the protection channel.

SDH APS is unidirectional. SDH APS may be revertive.

[Figure: head-end NE and tail-end NE connected by working and protection channels]

Page 34: Protection

SONET/SDH layers

Between regenerators there are sections (regenerator sections). Between ADMs there are lines (multiplex sections). Between path terminations there are paths.

Protection can be at the OC-n level (different physical fibers), at the STM/VC level, or on the end-to-end path (trail protection).

[Figure: a path between two Path Terminations; lines (MS sections) between Line Terminations at the ADMs; sections between Section Terminations, including a regenerator]

Page 35: Protection

Line APS

[Figure: 9-row × 90-column frame: Transport Overhead (TOH), with 3 rows of section overhead and 6 rows of line overhead, alongside the Synchronous Payload Envelope]

TOH consists of:
• 3 rows of section overhead – frame sync, trace, EOC, …
• 6 rows of line overhead – pointers, SSM, FEBE, and Line APS signaling (bytes K1 and K2)

Page 36: Protection

HO Path APS

POH is responsible for type, status, path performance monitoring, VCAT, and trace.

HO Path APS signaling uses the 4 MSBs of byte K3.

[Figure: POH bytes J1, B3, C2, G1, F2, H4, F3, K3, N1]

Page 37: Protection

LO Path APS

VC OH is responsible for timing, PM, REI, …

LO Path APS signaling uses the 4 MSBs of byte K4.

[Figure: VC OH bytes V5, J2, N2, K4, together with bytes V1, V2, V3, V4]

Page 38: Protection

How does it work?

Head-end and tail-end NEs have bridges (muxes). Head-end and tail-end NEs maintain a bidirectional signaling channel.

Signaling is contained in the K bytes of the protection channel.

For line APS:
• K1 – tail-end status and requests
• K2 – head-end status

[Figure: head-end bridge and tail-end bridge connected by the working channel and the protection channel, which carries the signaling channel]

Page 39: Protection

Linear 1+1 protection

Can be at the OC-n level (different physical fibers), at the STM/VC level (SubNetwork Connection Protection), or on the end-to-end path (called trail protection).

The head-end bridge always sends data on both channels. The tail-end chooses the channel to use based on BER, dLOS, etc.

No need for signaling. If non-revertive, there is no distinction between working and protection channels.

[Figure: head-end NE and tail-end NE connected by working and protection channels]

Page 40: Protection

Linear 1:1 protection

The head-end bridge usually sends data on the working channel. When the tail-end detects a failure, it signals (using K1) to the head-end. The head-end then starts sending data over the protection channel.

When not in use, the protection channel can be used for (discounted) extra traffic (pre-emptible unprotected traffic).

May be at any layer (but only OC-n level protects against fiber cuts).

[Figure: working channel, and protection channel carrying extra traffic]

Page 41: Protection

Linear 1:N protection

In order to save BW, we allocate 1 protection channel for every N working channels.

N is limited to 14 by the 4 bits in the K1 byte (from tail-end to head-end):
– 0: protection channel
– 1–14: working channels
– 15: extra traffic channel

[Figure: N working channels sharing one protection channel]
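The K1 channel field can be packed and unpacked as in this Python sketch. It assumes the usual SONET/SDH convention that bit 1 is the most significant bit, so the 4-bit request code occupies the high nibble and the channel number the low nibble. The request-code values themselves are defined in the standards and are passed through here as plain integers; the value used in the example is arbitrary.

```python
def encode_k1(request: int, channel: int) -> int:
    """Pack a K1 byte from a 4-bit request code and a 4-bit channel number.

    SONET/SDH numbers bit 1 as the most significant bit, so bits 1-4
    (request) form the high nibble and bits 5-8 (channel) the low nibble.
    """
    if not 0 <= request <= 15 or not 0 <= channel <= 15:
        raise ValueError("request and channel must each fit in 4 bits")
    return (request << 4) | channel


def describe_channel(channel: int) -> str:
    """Meaning of the K1 channel number, as listed on the slide."""
    if channel == 0:
        return "protection channel"
    if 1 <= channel <= 14:
        return f"working channel {channel}"
    if channel == 15:
        return "extra traffic channel"
    raise ValueError("channel number must fit in 4 bits")


def decode_k1(k1: int) -> tuple:
    """Split a K1 byte into (request code, channel description)."""
    if not 0 <= k1 <= 0xFF:
        raise ValueError("K1 is a single byte")
    return k1 >> 4, describe_channel(k1 & 0x0F)


k1 = encode_k1(request=0b1100, channel=3)   # request code chosen arbitrarily
print(hex(k1), decode_k1(k1))
```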

Page 42: Protection

Two-fiber vs. four-fiber rings

Ring-based protection is popular in North America (100K+ rings):
• full protection against physical fiber cuts
• simpler and less expensive than mesh topologies
• protection at the line (multiplex section) or path layer

Four-fiber rings:
• fully redundant at the OC level
• can support bidirectional routing at the line layer

Two-fiber rings:
• support unidirectional routing at the line layer
• 2 fibers in opposite directions

Page 43: Protection

Unidirectional vs. bidirectional

Unidirectional routing:
• working channel B-A goes in the same direction (e.g., clockwise) as A-B
• management simplicity: A-B and B-A can occupy the same timeslots
• inefficient: wastes ring BW and causes excessive delay in one direction

Bidirectional routing:
• A-B and B-A are opposite in direction, both using the shortest route
• spatial reuse: timeslots can be reused in other sections

[Figure: ring with nodes A and B showing unidirectional routing of A-B and B-A; ring with nodes A, B, C showing bidirectional routing of A-B, B-A, B-C, C-B]
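The spatial-reuse argument can be checked by counting which spans a connection occupies. In this Python sketch the ring size (6 nodes) and numbering are hypothetical: nodes are 0..N-1 clockwise, and span i joins node i and node i+1. Unidirectional routing of an adjacent pair occupies every span, while bidirectional routing occupies only the shortest arc, leaving other spans free for other connections.

```python
def clockwise_spans(n: int, src: int, dst: int) -> set:
    """Spans traversed going clockwise from src to dst on an n-node ring.
    Span i joins node i and node (i + 1) % n."""
    spans, node = set(), src
    while node != dst:
        spans.add(node)
        node = (node + 1) % n
    return spans


def spans_occupied(n: int, a: int, b: int, bidirectional: bool) -> set:
    """Spans occupied by the A-B / B-A pair of a single connection."""
    if not bidirectional:
        # both directions travel clockwise, so together they circle the ring
        return clockwise_spans(n, a, b) | clockwise_spans(n, b, a)
    # both directions use the shorter arc (same spans, opposite directions)
    cw, ccw = clockwise_spans(n, a, b), clockwise_spans(n, b, a)
    return cw if len(cw) <= len(ccw) else ccw


N = 6
print(sorted(spans_occupied(N, 0, 1, bidirectional=False)))  # [0, 1, 2, 3, 4, 5]
print(sorted(spans_occupied(N, 0, 1, bidirectional=True)))   # [0]
print(sorted(spans_occupied(N, 1, 2, bidirectional=True)))   # [1]: disjoint from span 0, so timeslots are reused
```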

Page 44: Protection

UPSR vs. BLSR (MS-SPRing)

Of all the possible combinations, only a few are in use:
• Unidirectional Path Switched Rings (UPSR)
  – protect tributaries
  – extension of 1+1 to ring topology
• Bidirectional Line Switched Rings (BLSR; two-fiber and four-fiber versions)
  – called Multiplex Section Shared Protection Ring (MS-SPRing) in SDH
  – simultaneously protect all tributaries in the STM
  – extension of 1:1 to ring topology

[Figure: combinations of path/line switching, two-/four-fiber, and unidirectional/bidirectional routing, with UPSR and BLSR marked]

Page 45: Protection

UPSR

The working channel is in one direction, the protection channel in the opposite direction.

All traffic is added in both directions; the decision as to which to use is made at the drop point (no signaling).

Normally non-revertive, so there are effectively two diverse paths.

Good match for access networks: 1 resilient access ring is less expensive than a fiber pair per customer.

Inefficient for core networks:
• no spatial reuse: every signal occupies every span in both directions
• each node needs to continuously monitor every tributary to be dropped

Page 46: Protection

BLSR

Switches at the line level, so less monitoring is needed.

When a failure is detected, the tail-end NE signals the head-end NE.

Works for unidirectional/bidirectional fiber cuts, and for NE failures.

Two-fiber version:
• half of the OC-N capacity devoted to protection
• only half the capacity available for traffic

Four-fiber version:
• a full redundant OC-N devoted to protection
• twice as many NEs as compared to two-fiber

[Figure: example of recovery from a unidirectional fiber cut]

Page 47: Protection

Ethernet linear APS

• STP
• LAG
• G.8031

Page 48: Protection

STP

The original Spanning Tree Protocol automatically removed loops from arbitrary networks (with loops). However, its convergence was very slow (about a minute).

STP cannot be used as a protection mechanism, since its reconvergence time is very long due to a cumbersome protocol and long hold-off timer settings.

An evolutionary update called Rapid STP (802.1w) was incorporated into 802.1D-2004 clause 17. It converges in about the same time as STP, but can reconverge after a topology change in less than 1 second.

RSTP can be used to detect failures and reconverge, and thus can be used as a primitive protection mechanism. However, the switching time will be many tens of ms to 100s of ms.

Page 49: Protection

Use of LAG

Ethernet “link aggregation” (AKA bonding, Ethernet trunk, inverse mux, NIC teaming) enables bonding several ports together as a single uplink.

Defined by the 802.3ad task force and folded into 802.3-2000 as clause 43.

Binding of ports to Link Aggregation Groups (LAGs) is distributed via the Link Aggregation Control Protocol (LACP).

LACP uses slow protocol frames (up to 5 per second). Links may be dynamically added to or removed from a LAG, and LACP continuously monitors to detect whether changes are needed.

Upon link failure, the LAG delivers traffic at a reduced rate. Thus LAG can be used as a primitive protection mechanism; when used this way it is called worker/standby or N+N mode.

The restoration time will be on the order of 1 second.

Page 50: Protection

G.8031

Q9 of SG15 in the ITU-T is responsible for protection switching. In 2006 it produced G.8031, Linear Ethernet Protection Switching.

G.8031 uses standard Ethernet formats, but is incompatible with STP.

The standard addresses:
• point-to-point VLAN connections
• SNC (local) protection class
• 1+1 and 1:1 protection types
• unidirectional and bidirectional switching for 1+1
• bidirectional switching for 1:1
• revertive and nonrevertive modes
• 1-phase signaling protocol

G.8031 uses Y.1731 OAM CCM messages to detect failures.

G.8031 defines a new OAM opcode (39) for APS signaling messages.

Switching times should be under 50 ms (with hold-off timers only when groups are involved).

Page 51: Protection

G.8031 signaling

The APS signaling message looks like this:

MEL (3b) | VER=0 (5b) | OPCODE=39 (1B) | FLAGS=0 (1B) | OFFSET=4 (1B)
req/state (4b) | prot. type (4b) | requested sig (1B) | bridged sig (1B) | reserved (1B)
END=0 (1B)

– regular APS messages are sent once every 5 seconds
– after a change, 3 messages are sent at the maximum rate (300 per second)

where:
• req/state identifies the message (NR, SF, WTR, SD, forced switch, etc.)
• prot. type identifies the protection type (1+1, 1:1, uni/bidirectional, etc.)
• requested and bridged signal identify incoming/outgoing traffic; since only 1+1 and 1:1 are supported, they are either null or traffic (all other values reserved)
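The field layout can be packed with Python's `struct` module, as in this sketch. Only the fields and fixed values shown on the slide are used. The numeric codes for req/state and prot. type, and the Ethernet header and EtherType that would precede the PDU, are left to the standard and are not modeled; the zero values in the example are placeholders.

```python
import struct

APS_OPCODE = 39   # OAM opcode for APS, as stated on the slide
TLV_OFFSET = 4    # OFFSET=4: four bytes of APS-specific information follow
END_TLV = 0


def build_aps_pdu(mel: int, req_state: int, prot_type: int,
                  requested_signal: int, bridged_signal: int) -> bytes:
    """Pack the APS PDU fields in the order shown on the slide (9 bytes).

    The numeric req/state and prot. type codes are not modeled here.
    """
    if not 0 <= mel <= 7:
        raise ValueError("MEL must fit in 3 bits")
    if not 0 <= req_state <= 15 or not 0 <= prot_type <= 15:
        raise ValueError("req/state and prot. type must each fit in 4 bits")
    version, flags, reserved = 0, 0, 0
    header = struct.pack("!BBBB", (mel << 5) | version, APS_OPCODE, flags, TLV_OFFSET)
    aps_info = struct.pack("!BBBB", (req_state << 4) | prot_type,
                           requested_signal, bridged_signal, reserved)
    return header + aps_info + bytes([END_TLV])


pdu = build_aps_pdu(mel=7, req_state=0, prot_type=0,
                    requested_signal=0, bridged_signal=0)
print(pdu.hex())  # e02700040000000000
```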

Page 52: Protection

G.8031 1:1 revertive operation

In the normal (NR) state:
• head-end and tail-end exchange CCM (at a rate of 300 per second) on both working and protection channels
• head-end and tail-end exchange NR APS messages on the protection channel (every 5 seconds)

When a failure appears in the working channel:
• tail-end stops receiving CCM messages on the working channel (3 consecutive messages missed)
• tail-end enters SF state
• tail-end sends 3 SF messages at 300 per second on the APS channel
• tail-end switches its selector (and, if bidirectional, its bridge) to the protection channel
• head-end (receiving SF) switches its bridge (and, if bidirectional, its selector) to the protection channel
• tail-end continues sending SF messages every 5 seconds
• head-end sends NR messages, but with bridged=normal

When the failure is cleared:
• tail-end leaves SF state and enters WTR state (typically 5 minutes, configurable from 5 to 12 min)
• tail-end sends a WTR message to the head-end (in nonrevertive mode, a DNR message)
• tail-end sends WTR every 5 seconds
• when WTR expires, both sides enter NR state
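The tail-end side of this sequence is a small state machine. The following Python sketch is a simplified model assuming only three events on the working channel (hypothetical names "fail", "clear", "wtr_expired"). Protection-channel failures, operator commands, message timing, and the head-end are not modeled.

```python
from enum import Enum


class State(Enum):
    NR = "No Request"
    SF = "Signal Fail"
    WTR = "Wait To Restore"
    DNR = "Do Not Revert"


def next_state(state: State, event: str, revertive: bool = True) -> State:
    """Tail-end request state for 1:1 protection of the working channel.

    Events: "fail" (3 CCMs missed on working), "clear" (CCMs resume),
    "wtr_expired" (WTR timer ran out). Protection-channel failures,
    operator commands and the remote end are not modeled.
    """
    if event == "fail":
        return State.SF
    if event == "clear" and state is State.SF:
        return State.WTR if revertive else State.DNR
    if event == "wtr_expired" and state is State.WTR:
        return State.NR
    return state


def selected_channel(state: State) -> str:
    # only NR uses the working channel; SF, WTR and DNR stay on protection
    return "working" if state is State.NR else "protection"


s = State.NR
for ev in ("fail", "clear", "wtr_expired"):
    s = next_state(s, ev)
    print(ev, "->", s.name, selected_channel(s))
```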

Page 53: Protection

Ethernet ring APS

• G.8032
• RPR
• CLEER

Page 54: Protection

Ethernet rings?

Ethernet has become carrier grade:
• deterministic connection-oriented forwarding
• OAM
• synchronization

The only thing missing to completely replace SDH is ring protection.

However, Ethernet and ring architectures don’t go together:
• Ethernet has no TTL, so looped traffic will loop forever
• STP builds trees out of any architecture; no loops are allowed

There are two ways to make an Ethernet ring:
• open loop
  – cut the ring by blocking some link
  – when protection is required, block the failed link
• closed loop
  – disable STP (but avoid infinite loops in some way!)
  – when protection is required, steer and/or wrap traffic

Page 55: Protection

Ethernet ring protocols

Open loop methods:
• G.8032 (ERPS)
• rSTP (ex 802.1w)
• RFER (RAD)
• ERP (NSN)
• RRST (based on RSTP)
• REP (Cisco)
• RRSTP (Alcatel)
• RRPP (Huawei)
• EAPS (Extreme, RFC 3619)
• EPSR (Allied Telesis)
• PSR (Overture)

Closed loop methods:
• RPR (IEEE 802.17)
• CLEER and NERT (RAD)

Page 56: Protection

G.8032

Q9 of SG15 produced G.8032 between 2006 and 2008.

G.8032 is similar to G.8031:
• strives for 50 ms protection (< 1200 km, < 16 nodes), but here this number is deceiving, as the MAC table is flushed
• standard Ethernet format, but incompatible with STP
• uses Y.1731 CCM for failure detection
• employs a Y.1731 extension for R-APS signaling (opcode=40)
• R-APS message format is similar to APS of G.8031 (but between every 2 nodes and to MAC address 01-19-A7-00-00-01)
• revertive and nonrevertive operation defined

However, G.8032 is more complex due to:
• the requirement to avoid loop creation under any circumstances
• the need to localize failures
• the need to maintain consistency between all nodes on the ring
• the existence of a special node (RPL owner)

Page 57: Protection

RPL

G.8032 defines the Ring Protection Link (RPL) as the link to be blocked (to avoid closing the loop) in the NR state.

One of the 2 nodes connected to the RPL is designated the RPL owner.

Unlike RFER, there is only one RPL owner. The RPL and owner are designated before setup, and operation is usually revertive.

All ring nodes are simultaneously in 1 of 2 modes, idle or protecting:
• in idle mode the RPL is blocked
• in protecting mode the failed link is blocked and the RPL is unblocked
• in revertive operation, once the failure is cleared the blocked link is unblocked and the RPL is blocked again
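Why exactly one link is blocked at any time can be checked with a small reachability test. In this Python sketch the 6-node ring and the choice of link 5 as the RPL are hypothetical; link i joins node i and node i+1. With one link out of use the ring becomes a loop-free chain that still reaches every node, whereas keeping the RPL blocked after a failure would partition the ring.

```python
from typing import Optional, Set


def blocked_links(rpl: int, failed: Optional[int]) -> Set[int]:
    """Links that do not forward traffic on the ring.

    Idle: only the RPL is blocked. Protecting: the failed link is blocked
    and the RPL is unblocked.
    """
    return {rpl} if failed is None else {failed}


def reachable(n: int, down: Set[int], start: int = 0) -> Set[int]:
    """Nodes reachable from `start` when links in `down` are out of use.
    Link i joins node i and node (i + 1) % n."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        # clockwise via link `node`, counter-clockwise via link `node - 1`
        for link, nxt in ((node, (node + 1) % n), ((node - 1) % n, (node - 1) % n)):
            if link not in down and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


N, RPL = 6, 5
for failed in (None, 2):
    down = blocked_links(RPL, failed)
    # exactly one link out of use: no loop, and every node still reachable
    print(failed, sorted(down), len(reachable(N, down)) == N)
print(len(reachable(N, {2, RPL})))  # 3: keeping the RPL blocked too would partition the ring
```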

Page 58: Protection

G.8032 revertive operation

In the idle state:
• adjacent nodes exchange CCM at a rate of 300 per second (including over the RPL)
• adjacent nodes exchange NR RB (RPL Blocked) messages in a dedicated VLAN every 5 seconds (but not over the RPL)
• R-APS messages are never forwarded

When a failure appears between 2 nodes:
• node(s) missing CCM messages peek twice with hold-off time
• node(s) block the failed link and flush the MAC table
• node(s) send an SF message (3 times at max rate, then every 5 s)
• a node receiving an SF message will check priority and unblock any blocked link
• a node receiving an SF message will send an SF message to its other neighbor
• in the stable protecting state, SF messages are sent over every unblocked link

When the failure is cleared:
• node(s) detect CCM and start the guard timer (blocks acting on R-APS messages)
• node(s) send NR messages to neighbors (3 times at max rate, then every 5 s)
• the RPL owner receiving NR starts the WTR timer
• when WTR expires, the RPL owner blocks the RPL, flushes its table, and sends NR RB
• a node receiving NR RB flushes its table, unblocks any blocked ports, and sends NR RB

Page 59: Protection

RPR – 802.17

Resilient Packet Rings:
• are compatible with standard Ethernet, but use a different frame format
• are robust (lossless, <50 ms protection, OAM)
• are fair (based on client throttling)
• support QoS (3 classes: A, B, C)
• are efficient (full spatial reuse)
• are plug and play (automatic station discovery)
• extend use of existing fiber rings

Counter-rotating add/drop ringlets, running SONET/SDH (any rate, PoS, GFP or LAPS) or “packetPHY” (1 or 10 Gb/s ETH PHY).

Developed by the 802.17 WG, based on Cisco’s Spatial Reuse Protocol (RFC 2892).

[Figure: ringlet0 and ringlet1, with ringlet selection]

Page 60: Protection

Basic RPR queuing

[Figure: RPR node queuing with PTQ and STQ (Primary/Secondary Transit Queue), local class A, B, C queues, and fairness]
• traffic from local source: sent according to fairness, first sent to ringlet selection
• traffic going around the ring: placed into an internal buffer; in dual-transit queue mode, placed into 1 of 2 buffers according to service class; sent according to fairness
• traffic for local sink: placed in an output buffer according to service class

Page 61: Protection

Y(J)S APS Slide 61

RPR service classesRPR service classes

class   use       info rate                delay/FDV   FE (fairness eligible)
A0      RT        reserved                 low         No
A1      RT        allocated, reclaimable   low         No
B-CIR   near RT   allocated, reclaimable   bounded     No
B-EIR   near RT   opportunistic            unbounded   Yes
C       BE        opportunistic            unbounded   Yes

(RT = real time, BE = best effort, FDV = frame delay variation)

RPR defines 3 main classes:
• class A: real time (low latency/FDV)
• class B: near real time (bounded, predictable latency/FDV)
• class C: best effort

Page 62: Protection

RPR class use

A0 ring bandwidth is reserved, and is not reclaimed even when there is no A0 traffic

In dual-transit queue mode, class A frames from the ring are queued in the PTQ, and class B and C frames in the STQ

Priority for egress frames (highest first):
• frames in the PTQ
• local class A frames
• local class B frames (when there are no frames in the PTQ)
• frames in the STQ
• local class C frames (when there are no PTQ, STQ, local A or local B frames)

Notes:
• class A frames have minimal delay
• class B frames have higher priority than STQ transit frames, so they have bounded delay/FDV
• classes B and C share the STQ, so once in the ring they have similar delay
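The transit queuing and egress priority order can be sketched as a strict-priority selector. This is a simplified illustration with hypothetical function names: it follows only the priority list given for dual-transit queue mode and ignores fairness, shaping and any STQ fill thresholds a real RPR station would apply.

```python
from collections import deque

# Strict-priority egress selection in dual-transit queue mode (illustrative).


def transit_queue(frame_class):
    """Class A transit frames go to the PTQ, classes B and C to the STQ."""
    return "PTQ" if frame_class == "A" else "STQ"


def next_egress(ptq, stq, local):
    """Pop and return the next frame to transmit, or None if all queues are empty.
    `local` maps "A", "B", "C" to deques of locally sourced frames."""
    if ptq:
        return ptq.popleft()
    if local["A"]:
        return local["A"].popleft()
    if local["B"]:                 # reached only when the PTQ is empty
        return local["B"].popleft()
    if stq:
        return stq.popleft()
    if local["C"]:                 # reached only when PTQ, STQ, local A and B are empty
        return local["C"].popleft()
    return None


ptq, stq = deque(), deque()
for name, cls in [("tA1", "A"), ("tB1", "B"), ("tC1", "C")]:
    (ptq if transit_queue(cls) == "PTQ" else stq).append(name)
local = {"A": deque(["lA1"]), "B": deque(["lB1"]), "C": deque(["lC1"])}
order = []
while (frame := next_egress(ptq, stq, local)) is not None:
    order.append(frame)
print(order)
```

Local class B frames leave ahead of the class B and C transit frames waiting in the STQ, which is why class B gets bounded delay while classes B and C have similar delay once they are on the ring.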

Page 63: Protection

RPR - protection

Rings give inherent protection against any single point of failure

RPR specifies 2 mechanisms:
• steering
• wrapping (optional)

(implementations may also do wrapping, then steering)

[Figure: wrap points at the failure, and steering information]
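The difference between steering and wrapping can be illustrated by the path a frame takes around a ring with one failed span. This is a simplified sketch under assumptions not taken from the slides: span i joins stations i and i + 1, ringlet0 moves toward increasing index, the frame would normally use ringlet0, and a wrapped frame is delivered only after it wraps back onto ringlet0 at the far side of the failure. All function names are hypothetical.

```python
# Steering vs wrapping on an n-station ring with one failed span (illustrative).


def forward(src, dst, n):
    """Stations visited on ringlet0 (increasing index) from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + 1) % n)
    return path


def backward(src, dst, n):
    """Stations visited on ringlet1 (decreasing index) from src to dst."""
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] - 1) % n)
    return path


def crosses(fwd_path, failed):
    # Stepping forward from station i uses span i, so check every station but the last.
    return failed in fwd_path[:-1]


def steer(src, dst, n, failed):
    """The source itself picks the ringlet that avoids the failed span."""
    p = forward(src, dst, n)
    return backward(src, dst, n) if crosses(p, failed) else p


def wrap(src, dst, n, failed):
    """The frame follows ringlet0 to the failure, wraps onto ringlet1, and wraps back."""
    p = forward(src, dst, n)
    if not crosses(p, failed):
        return p
    f, g = failed, (failed + 1) % n      # stations on either side of the failure
    to_wrap = forward(src, f, n)         # ringlet0 up to the first wrap point
    around = backward(f, g, n)           # ringlet1 all the way to the other wrap point
    onward = forward(g, dst, n)          # back on ringlet0 to the destination
    return to_wrap + around[1:] + onward[1:]


print(steer(0, 4, 6, failed=2))
print(wrap(0, 4, 6, failed=2))
```

In this model steering reaches station 4 in 2 hops, while wrapping takes 8, which is why implementations may wrap first (fast, purely local) and then switch to steering.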

Page 64: Protection

NERT and CLEER

New Ethernet Ring Technology / Closed Loop Encapsulated Ethernet Ring

Similar to RPR, but uses the real Ethernet frame format

NERT and CLEER distinguish between:
• ring nodes
• switches connected to ring nodes

Traffic in the ring is MAC-in-MAC encapsulated:
• external MACs are those of the ring nodes
• internal MACs are the original ones

Unexpected external MACs are discarded

External MACs are learned as in 802.1ah

Ring nodes forward according to the table

NERT floods, CLEER never floods

A protection switch only involves changing the table, so service restoration is fast

[Figure: ring nodes and attached switches]
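Table-driven forwarding with a table-only protection switch can be sketched as a lookup on the external (ring node) MAC. This is a hypothetical model, not the NERT or CLEER specification: it assumes each ring node has two ring ports named "east" and "west", learns external source MACs per port, and on failure of one port moves every entry to the other port. Unknown destinations are simply reported as None; flooding behavior is left out. The MAC names are placeholders.

```python
# Hypothetical ring-node forwarding table keyed on the external (outer) MAC.


class RingNode:
    def __init__(self):
        self.table = {}  # external ring-node MAC -> ring port ("east" or "west")

    def learn(self, outer_src, in_port):
        """A frame from ring node outer_src arrived on in_port, so it is reachable there."""
        self.table[outer_src] = in_port

    def forward(self, outer_dst):
        """Return the ring port for outer_dst, or None if it is not in the table."""
        return self.table.get(outer_dst)

    def protect(self, failed_port):
        """Protection switch: redirect every entry that used the failed port."""
        other = "west" if failed_port == "east" else "east"
        for mac, port in self.table.items():
            if port == failed_port:
                self.table[mac] = other


node = RingNode()
node.learn("node-3", "east")
node.learn("node-5", "west")
node.protect("east")             # east span fails: only the table changes
print(node.forward("node-3"), node.forward("node-5"))
```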

Page 65: Protection

MPLS fast reroute

IP FRR

RFC 4090

Page 66: Protection

IP FRR

True protection mechanisms do not exist for connectionless IP. In practice, routing protocols discover breaks and recalculate routes, but this usually takes a long time.

Link-state IGPs detect the link-down state using hellos: for OSPF, typically sent every 10 sec, with detection after 40 sec. The Dijkstra algorithm then computes routes that avoid the failed link.
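Route recomputation after a failure can be illustrated with a plain Dijkstra shortest-path computation, run before and after removing the failed link. This is a generic sketch on a small made-up topology (routers A to D with arbitrary link costs), not OSPF's actual SPF implementation.

```python
import heapq


def dijkstra(graph, src):
    """Shortest-path costs and predecessors from src; graph[u] = {v: cost}."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def route(prev, src, dst):
    """Rebuild the path from src to dst using the predecessor map."""
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def without_link(graph, a, b):
    """Copy of the (undirected) graph with link a-b removed."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    g[a].pop(b, None)
    g[b].pop(a, None)
    return g


net = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "D": 1},
       "C": {"A": 1, "D": 2}, "D": {"B": 1, "C": 2}}
dist, prev = dijkstra(net, "A")
print(route(prev, "A", "D"), dist["D"])
dist2, prev2 = dijkstra(without_link(net, "B", "D"), "A")
print(route(prev2, "A", "D"), dist2["D"])
```

The computation itself is fast; the slow parts are detecting the failure and propagating it to every router.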

BFD can be used to speed up the detection. However, the information still has to be propagated further (seconds?) and FIBs updated (100s of ms).
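The gap between hello-based and BFD-based detection is simple arithmetic. For OSPF, the 40 sec figure corresponds to a dead interval of 4 hello intervals. For BFD in asynchronous mode, RFC 5880 gives the detection time as the detect multiplier times the agreed transmit interval. The 50 ms interval and multiplier of 3 below are example values only, and the function names are illustrative.

```python
def ospf_detection_s(hello_interval_s=10.0, dead_multiplier=4):
    """OSPF neighbor-loss detection time: the dead interval."""
    return hello_interval_s * dead_multiplier


def bfd_detection_s(interval_us, detect_mult=3):
    """BFD asynchronous-mode detection time; intervals are carried in microseconds."""
    return interval_us * detect_mult / 1_000_000


ospf = ospf_detection_s()
bfd = bfd_detection_s(50_000)   # example: 50 ms interval, multiplier 3
print(ospf, bfd, ospf / bfd)
```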

Various IP Fast ReRoute (IP FRR) mechanisms have been proposed, but true protection is best done at the MPLS level.

Page 67: Protection

MPLS fast reroute

RSVP-TE enables MPLS traffic engineering by fine control over LSP placement: it specifies an explicit path using information gathered from the IGP, and resources may be reserved at LSRs along the way.

RFC 4090 defines extensions to RSVP-TE for Fast ReRoute (FRR):
• LSRs along the path preconfigure local bypasses (detours)
• upon detection of a failure by BFD (intervals specified in microseconds, typically 10s of ms), by RSVP hellos (RFC default is 5 ms) or by RESV / PATH messages (driven by the IGP), the upstream LSR simply enables the detour
• since this is a local action, it should be fast

RFC 4090 only discusses adding FRR to an RSVP-TE network, but its use with LDP is possible if there is a single label generator (not discussed in RFC 4090)
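The local nature of the repair can be sketched as a path splice: the detour is computed at setup time, and on failure only the upstream LSR switches traffic onto it. This is a simplified, hypothetical model that tracks only the sequence of LSRs (no labels or signaling); the LSR names and the `detours` table are made up.

```python
# Hypothetical model: an LSP is a list of LSR names; detours maps a PLR to the
# precomputed detour path from that PLR to its merge point (MP).


def repaired_path(lsp, detours, failed_link):
    """Path taken after the upstream LSR of failed_link enables its detour.
    Only that LSR changes its forwarding; no other LSR is involved."""
    for i, (u, v) in enumerate(zip(lsp, lsp[1:])):
        if (u, v) == failed_link:
            detour = detours[u]                # preconfigured at LSP setup
            mp_index = lsp.index(detour[-1])   # where the detour rejoins the LSP
            return lsp[:i] + detour + lsp[mp_index + 1:]
    return lsp                                 # failure is not on this LSP


lsp = ["I", "A", "B", "C", "E"]
detours = {"A": ["A", "X", "B"]}               # detour from A around link A-B
print(repaired_path(lsp, detours, ("A", "B")))
```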

Page 68: Protection

PLRs and MPs

The fundamental entities in MPLS FRR are the Point of Local Repair (PLR) and the Merge Point (MP)

A PLR is the LSR before the failed element (link or node)

All LSRs except the egress LER can be PLRs

The PLR is solely responsible for the FRR (no explicit APS signaling)

During path setup, potential PLRs create detours towards the egress LER

An MP is the LSR where the detour rejoins the LSP

All LSRs except the ingress LER can be MPs

[Figure: LSP from the ingress LER to the egress LER, showing a PLR and an MP]
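The PLR and MP rules can be expressed as a small function over the LSP's sequence of LSRs. This is an illustrative sketch with hypothetical names, assuming the detour rejoins as early as possible: for a failed link the MP is then the next hop, and for a failed node it is the LSR after that node.

```python
def candidates(lsp):
    """Potential PLRs (all but the egress LER) and MPs (all but the ingress LER)."""
    return lsp[:-1], lsp[1:]


def plr_and_mp(lsp, failed):
    """PLR and earliest possible MP for a failed link (tuple) or failed node (name)."""
    if isinstance(failed, tuple):           # link (u, v): PLR is u, MP is v
        i = lsp.index(failed[0])
        return lsp[i], lsp[i + 1]
    i = lsp.index(failed)                   # node: PLR before it, MP after it
    if i == 0 or i == len(lsp) - 1:
        raise ValueError("failure of an ingress or egress LER is not handled here")
    return lsp[i - 1], lsp[i + 1]


lsp = ["I", "A", "B", "C", "E"]
print(candidates(lsp))
print(plr_and_mp(lsp, ("B", "C")), plr_and_mp(lsp, "B"))
```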

Page 69: Protection

Methods

RFC 4090 defines two different protection methods; usually one or the other is employed in a given network

One-to-one backup:
• each LSP is protected separately
• a detour LSP is created for each LSP at each potential PLR
• no labels are pushed

Facility backup:
• one backup tunnel for multiple LSPs
• a bypass tunnel is created at each potential PLR
• uses label stacking

[Figures: PLR and MP, for one-to-one backup and for facility backup]
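The label handling that distinguishes the two methods can be sketched on a label stack, written as a Python list with the top label first. This is a simplified sketch with made-up label values and function names: in one-to-one backup the PLR swaps the incoming label for the detour LSP's label, while in facility backup it swaps to the label the MP expects for the protected LSP and then pushes the bypass tunnel label.

```python
# Label stacks are lists with the top label at index 0; label values are made up.


def one_to_one(stack, detour_label):
    """Swap the top label for the detour LSP's label; nothing is pushed."""
    return [detour_label] + stack[1:]


def facility(stack, mp_label, bypass_label):
    """Swap to the label the MP expects, then push the bypass tunnel label."""
    return [bypass_label, mp_label] + stack[1:]


print(one_to_one([100], detour_label=200))
print(facility([100], mp_label=300, bypass_label=900))
```

Because the bypass label is shared, a single bypass tunnel can carry many protected LSPs, each distinguished by its inner label; one-to-one backup instead needs a separate detour per LSP.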

Page 70: Protection

NHOP and NNHOP

MPLS FRR can bypass a failed link or a failed node. In order to bypass a single failed link, we need an alternative path to the next hop (NHOP).

In order to bypass a single failed node, we need an alternative path to the next next hop (NNHOP)

[Figures: PLR and MP, for the NHOP and NNHOP cases]
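Finding a bypass can be sketched as a shortest-hop search from the PLR that avoids the protected element: the protected link when the target is the NHOP, or the protected node when the target is the NNHOP. This is an illustrative breadth-first search on a made-up topology with hypothetical names, not the constrained path computation (CSPF) a real LSR would run.

```python
from collections import deque


def bypass(graph, plr, target, avoid_node=None, avoid_link=None):
    """Shortest-hop path from plr to target in an undirected graph, avoiding
    a node (node protection) or a link (link protection); None if none exists."""
    prev = {plr: None}
    queue = deque([plr])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v in prev or v == avoid_node:
                continue
            if avoid_link and {u, v} == set(avoid_link):
                continue
            prev[v] = u
            queue.append(v)
    return None


# LSP runs PLR -> N -> NN; X offers an alternative around both N and the PLR-N link.
topo = {"PLR": ["N", "X"], "N": ["PLR", "NN", "X"],
        "NN": ["N", "X"], "X": ["PLR", "N", "NN"]}
print(bypass(topo, "PLR", "N", avoid_link=("PLR", "N")))   # link protection to NHOP
print(bypass(topo, "PLR", "NN", avoid_node="N"))           # node protection to NNHOP
```

If the topology offers no path around the protected element, the function returns None and that element cannot be protected locally.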