E E 681 - Module 18
M.H. Clouqueur and W. D. Grover
TRLabs & University of Alberta
© Wayne D. Grover 2002, 2003
Analysis of Path Availability in Span-Restorable Mesh Networks
Review of Mesh Design
Motivation: Something must be done to reduce the impact of network element failures on service availability
Solution: Mesh Restoration Mechanism (Requires extra capacity)
Capacity planning methods:
• Max Latching
• Herzberg
• Modular capacity placement
• Joint working-spare capacity placement
These methods are more and more capacity-efficient, but what about availability?
Intuitively, availability and capacity are linked.
Questions to answer: • How much does mesh restoration improve the availability of service?
• How does the availability depend on the total capacity?
What Causes Unavailability?
• Single span failures
• Multiple span failures
• Node failures
• Span maintenance services
• Combinations of the above
Which are the most important?
What we need to compare for each category: Count × P(event) × Impact
• Count: number of such events
• Impact: for example, the probability of bringing the system under study into the down state
This comparison suggests that the major contributor to unavailability is:
The combination of a span failure and a span maintenance service (equivalent to a dual span failure in the worst case)
Impact of Failures
For the previous comparison we could only guess or make assumptions about the impact of each failure category.
Examples:
• Single span failure, Impact = 0 (network fully restorable to single failures)
• Dual span failure, Impact = 0.5 (at least half of the traffic on average should be restorable)
Determination of availability of service paths:
We need to know the exact value of the impact of each failure scenario on the availability of that service path
Availability analysis of path p:
Failure of (S1, S2): Impact = 0.3321
Failure of (S1, S3): Impact = 0.0000
...
Failure of (Si, Sj): Impact = 0.5243
We need a tool that determines the probability of path p being down for any given set of failed spans.
Problem of Independence of Span Failures
The contribution of a failure event to the unavailability is: P(event)×Impact
For a dual span failure: $P[\text{failure of } (S_i, S_j)] = U_{S_i} \times U_{S_j}$
This is based on the assumption that failures of $S_i$ and $S_j$ are independent.
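Special case: common cable sheaths
[Figure: spans S1, S2 and S3, where S1 and S3 share a common cable sheath. Figure annotation: "This span does not really exist".]
In that case: $P[\text{failure of } (S_1, S_3)] \neq U_{S_1} \times U_{S_3}$
but rather $P[\text{failure of } (S_1, S_3)] = U_{S_1}$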
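A rough numerical illustration of why the independence assumption matters. This is only a sketch: the function names are hypothetical, the value Us = 3×10^-4 is borrowed from the case study later in this module, and the shared-sheath case assumes the two spans always fail together.

```python
def p_dual_independent(u_a: float, u_b: float) -> float:
    """Probability that both spans are down at once, assuming independent failures."""
    return u_a * u_b


def p_dual_common_sheath(u_shared: float) -> float:
    """Assumption: two spans in one cable sheath always fail together,
    so the dual failure is as likely as the single physical failure."""
    return u_shared


U_S = 3e-4  # assumed span unavailability (value used in the later case study)

independent = p_dual_independent(U_S, U_S)   # on the order of 1e-7
shared = p_dual_common_sheath(U_S)           # on the order of 1e-4
underestimate_factor = shared / independent  # how far off the independent model is
```

Under these assumptions the independent model underestimates the probability of this particular dual failure by a factor of 1/Us, which is more than three orders of magnitude.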
Path Availability Calculation
Exact expression:
$$U_{path} = \sum_{\text{all failure events}} P(\text{event}) \cdot P(\text{event affecting path})$$
Simplified approach:
$$U_{path} = \sum_{i \in path} U^*_{links,i}$$
where $U^*_{links,i}$ is the equivalent link unavailability on span i.
Advantage: we only need to compute one value for each span and then use those values for the calculation of end-to-end availability.
Drawback of simplified approach: Some failure events contribute to the unavailability of links on several spans in a neighbourhood and can therefore be counted several times when summing the $U^*_{links,i}$ values.
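The simplified approach amounts to one lookup per span followed by a sum. A minimal sketch, assuming the per-span equivalent link unavailabilities are already known; the span names and values below are hypothetical and only illustrate the bookkeeping.

```python
def path_unavailability(link_unavailabilities):
    """Simplified approach: sum the equivalent link unavailability
    U*_links,i over the spans crossed by the path."""
    return sum(link_unavailabilities)


# Hypothetical per-span values of U*_links,i (not taken from the module)
u_star = {"A-B": 9.0e-7, "B-C": 1.2e-6, "C-D": 8.0e-7}

# Hypothetical 3-hop path
path = ["A-B", "B-C", "C-D"]

u_path = path_unavailability(u_star[span] for span in path)
```

Because each span contributes a single precomputed value, the same table can be reused for every path in the network. The price is the possible double counting described above.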
Link Equivalent Unavailability
Concept of Equivalent Unavailability:
• Non-restorable network: Ulink = Uspan (physical unavailability)
When the span is down, the link is down
• Restorable network: Ulink = Ulink* (Equivalent link unavailability)
Ulink* is different from Uspan because of the restoration mechanism
We will see that $U^*_{link}$ is of the order of $U_{span}^2$, therefore $U^*_{link} \ll U_{span}$
Derivation of Ulink*
It can easily be shown that the expected number of failed and non-restored links in the network at any time is:
$$\binom{S}{2} \cdot U_s^2 \cdot 2w \cdot (1 - R_2)$$
where:
• $S$: total number of spans
• $U_s$: average physical unavailability of spans
• $w$: average working capacity of spans
• $R_2$: average dual-failure restorability of links (the only unknown)

In general $U^*_{links,i}$ can be defined as:
(number of failed and non-restored links) / (total number of links affected by failure)

$$U^*_{link} = \frac{\binom{S}{2} \cdot U_s^2 \cdot 2w \cdot (1 - R_2)}{\text{working links}} = \frac{\binom{S}{2} \cdot U_s^2 \cdot 2w \cdot (1 - R_2)}{S \cdot w}$$
$$U^*_{link} = U_s^2 \cdot (S - 1) \cdot (1 - R_2)$$

A span-specific average $U^*_{link}(i)$ can be obtained using the span-specific average $R_2(i)$, calculated over the $S-1$ dual-failure scenarios involving span i.

$$R_2 = 1 - \frac{\sum_{a,b} N_{ab}}{\sum_{a,b} (w_a + w_b)}$$
$N_{ab}$: non-restored working units in the case of failure of span a and span b
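The two formulas of this slide can be chained in a few lines. The sketch below reads them as R2 = 1 - ΣN_ab / Σ(w_a + w_b) over all dual-failure scenarios and U*link = Us² (S - 1)(1 - R2). The toy network (3 spans) and its N_ab values are hypothetical, and the function names are illustrative only.

```python
def average_r2(non_restored, working):
    """R2 = 1 - sum(N_ab) / sum(w_a + w_b) over all dual-failure scenarios (a, b)."""
    lost = sum(non_restored.values())
    exposed = sum(working[a] + working[b] for a, b in non_restored)
    return 1.0 - lost / exposed


def equivalent_link_unavailability(u_s, num_spans, r2):
    """U*_link = U_s^2 * (S - 1) * (1 - R2)."""
    return u_s ** 2 * (num_spans - 1) * (1.0 - r2)


# Hypothetical toy network: working units per span
working = {1: 3, 2: 2, 3: 4}

# Hypothetical N_ab: non-restored working units for each dual failure (a, b)
non_restored = {(1, 2): 1, (1, 3): 0, (2, 3): 3}

r2 = average_r2(non_restored, working)
u_link = equivalent_link_unavailability(3e-4, len(working), r2)
```

In practice the N_ab values would come from analysing each dual-failure scenario, which is why R2 is the only unknown in the expression.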
Determination of R2
There is no closed-form model for R2, as the impact of each failure scenario depends on several factors specific to the failure case. However, failure events can be divided into a few main categories.
Unavailability sequences:
• Case 0: Span failure and wi > feasible spare paths → not possible by definition in a restorable network
• Case 1: Two failures but no spatial interactions → no outage
• Case 2: Two failures and spatial interactions (competition for spare capacity) → possible outage
• Case 3: Two failures with second failure hitting the first restoration pathset → possible outage
• Case 4: Two failures isolating a degree-2 node → certain outage
Impact of Dual Failures
Example #1, no spatial interaction:
[Figure: two failed spans with W = 3 and W = 2, restored over separate routes carrying 3 and 2 units.]
The two restoration paths do not interfere.
NO OUTAGE
W: working capacity, S: spare capacity
Impact of Dual Failures
Example #2, spatial interaction - capacity dependency:
[Figure: two failed spans with W = 3 and W = 2 whose restoration routes compete for spare capacity S.]
S < 5 or S > 5?
Is there enough spare capacity to restore both failures?
POSSIBLE OUTAGE, depending on the value of S
W: working capacity, S: spare capacity
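The capacity dependency in Example #2 can be sketched as a one-line check. This assumes, for illustration only, that both restoration path sets must cross a single shared span with S spare units and that no other route is available; the function name is hypothetical.

```python
def unrestored_units(w_first: int, w_second: int, shared_spare: int) -> int:
    """Working units left unrestored when two failed spans compete for
    the spare capacity of one shared span (illustrative assumption)."""
    return max(0, w_first + w_second - shared_spare)


# Example #2 values: W = 3 and W = 2, for a few candidate values of S
outcomes = {s: unrestored_units(3, 2, s) for s in (3, 4, 5, 6)}
```

Under this assumption there is an outage whenever S is smaller than the combined working capacity of the two failed spans (here 3 + 2 = 5).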
Impact of Dual Failures
Example #3, spatial interaction - special case:
[Figure: a failed span with W = 2 restored over 2 units; a second failure falls on its restoration route.]
The second failure hits the restoration path set deployed for the first failed span.
The outcome in this situation depends on the adaptability of the restoration mechanism and on the amount of remaining spare capacity.
POSSIBLE OUTAGE
W: working capacity, S: spare capacity
Impact of Dual Failures
Example #4, isolated node:
[Figure: two failed spans with W = 3 and W = 2 that together isolate a node.]
Nothing can be done to restore either of the two failures.
OUTAGE
W: working capacity, S: spare capacity
Adaptability of the restoration mechanism
[Figure: three panels showing spans S1 and S2, one for each behaviour below.]
Static behaviour:
• Restoration preplan says: “S2 is to be restored through S1”
Partly adaptive behaviour:
• S2 is restored via another route where spare capacity is available
• S1 is left unrestored
Fully adaptive behaviour:
• S2 is restored via another route where spare capacity is available
• S1 is restored again (if possible) with release of spare capacity previously used for restoration of span S1 (similar to path restoration’s stub release)
• Optional: The spare capacity used on span S2 gets “working status” and benefits from the restoration effort for S2
Results of Case Studies
R2 results for 5 test networks:

Behaviour          Non-modular environment   Modular environment*
Static             0.53 to 0.75              0.69 to 0.83
Partly adaptive    0.55 to 0.79              0.87 to 0.91
Fully adaptive     0.55 to 0.80              0.91 to 0.99

* Designed with Optimal Modular Spare Capacity Placement
[Figure: typical test network]
With fully adaptive behaviour in a modular environment, the working units enjoy almost full restorability to dual span failures.
Improvement over a non-restorable network
Path availability improvement example:
Test network: EuroNetA (19 nodes, 37 spans)
Reference path: 5 hops
Assumption: Us = 3×10^-4
If the network is non-restorable: Ulink = Uspan, Upath = 15×10^-4 = 13 hrs/year
If the network is restorable, the simulation with fully adaptive behaviour gives: R2 = 0.716735, Ulink* = 9.18×10^-7, Upath = 4.59×10^-6 = 2.4 min/year
Making a network restorable to single span failures brings a considerable improvement in the average availability of service paths.
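The arithmetic of this example can be reproduced directly from the relation Ulink* = Us² (S - 1)(1 - R2) derived earlier in the module. The sketch below uses only the numbers quoted on this slide; the variable names and the 8760-hour year are conventions chosen here.

```python
HOURS_PER_YEAR = 8760
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60

U_SPAN = 3e-4      # assumed span unavailability (from the slide)
HOPS = 5           # reference path length
SPANS = 37         # EuroNetA
R2 = 0.716735      # simulated average dual-failure restorability

# Non-restorable network: every link is as unavailable as its span
u_path_unprotected = HOPS * U_SPAN

# Restorable network: use the equivalent link unavailability instead
u_link_star = U_SPAN ** 2 * (SPANS - 1) * (1 - R2)
u_path_restorable = HOPS * u_link_star

hours_down_unprotected = u_path_unprotected * HOURS_PER_YEAR     # roughly 13 hours
minutes_down_restorable = u_path_restorable * MINUTES_PER_YEAR   # roughly 2.4 minutes
improvement_factor = u_path_unprotected / u_path_restorable
```

With these inputs, path unavailability drops by a factor of a few hundred when the network is made restorable to single span failures.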
For specific services it might still not be enough … How can we make service paths even more available?
Design for High Availability
The idea is to provision a network from an availability standpoint.
Two integer programming formulations were developed:
• Dual Failure Minimum Capacity (DFMC)
Finds the minimum capacity assignment for full restorability to dual-failures (R2=100%)
note: Cannot be used for networks with any degree-2 graph cut.
• Dual Failure Max Restorability (DFMR)
Finds the spare capacity placement that maximizes the average restorability to dual-failures for a given spare capacity budget.
R2 Design - Experimental Results
Cost of improving the R2 restorability:
Total working capacity: 405 units
Spare capacity for R1 = 100%: 223 units (55% redundancy)
Spare capacity for R2 = 100%: 628 units (155% redundancy)
[Figure: average R2 (axis from 0.75 to 1.05) versus total spare capacity (axis from 200 to 700 units), with design points labelled 1 to 12 between 223 and 628 units of spare capacity.]
To go from R2 = 80% to R2 = 100% we need to almost TRIPLE the spare capacity
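The redundancy figures and the "almost triple" statement follow from the three capacities quoted on this slide. A minimal check, taking redundancy to mean spare capacity divided by working capacity (the function name is illustrative):

```python
TOTAL_WORKING = 405   # working units
SPARE_R1 = 223        # spare units for R1 = 100%
SPARE_R2 = 628        # spare units for R2 = 100%


def redundancy(spare: float, working: float) -> float:
    """Redundancy expressed as the ratio of spare to working capacity."""
    return spare / working


red_r1 = redundancy(SPARE_R1, TOTAL_WORKING)   # about 0.55
red_r2 = redundancy(SPARE_R2, TOTAL_WORKING)   # about 1.55
spare_growth = SPARE_R2 / SPARE_R1             # about 2.8, close to triple
```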
Conclusion of R2 studies
It is very costly to guarantee R2 restorability to all service paths in the network. However, in most cases of dual span failures the restoration mechanism is able to restore part or all of the failed working units.
Idea: For little or no extra capacity, it should be possible to guarantee full restorability to dual failures for selected network connections.
[Figure: restoration of a higher priority connection. Failed spans with W = 3 (including 1 with higher priority) and W = 2; spare capacity S = 3.]
Constraint modification for the Dual Failure Minimum Capacity formulation (DFMC):
instead of
“For any dual failure, restoration paths must be found for all failed working units”
we now have
“For any dual failure, restoration paths must always be found for all working units requiring R2 restorability”
Multi-Priority Mesh Design
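Minimize
$$\sum_{k=1}^{S} C_k \cdot s_k$$

Subject to:

In case of single failure of span i, restore all working units that require R1 restorability:
$$\sum_{p=1}^{P_i} f_i^p \geq w_i \quad \forall i \in \{1, 2, \ldots, S\}$$

In case of failure of spans i and j, restore all working units ($x_i$) that require R2 restorability:
$$\sum_{p=1}^{P_i} f_{i,j}^p \geq x_i \quad \forall (i, j) \in \{1, 2, \ldots, S\}^2,\ i \neq j$$

Route p cannot be used if it crosses one of the two failed spans:
$$f_{i,j}^p \leq (1 - \delta_{i,j}^p) \cdot C \quad \forall (i, j) \in \{1, 2, \ldots, S\}^2,\ i \neq j,\ \forall p \in \{1, 2, \ldots, P_i\}$$

Spare capacity is needed to support restoration of single span failures:
$$s_k \geq \sum_{p=1}^{P_i} f_i^p \cdot \delta_{i,k}^p \quad \forall (i, k) \in \{1, 2, \ldots, S\}^2,\ i \neq k$$

Spare capacity is needed to support restoration of dual span failures:
$$s_k \geq \sum_{p=1}^{P_i} f_{i,j}^p \cdot \delta_{i,k}^p + \sum_{p=1}^{P_j} f_{j,i}^p \cdot \delta_{j,k}^p \quad \forall (i, j, k) \in \{1, 2, \ldots, S\}^3,\ i \neq j,\ i \neq k,\ j \neq k$$

Here $\delta_{i,k}^p = 1$ if restoration route p for span i crosses span k, and 0 otherwise.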
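To make the role of each constraint concrete, the sketch below checks a given flow assignment and derives the smallest spare capacities it requires. It is not a solver and not the formulation's implementation. It assumes the constraints read: single-failure flows cover w_i, dual-failure flows cover x_i, a route crossing the other failed span carries no flow, and s_k must cover the flow crossing span k in every single and dual failure. Route membership (`k in route`) plays the role of the crossing indicator. The toy network, all names and all values are hypothetical.

```python
from itertools import combinations


def crossing_load(routes_i, flows, k):
    """Total flow on the routes that cross span k."""
    return sum(f for route, f in zip(routes_i, flows) if k in route)


def flows_feasible(spans, routes, w, x, f_single, f_dual):
    """Check the restorability and route-exclusion constraints."""
    for i in spans:
        if sum(f_single[i]) < w[i]:           # R1: restore all w_i units
            return False
        for j in spans:
            if j == i:
                continue
            flows = f_dual[(i, j)]
            if sum(flows) < x[i]:             # R2: restore the x_i priority units
                return False
            if crossing_load(routes[i], flows, j) > 0:   # route uses failed span j
                return False
    return True


def spare_needed(spans, routes, f_single, f_dual):
    """Smallest s_k satisfying both spare-capacity constraints."""
    s = {k: 0 for k in spans}
    for i in spans:                           # single span failures
        for k in spans:
            if k != i:
                s[k] = max(s[k], crossing_load(routes[i], f_single[i], k))
    for i, j in combinations(spans, 2):       # dual span failures
        for k in spans:
            if k not in (i, j):
                load = (crossing_load(routes[i], f_dual[(i, j)], k)
                        + crossing_load(routes[j], f_dual[(j, i)], k))
                s[k] = max(s[k], load)
    return s


# Hypothetical toy network: three parallel spans a, b, c between two nodes.
# Each route is the set of spans it crosses.
spans = ["a", "b", "c"]
routes = {"a": [{"b"}, {"c"}], "b": [{"a"}, {"c"}], "c": [{"a"}, {"b"}]}
w = {"a": 2, "b": 2, "c": 2}      # working units
x = {"a": 2, "b": 0, "c": 0}      # units requiring R2 restorability
f_single = {"a": [1, 1], "b": [1, 1], "c": [1, 1]}
f_dual = {
    ("a", "b"): [0, 2], ("a", "c"): [2, 0],
    ("b", "a"): [0, 0], ("b", "c"): [0, 0],
    ("c", "a"): [0, 0], ("c", "b"): [0, 0],
}

feasible = flows_feasible(spans, routes, w, x, f_single, f_dual)
spare = spare_needed(spans, routes, f_single, f_dual)
```

In this toy case single-failure restoration alone would need one spare unit per span; protecting the two priority units on span a against dual failures raises the requirement on spans b and c only, which is the selective effect the multi-priority design aims for.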
Network Availability Simulator
The simulator determines times of span failures and repairs according to given statistical distributions:
[Figure: timeline t of span failure and repair events (span 12, span 12, span 5, span 2, span 2, span 5, span 8) dividing time into stages 1 to 6.]
At each stage:
Set of failed spans + Connections → Restoration analyzer → Set of lost connections → Outage recorder
The objective of the simulator is to obtain information about the availability of end-to-end network connections by generating span failures at random times and analyzing the restoration depending on connection priorities
Characteristics of a network connection:
• Origin node
• Destination node
• Size (STS-1, STS-3, STS-12, …)
• Restorability requirement (R0, R1, R2)
• Routing between O and D
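The stage-by-stage structure described above can be sketched as a small event-driven loop. This is an illustrative skeleton only, not the simulator itself: it assumes exponentially distributed times to failure and to repair, and the restoration analyzer is a deliberately naive placeholder (no restoration at all). All names and parameter values are hypothetical.

```python
import random


def simulate_stages(spans, mttf, mttr, horizon, seed=0):
    """Return [(start_time, failed_spans), ...] for the interval [0, horizon)."""
    rng = random.Random(seed)
    events = []                                  # (time, span, is_failure)
    for span in spans:
        t = 0.0
        while True:
            t += rng.expovariate(1.0 / mttf)     # time until the next failure
            if t >= horizon:
                break
            events.append((t, span, True))
            t += rng.expovariate(1.0 / mttr)     # time until the repair
            if t >= horizon:
                break
            events.append((t, span, False))
    events.sort()
    failed = set()
    stages = [(0.0, frozenset())]
    for time, span, is_failure in events:        # each event opens a new stage
        if is_failure:
            failed.add(span)
        else:
            failed.discard(span)
        stages.append((time, frozenset(failed)))
    return stages


def unprotected_analyzer(failed, connections):
    """Placeholder analyzer: a connection is lost if its route crosses a failed span."""
    return [name for name, route in connections.items() if failed & set(route)]


def record_outages(stages, horizon, connections, analyzer):
    """Accumulate the outage time of each connection over all stages."""
    downtime = {name: 0.0 for name in connections}
    ends = [start for start, _ in stages[1:]] + [horizon]
    for (start, failed), end in zip(stages, ends):
        for name in analyzer(failed, connections):
            downtime[name] += end - start
    return downtime


# Hypothetical inputs: span ids, times in hours, routes as lists of spans
HORIZON = 100_000.0
stages = simulate_stages([2, 5, 8, 12], mttf=1000.0, mttr=12.0, horizon=HORIZON, seed=1)
connections = {"short": [5], "long": [5, 12]}
downtime = record_outages(stages, HORIZON, connections, unprotected_analyzer)
```

A real analyzer would replace `unprotected_analyzer` and decide, for each set of failed spans, which connections are lost given their restorability requirement and the available spare capacity.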
Network Availability Simulator
Advantages of the simulator:
• Confirm results obtained with theoretical availability expressions based on R2.
• Obtain information about the distribution of outage times (1000 outages of 0.1 sec have a different impact than 10 outages of 10 sec, although the total outage time is the same)
• Possibility to use different distributions of time-to-repair and time-between-failure for each span.
Mesh/Ring Availability Comparison
Single span failures
Mesh: Full protection
Rings: Full protection
Dual span failures with no spatial interaction
Mesh: Full protection
Rings: Full protection
Dual span failures with spatial interaction
Mesh: Protection from 0% to 100% of the working units depending on available spare capacity and adaptability of the restoration mechanism
Rings (2 span failures on same ring): Protection of about 2/3 of the traffic (demands that are not isolated by the 2 span failures)
For connections requiring R1 restorability, the Ring-based solution and the mesh solution provide similar levels of availability
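The "about 2/3" figure for rings can be checked by enumeration under a simple assumption that is not stated on the slide: one demand between every pair of nodes on the ring, and a demand survives two span failures only if both of its end nodes stay in the same segment. The function name is illustrative.

```python
from itertools import combinations


def ring_dual_failure_survival(n: int) -> float:
    """Average fraction of node pairs still connected after two span
    failures on an n-node ring, over all pairs of failed spans."""
    total_pairs = n * (n - 1) // 2
    survived = 0
    scenarios = 0
    # Span k joins node k and node (k + 1) % n
    for a, b in combinations(range(n), 2):
        # Cutting spans a and b (a < b) isolates nodes a+1..b from the rest
        size = b - a
        lost = size * (n - size)        # pairs with one node in each segment
        survived += total_pairs - lost
        scenarios += 1
    return survived / (scenarios * total_pairs)


fractions = {n: ring_dual_failure_survival(n) for n in (8, 16, 100)}
```

Under this uniform-demand assumption the surviving fraction works out to 1 - (n + 1) / (3(n - 1)), which approaches 2/3 from below as the ring grows.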
Mesh/Ring Availability Comparison
Possibility of guaranteeing R2 restorability:
Mesh Networks: Yes, with adequate design and an adaptive restoration mechanism.
Ring Networks: No, in certain cases restorability to dual failures cannot be guaranteed.
Example:
[Figure: a connection crossing a ring between its origin and exit nodes, cut off by two span failures.]
The connection is lost whatever its priority level is.
Conclusion: The mesh architecture seems to be more appropriate than rings to serve demands with high availability requirements