We Energies Case Study A Converged IP/MPLS Network ...
© 2013 Utilities Telecom Council
Transforming Critical Infrastructure
We Energies Case Study
A Converged IP/MPLS Network Suitable for Critical
Teleprotection Requirements:
Solutions beyond PDH and SDH/SONET TDM
Jeff Polan, DNV GL ● Chris Staszak, Alcatel-Lucent
Agenda – We Energies Network Development
• Network Design & Development
– Challenges Facing We Energies Network
– Bandwidth Requirements
– Teleprotection Requirements
– Voice Requirements
– Principal Options Reviewed
– New Network Architecture
– Wide Area Connectivity
– Migration & Cyber Security
• Network Implementation & Lessons Learned
– Services
– MPLS Implementation
– Quality of Service
– Network Synchronization
– Migration of TDM and DATA
– Network Management
– Key Lessons Learned
We Energies Service Area
Issues & Challenges Facing We Energies Network
• Uses Multiple Technologies:
– Gigabit Ethernet and Fibre Channel over DWDM between Data Center (DC) and Business Center (BA)
– SONET OC-48 rings connecting 19 sites to Data Center and Business Center
– Gigabit Ethernet and TDM over SONET
– Gigabit Ethernet over dark fiber connecting 15 sites to Data Center and Business Center
– TDM microwave (nxDS1, mxDS3) links connecting about 100 sites to DC/BA and other network locations
• Multiple Service Types:
– Voice, IP and Ethernet data, SCADA, Teleprotection
• Looming capacity constraints:
– Extension to as many as 300 additional substations
– New high-bandwidth services including video surveillance
• Equipment Obsolescence:
– DWDM and SONET gear is 10-12 years old, Manufacturer Discontinued (MDC), and no longer supported
– Immediate need: Ethernet data traffic on the SONET network needs to be offloaded due to an MDC interface
• Cost and looming obsolescence of 3rd-party T1/T3 services
Bandwidth Requirements
• Network traffic is mostly data (IP and Ethernet) traffic
• Substations must be connected to the network; the primary bandwidth driver at substations is currently real-time video surveillance:
– Bandwidth requirements are a function of video quality:
– TV quality video (30 fps) ~ 2-3 Mbps per video stream (MPEG4/H.263)
– Good quality video (5 fps) ~ 400-500 kbps per video stream
– Communications requirement is to support 6-7 real-time video surveillance streams at each substation
– Derived bandwidth requirements:
• 10-20 Mbps per substation for TV quality video
• 2-3 Mbps per substation for video at 5 fps
• Network should be designed to accommodate any future requirements for the next 10-15 years
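The derived figures above follow from multiplying the stream count by the per-stream rate. A minimal sketch of that arithmetic, using only the numbers on this slide (the function name is illustrative):

```python
def substation_bandwidth_mbps(streams, per_stream_mbps):
    """Aggregate bandwidth in Mbps for a number of simultaneous video streams."""
    return streams * per_stream_mbps

# Slide figures: 6-7 streams per substation,
# 2-3 Mbps per stream at 30 fps, 0.4-0.5 Mbps per stream at 5 fps.
tv_low = substation_bandwidth_mbps(6, 2.0)
tv_high = substation_bandwidth_mbps(7, 3.0)
fps5_low = substation_bandwidth_mbps(6, 0.4)
fps5_high = substation_bandwidth_mbps(7, 0.5)

print(f"TV quality (30 fps): {tv_low:.1f} to {tv_high:.1f} Mbps per substation")
print(f"Good quality (5 fps): {fps5_low:.1f} to {fps5_high:.1f} Mbps per substation")
```

The raw products (12 to 21 Mbps and 2.4 to 3.5 Mbps) bracket the rounded planning figures of 10-20 Mbps and 2-3 Mbps quoted on the slide.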
Teleprotection Requirements
• Teleprotection traffic poses particular challenges
– Latency requirements: 5 ms one-way latency and 1 ms asymmetric latency difference for CDP (Vendor S); more typically 3 ms one-way latency (IEC 61850-5) and 0.5 ms maximum asymmetric latency for CDP
– Interfaces: T1/fractional T1, 4-wire E&M; more typically RS-232/422, X.21, E1, Ethernet (GOOSE), IP
– Availability: >99.999% at a BER of less than 1E-06 desired
– Low-speed interfaces (e.g., 9600 bps) generally pose the most serious challenges because of packetization delays; SEL “mirrored bits” poses a particular challenge
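To see why low-speed interfaces are hard, consider how long it takes just to collect enough bits to fill a packet. The sketch below is a simplified model: it ignores start/stop bits and header overhead, and the payload sizes are hypothetical values chosen for illustration.

```python
def packetization_delay_ms(payload_bytes, line_rate_bps):
    """Time in ms to collect payload_bytes from an interface running at line_rate_bps."""
    return payload_bytes * 8 * 1000.0 / line_rate_bps

LINE_RATE_BPS = 9600  # low-speed serial interface from the slide

# Hypothetical payload sizes, in bytes
delays = {n: packetization_delay_ms(n, LINE_RATE_BPS) for n in (1, 4, 16, 64)}
for n, d in delays.items():
    print(f"{n:3d} bytes at {LINE_RATE_BPS} bps: {d:6.2f} ms")
```

Under these assumptions a single byte takes about 0.83 ms to arrive and four bytes take about 3.3 ms, so even very small payloads consume a large part of a 3-5 ms one-way latency budget before the packet enters the network.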
Voice System Requirements
• Current system is reaching end of life
• We Energies will have a project in 2014 to upgrade its Voice System
• DNV GL recommended moving to a more VoIP-based infrastructure
• As part of the upgrade project, We Energies considered:
– SIP trunking
– SIP to integrate Microsoft Communicator/Lync
– SIP to support video conferencing to desktops
– E911 services for IP phones
– A migration plan to move voice traffic to the IP/MPLS network
Principal Options Reviewed
• Replacement OC-48/OC-192 SONET
• Carrier Ethernet Transport (CoE) – PBB-TE
• Native IP (TDMoIP)
• IP/MPLS Network
• MPLS-TP
• OTN (G.709)
DNV GL’s Experience:
Entirely feasible to support teleprotection traffic, given:
− A well designed and dimensioned MPLS network with appropriately high-speed connections within the MPLS network (e.g., 1 Gbps over fiber)
− Controlled number of hops
− Proper edge-router interfaces
− Proper QoS provisioning, including strict-priority scheduling
− Proper traffic engineering
• 1588v2 time synchronization can provide further protection against latency asymmetry for current or future CDP teleprotection devices and/or PMUs, without the need to deploy separate GPS-disciplined PPS/ToD clocks at each of these devices
Example: Actual Test Results – Teleprotection performance under 100% congestion conditions
• Channelized DS1 (TDM) carrying critical CDP and TPR traffic, even under extreme stress conditions:
– Over 6 nodes/5 hops, with one hop 100% congested with BE in-policy traffic, channelized DS1 traffic (at EF in-policy) was preserved without error using the recommended QoS. Maximum one-way latency never exceeded 3.8 ms (with typical per-node forwarding delays under 20 µs), and maximum indicated latency asymmetry never exceeded 200 µs in all testing observed over 5 days (test conditions: 8 frames or 1 ms packetization [192 bytes at DS1], and 5 ms jitter buffer).
– These numbers are substantially better than the We Energies requirements of <5 ms and <0.5 ms respectively. Because this test over 6 nodes/5 GigE hops is likely to exceed any practical teleprotection connectivity demand, even after FRR/end-to-end protection switching, it is an extreme test of the latency and latency-asymmetry performance of teleprotection services.
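The test conditions can be sanity-checked with a rough latency budget. The sketch below is an illustrative model, not the test methodology: it assumes the jitter buffer contributes half its depth (playout at 50% fill, as described later in the deck), charges each of the 6 nodes the 20 µs upper figure, and ignores propagation and serialization delay.

```python
DS1_FRAME_US = 125       # one DS1 frame every 125 microseconds (8000 frames/s)
DS1_PAYLOAD_BYTES = 24   # 24 one-byte timeslots per frame

def packetization_us(frames):
    """Time to collect the given number of DS1 frames into one packet."""
    return frames * DS1_FRAME_US

def payload_bytes(frames):
    """TDM payload carried by one packet."""
    return frames * DS1_PAYLOAD_BYTES

def one_way_estimate_us(frames, jitter_buffer_us, nodes, per_node_us, playout_fill=0.5):
    """Rough one-way delay: packetization + jitter buffer fill + node forwarding."""
    buffer_delay = int(jitter_buffer_us * playout_fill)  # assumed 50% fill at playout
    return packetization_us(frames) + buffer_delay + nodes * per_node_us

estimate_us = one_way_estimate_us(frames=8, jitter_buffer_us=5000, nodes=6, per_node_us=20)
print(f"Payload per packet: {payload_bytes(8)} bytes")
print(f"Packetization delay: {packetization_us(8)} us")
print(f"Estimated one-way latency: {estimate_us} us")
```

Under these assumptions the estimate is 3620 µs, which is in line with the reported maximum of 3.8 ms once propagation and serialization are added.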
Recommendations for Wide Area Network
• Transition to a single converged network using
IP/MPLS
– Single network to design, operate, manage and maintain
– Capable of carrying both data traffic and legacy TDM traffic
(Voice, SCADA, teleprotection)
– Low latency, path deterministic, supports traffic engineering,
QoS, Fast Re-Route in case of link failures (<50 ms typical),
primary/secondary path protection and/or dual pseudowires
– Segment network between different applications (L2/L3 VPNs) for different QoS or security requirements
– Mature Technology with a large vendor ecosystem
• Hierarchical design of the network with well organized
core, aggregation and access layers
Wide Area Network Design
Migration
• How to get from the current system to the planned network?
• Installation and deployment, provisioning, test and verification MUST
have no impact/minimal impact on current operations
• NO free fiber available on existing SONET rings, uncertain availability
on most “dark fiber” segments
• Chosen solution: Overlay new MPLS network over existing SONET and
GigE networks using banded 1550nm CWDM
– Essentially zero impact on current network,
– highly cost-effective 5xGigE capacity in MPLS network,
– Both MPLS and SONET networks can co-exist indefinitely
– Allows migration to be performed only after new network is fully
deployed, provisioned, tested, commissioned
– Migration can be performed incrementally, service-by-service, on as-
needed basis
• Extend IP/MPLS network to new sites
Cybersecurity
• NERC CIP requires additional cyber-security protections and controls on cyber assets associated with identified critical assets (hence critical cyber assets, or CCAs) with “qualifying connectivity”, i.e., using routable protocols
• MPLS offers some benefits, as it fits within the “non-routable protocol” exception (forwarding decisions are based only on MPLS labels)
• Recommendations include:
– Enable routing/switching control plane security
– Enable data encryption in transit (data plane security)
– Use network segmentation based on VPNs
– Enable Network Admission Control
– Enable embedded security for virtualized data centers
• A comprehensive and detailed cyber-security framework, architecture, controls, and implementation analysis will need to be performed for the chosen communications solution and its deployment
Importance of Independent Lab Tests to confirm ability to
support critical services
• Comprehensive testing performed on a scaled-down facsimile of actual
network - approximately 20 nodes (about 50%) using actual core and edge
routers, network architecture, serial/TDM interfaces, L2/L3 VPNs (data
segregation), provisioning and protection schemes, and NMS
• Verify effective QoS management is maintained for critical legacy TDM
services (teleprotection) under a wide range of simulated impairments (packet
jitter, packet drop, clearing jitter-buffer under-run/over-run etc.), failover
actions (FRR and primary/secondary path protection), and 100% traffic
congestion conditions
• On critical TDM services, must verify packetization and jitter buffer
provisioning appropriate to required latency
• Must verify that IEEE 1588v2 network time synchronization accuracy is achieved, and that it shows proper transient settling behavior on primary-secondary clock failover, returning to the same synchronization accuracy
Summary & Conclusion:
• One converged network to design, operate, and maintain
• Properly designed and dimensioned, it supports all legacy voice, SCADA, and teleprotection TDM services, with failover equal to or better than existing SONET (<50 ms), sub-5 ms latency, sub-0.5 ms latency asymmetry, and asymmetry protection using IEEE 1588v2
• Extended IP/MPLS network to new sites: 40 sites today, as many as 350 sites in the future
• Extend capacity throughout to 5 x GigE from OC-48 and GigE
Network Implementation Detail
Aggregation sites
• 31 sites, each with an aggregation router (7705 SAR-18)
• Sites are interconnected with five pairs of fibers, each pair carrying a single GigE
Optical
• CWDM wavelengths will be multiplexed together by a Finisar passive optical add/drop CWDM multiplexer
• Two units used between the two Data Centers
• Additionally, 3 paths have been identified where the loss exceeds the link budget of the available SFPs, so We Energies is using DWDM 1830 PSS-16s instead of Finisar
Management
• 5620 Service Aware Manager (SAM)
• 5650 Control Plane Assurance Manager (CPAM)
• Service Portal Express
We Energies IP/MPLS Network
[Figure: network topology diagram showing the core sites, 1830 nodes, 40 km and 120 km optics, and numbered sites 1 to 43]
• Work started in March 2013
• Deployment starting from edge and working to Core
• Total actual duration on completion likely to be slightly over 1
year
Core Detail – 5 x 1 GigE
[Figure: core site detail. Labels: 7750 SR-7, 7705 SAR-8, Finisar FWSF-M/D-1550/CWDM, existing OC-48, copper GigE, unamplified composite CWDM signal over dark fiber. Single fiber direction shown; other connections and local drops not shown.]
• CWDM and GigE signals: 1471 nm, 1491 nm, 1511 nm, 1591 nm, 1611 nm
• Note: 1531 nm, 1551 nm, and 1571 nm are not available for use; they are used by the existing OC-48
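The 5 x GigE figure follows from the wavelength plan on this slide. The sketch below assumes an 8-channel upper-band CWDM grid from 1471 nm to 1611 nm at 20 nm spacing, which matches the wavelengths named on the slide, and assumes one GigE per free wavelength.

```python
# Assumed upper-band CWDM grid: 1471 nm to 1611 nm in 20 nm steps (8 channels).
grid_nm = list(range(1471, 1612, 20))

# Wavelengths the slide lists as used by the existing OC-48.
in_use_by_oc48 = {1531, 1551, 1571}

# Channels left for the new MPLS overlay, one GigE each (assumption).
available_nm = [w for w in grid_nm if w not in in_use_by_oc48]
capacity_gige = len(available_nm)

print("Available wavelengths (nm):", available_nm)
print("Overlay capacity:", capacity_gige, "x GigE")
```

This reproduces the five wavelengths listed for the CWDM and GigE signals, leaving the 1550 nm band to the existing SONET equipment so both networks can share the same fiber.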
• 5620 SAM and 5650 CPAM: redundant configuration, with one system located at each of 2 core sites, connected to the 7750 SR-7
• 5620 SAM and SPE clients: located anywhere with IP connectivity to the 5620 SAM and SPE servers
• Service Portal Express for Utilities (SPE): one web server portal located at a core site; IP connectivity required to the 5620 SAM and client browsers
We Energies – Services Provided
Service                    MPLS Svc Type   Comments
Voice over IP              VPRN-Voice      SIP signaling and payload
Corporate Data Service     VPRN-Data       Regular corporate IP services, including Internet access
Traditional TDM service    VLL (Cpipe)     Network synchronization is being implemented by Sync-E
Management Service         VPRN-Data       In-band IP traffic
STILL TO BE DEPLOYED
Teleprotection             VLL (Cpipe)     Will require RSVP-TE
Video Traffic              VPRN-Data       Corporate video traffic, to be deployed in future
MPLS Implementation Overview
• OSPF is used as the Interior Gateway Protocol (IGP) for the MPLS network
• Label Distribution Protocol (LDP) is used to build the LSPs
• Multiprotocol BGP will be used as the VPRN service label distribution protocol
• LDP provides a standard methodology for label distribution by assigning labels to routes that have been chosen by the IGP. The resulting labeled paths, called label switched paths (LSPs), forward traffic across an MPLS backbone
• For VLL services (VPLS/Cpipe/Ipipe etc.), Targeted LDP (T-LDP) is used for inner label distribution
• A key objective was to implement the network with minimal disruption to edge devices using EIGRP
• Bidirectional Forwarding Detection (BFD) is used for failure detection
• RSVP-TE is to be implemented with Teleprotection
• Multi-area OSPF is being used, with Area 0 and Areas 1 and 2, to allow network growth to its eventual size
Quality Of Service – Forwarding Class (FC)
FC   FC name           Class type          Notes
NC   Network control   Real time           For network control traffic.
H1   High-1            Real time           For delay/jitter sensitive data.
EF   Expedited         Real time           For delay/jitter sensitive data.
H2   High-2            Real time           For delay/jitter sensitive data.
L1   Low-1             NRT – Assured       For assured traffic.
AF   Assured           NRT – Assured       For assured traffic.
L2   Low-2             NRT – Best effort   For BE traffic.
BE   Best Effort       NRT – Best effort   For BE traffic.
Quality Of Service – Deployment (FC To Queue)
Service name    Priority   FC   Queue   SAP Ingress Policy   SAP Egress Policy   PIR (Mbps)   CIR (%LR)
Voice Payload   Platinum   EF   5       103                  103                 35           30
SIP             Gold       H2   4       102                  102                 11           10
Video           Gold       H2   4       104                  104                 40           30
Management      Silver     L1   3       105                  105                 11           10
Best Effort     Bronze     BE   1       101                  101                 65           60
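The service-to-queue mapping in this table can be illustrated with a toy strict-priority scheduler. This is a conceptual sketch only: it assumes a higher queue number means higher priority and that queues are drained in strict priority order, and it does not model PIR/CIR rate limits or the routers' actual scheduler. The class and packet names are hypothetical.

```python
from collections import deque

# Service-to-queue mapping taken from the table above.
SERVICE_TO_QUEUE = {
    "Voice Payload": 5,
    "SIP": 4,
    "Video": 4,
    "Management": 3,
    "Best Effort": 1,
}

class StrictPriorityScheduler:
    """Always serves the highest-numbered non-empty queue first (assumed ordering)."""

    def __init__(self):
        self.queues = {}

    def enqueue(self, service, packet):
        queue_id = SERVICE_TO_QUEUE[service]
        self.queues.setdefault(queue_id, deque()).append(packet)

    def dequeue(self):
        for queue_id in sorted(self.queues, reverse=True):
            if self.queues[queue_id]:
                return self.queues[queue_id].popleft()
        return None  # nothing waiting

sched = StrictPriorityScheduler()
arrivals = [
    ("Best Effort", "be-1"), ("Management", "mgmt-1"), ("Voice Payload", "voice-1"),
    ("SIP", "sip-1"), ("Best Effort", "be-2"), ("Video", "video-1"),
]
for service, packet in arrivals:
    sched.enqueue(service, packet)

order = []
packet = sched.dequeue()
while packet is not None:
    order.append(packet)
    packet = sched.dequeue()
print(order)
```

In this model the voice packet leaves first even though best-effort traffic arrived earlier, which is the behavior that protects delay-sensitive traffic on a congested link.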
Network Synchronization: Sync-E & 1588V2
• Clock synchronization is needed to ensure proper operation of the TDM services (Cpipe). Both the 7750 SR-7 and the 7705 SAR-8/18 can be configured with an external clock source as well as up to 2 other timing references.
• Time synchronization is needed for TPR, PMU, D/A and SCADA
• Four sites in We Energies' network provide GPS-disciplined clock sources
• Nodes running SyncE (Synchronous Ethernet) synchronize their clocks to the Ethernet bit stream: the receiver recovers the clock frequency from the physical-layer line signal, and clock quality is signaled between nodes using Synchronization Status Messages (SSM) carried on the Ethernet Synchronization Messaging Channel (ESMC).
• All nodes in this network will be configured with 2 peers under the SyncE configuration.
• SyncE will automatically select the peer with the better clock quality. Should that peer stop providing the higher-quality clock, the node will switch to the secondary peer. These quality measurements happen automatically.
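The peer selection described above can be sketched as a simple best-available choice. This is a conceptual model only: the integer quality scale (lower is better) and the names are hypothetical, and real equipment also applies configured priorities and hold-off timers that are not modeled here.

```python
from dataclasses import dataclass

@dataclass
class TimingReference:
    name: str
    quality: int            # hypothetical scale: lower number = better clock
    available: bool = True

def select_reference(refs):
    """Return the available reference with the best quality, or None if all are down."""
    usable = [r for r in refs if r.available]
    if not usable:
        return None          # no usable reference; the node would rely on its own oscillator
    return min(usable, key=lambda r: r.quality)

primary = TimingReference("peer-A", quality=1)
secondary = TimingReference("peer-B", quality=2)
refs = [primary, secondary]

first_choice = select_reference(refs).name
primary.available = False                 # primary peer stops providing a usable clock
after_failure = select_reference(refs).name
secondary.available = False
with_no_peers = select_reference(refs)

print(first_choice, after_failure, with_no_peers)
```

With two peers configured per node, as on this slide, the node always has a fallback reference when its preferred peer degrades or fails.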
Network Synchronization: Sync-E
Synchronous Ethernet (Sync-E) is considered a line-timed source for the Service Synchronization Unit (SSU). It operates at the physical layer and is immune to PDV or packet loss at higher layers.
The sender derives a reference frequency from the node’s SSU for use by the Ethernet transmission clock. The receiver then synchronizes its SSU to the frequency received on the Ethernet port.
When using Sync-E, all nodes and links between the source and destination nodes MUST support Sync-E and have it enabled.
Migration Of Services – TDM Based Service
(ONLY LEFT-TO-RIGHT DIRECTION SHOWN)
[Figure: TDM packets moving from the CONTROL CENTER to the REMOTE SITE. Labels: access circuit (N x 64 kb/s or T1/E1, data and signaling), IWF, GigE, IP/MPLS network, jitter buffer, IWF, access circuit (N x 64 kb/s or T1/E1), performance.]
PACKETIZATION
• As TDM traffic from the Access Circuit (AC) is received, it is packetized and transmitted into the IP/MPLS network
NETWORK
• Fixed delay
‒ Packet transfer delay based on link speeds and distances from end to end
• Variable delay
‒ The number and type of switches
‒ Queuing points in the switches
• Network QoS is key to ensure effective service delivery
PLAYOUT
• TDM PW packets are received from the IP/MPLS network and stored in the associated configurable jitter buffer
• Playout of the TDM data back into the AC starts when the buffer is at least 50% full
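The playout rule can be sketched as a small buffer model. This is an illustrative simplification: the capacity is a hypothetical value counted in packets, and the underrun behavior (pause and refill to 50% before resuming) is an assumption rather than documented equipment behavior.

```python
from collections import deque

class JitterBuffer:
    """Holds pseudowire packets and starts playout once at least 50% full."""

    def __init__(self, capacity_packets):
        self.capacity = capacity_packets
        self.packets = deque()
        self.playing = False

    def push(self, packet):
        if len(self.packets) >= self.capacity:
            return False                      # overrun: packet is dropped
        self.packets.append(packet)
        if not self.playing and len(self.packets) * 2 >= self.capacity:
            self.playing = True               # reached the 50% fill threshold
        return True

    def pop(self):
        if not self.playing:
            return None                       # still filling, nothing played out
        if not self.packets:
            self.playing = False              # underrun: refill before resuming (assumed)
            return None
        return self.packets.popleft()

buf = JitterBuffer(capacity_packets=4)        # hypothetical depth
buf.push("p1")
before_threshold = buf.pop()                  # buffer is only 25% full
buf.push("p2")
first_out = buf.pop()                         # 50% reached, playout starts
second_out = buf.pop()
on_underrun = buf.pop()                       # buffer empty

print(before_threshold, first_out, second_out, on_underrun)
```

Waiting for the buffer to half fill trades added latency for tolerance to packet delay variation, which is why jitter buffer depth has to be provisioned against the latency requirement of each TDM service.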
Pre Migration Network – Corporate Data
[Figure: Site A through Site X, each with access switches (Layer 2 or partial Layer 3) carrying Mgmt, IT Data, Voice, and Video traffic, connected through access and distribution layers]
Migration into MPLS
[Figure: access switches carrying Mgmt, IT Data, Voice, and Video traffic, a 7705 PE with VPRN services, and the IP/MPLS network]
Migration Of Services – Data
Network Management – Three Elements
• 5620 Service Aware Manager (SAM): provides enhanced service provisioning and assurance capabilities for operation, administration, maintenance, and provisioning functions
• 5650 Control Plane Assurance Manager (CPAM): a multi-vendor route and path analytics tool that provides real-time visualization, surveillance, and troubleshooting
• Service Portal Express: provides utility-specific network management capabilities
[Figure: Alcatel-Lucent 7701 CPAA with the 5620 SAM and 5650 CPAM]
Key Lessons Learned
Early involvement of all parties is important
• Initial planning efforts were focused on the transport group
• IT group joined when the full potential of the new architecture was realized
• Additional planning time (3 months) was required
Value of an effective Network Management System
• The early number of users was underestimated
• The GUI allowed more than “CLI only” users to have a view of the network
• Managers could view and understand network events
Multi Area OSPF Implementation
• Multi-area OSPF was implemented to accommodate planned network growth
• This also facilitated the integration of the Access switch sites into the larger network
Migration of customer sites to OSPF
• Original plan was to keep these sites on EIGRP, but conversion to OSPF proved to be less work than expected
• Migrating sites to OSPF simplified the network and allowed some nodes to be removed
Key Lessons Learned
Test Lab
• Lab testing prior to deployment was found to be very effective
• Testing validated that the move from EIGRP to OSPF was not as difficult as originally expected
• Staging of equipment for POC in the “vaults” was a major benefit
Quality of Service Capabilities
• Enhanced QoS capabilities provided more opportunities
• Previously only voice received high priority; now more options are available
• This should be planned for during the initial design meetings
Resident Engineer
• Provided by the vendor; this individual became a major help
• Involvement even earlier in the process would have been good
• The Resident Engineer’s expertise on the products and the vendor’s organization quickly made him a valuable resource
Reduce numbers of devices in the network
• Capabilities of the new aggregation equipment allowed the elimination of some devices at Access switch sites
• In addition to simplifying the network, this action also improved latencies in the network
SAFER, SMARTER, GREENER
www.dnvgl.com
Any further questions or comments?
Thank you for your time and attention
Mark Burke
(703) 678-2837
Jeffery Polan
(267) 616-6907
Chris Staszak
(978) 952-7829
General Lessons Learned (Beyond KCP&L and We Energies)
• Packet-based networks can support the requirements of utilities when properly engineered
• TDM to packet migration can be done successfully, with little or no service impact, when the new network is entirely separate from and overlaid on the old one, and service migration is done in stages
• Potentially, the number of devices (boxes) in a network can be reduced
• Bandwidth can be more effectively utilized in an MPLS-based network
• Reliability can be maintained through the use of MPLS
• Traffic engineering and enhanced QoS open up new potential
• Legacy interfaces can be supported without requiring wholesale changes at customer sites
• Movement to MPLS can reduce CAPEX & OPEX without compromising reliability