Fiber, Fabrics & Flash: The Future Of Storage Networking
Transcript of Fiber, Fabrics & Flash: The Future Of Storage Networking
© Copyright 2015 EMC Corporation. All rights reserved.
FIBER, FABRICS & FLASH THE FUTURE OF STORAGE NETWORKING
ROADMAP INFORMATION DISCLAIMER EMC makes no representation and undertakes no obligations with
regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).
Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
Roadmap information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.
AGENDA
• Network Convergence
• Protocols & Standards
• Solution Evolution
• RDMA-based Fabrics
• Flash: NVMe and Fabrics
• Conclusion
ETHERNET CONVERGED DATA CENTER
10 Gigabit Ethernet (and 40 Gigabit Ethernet)
– Single network simplifies mobility for virtualization/cloud deployments
– Also 25/50/100 Gigabit Ethernet
Simplifies infrastructure
– Reduces the number of cables and server adapters
– Lowers capital expenditures and administrative costs
– Reduces server power and cooling costs
FCoE and iSCSI both leverage this technology.
[Diagram: a single 10/40 GbE cable carries both LAN and SAN traffic]
UN-CONVERGED RACK SERVERS
• Servers connect to LAN, NAS and iSCSI SAN with NICs
• Servers connect to FC SAN with HBAs
• Some environments are still Gigabit Ethernet
• Multiple server adapters, higher power/cooling costs
[Diagram: rack-mount servers using Ethernet NICs to reach the Ethernet LAN and iSCSI SAN, and Fibre Channel HBAs to reach the Fibre Channel SAN and storage]
Note: NAS is part of the converged approach. Everywhere that Ethernet is used in this presentation, NAS can be part of the unified storage solution
AGENDA
• Network Convergence • Protocols & Standards • Solution Evolution • RDMA-based Fabrics • Flash: NVMe and Fabrics • Conclusion
ISCSI INTRODUCTION
Transport storage (SCSI) over standard Ethernet/IP networks
– Reliability through TCP
More flexible than FC due to IP routing
– Effectively reaches lower-tier servers than FC
Good performance; iSCSI has thrived
– Especially where server, storage, and network admins are the same person
– Example: IaaS clouds, e.g., OpenStack
[Stack diagram: SCSI / iSCSI / TCP / IP / Link]
ISCSI INTRODUCTION (CONTINUED) Standardized in 2004: IETF RFC 3720
– Stable: No major changes since 2004 – 2014: Consolidated spec (RFC 7143), minor updates (RFC 7144)
▪ Backwards-compatible with existing implementations
iSCSI Session: One Initiator and one Target – Multiple TCP connections allowed in a session
Important iSCSI additions to SCSI – Immediate and unsolicited data to avoid round trip – Login phase for connection setup – Explicit logout for clean teardown
ISCSI READ EXAMPLE
1. Initiator → Target: SCSI Read Command
2. Target → Initiator: one or more Data-In PDUs
3. Target → Initiator: Status
4. Initiator: Receive Data; Command Complete
Optimization: good status can be included with the last Data-In PDU
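The read exchange can be modeled as a short message sequence. The sketch below is hypothetical and simplified, not a real iSCSI implementation; the function and field names are illustrative only.

```python
# Hypothetical message-level model of the iSCSI read exchange; names
# and fields are illustrative, not from a real iSCSI stack.
def iscsi_read_pdus(data_in_count, piggyback_status=True):
    """Return the PDUs a target sends in response to one SCSI Read."""
    pdus = []
    for i in range(data_in_count):
        pdu = {"type": "Data-In", "seq": i}
        if piggyback_status and i == data_in_count - 1:
            # The optimization from the slide: good SCSI status rides
            # in the final Data-In PDU, saving a separate Status PDU.
            pdu["status"] = "GOOD"
        pdus.append(pdu)
    if not piggyback_status:
        pdus.append({"type": "Status", "status": "GOOD"})
    return pdus
```

With the optimization, a three-PDU read needs three target-to-initiator messages instead of four.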
ISCSI WRITE EXAMPLE
1. Initiator → Target: SCSI Write Command (optionally with immediate/unsolicited Data-Out PDUs)
2. Target → Initiator: Ready to Transmit (R2T)
3. Initiator → Target: Data-Out PDUs (Target: Receive Data)
4. Steps 2–3 repeat until all data is transferred
5. Target → Initiator: Status; Command Complete
Optimization: immediate and/or unsolicited data avoids a round trip
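The effect of the unsolicited-data optimization can be seen by counting Ready-to-Transmit (R2T) round trips. A hypothetical sketch; one Data-Out PDU per R2T is a simplification (real R2Ts cover whole data bursts), and all names are illustrative.

```python
# Hypothetical model of the write exchange; one Data-Out PDU per R2T
# is a simplification of real iSCSI burst handling.
def iscsi_write_sequence(total_pdus, unsolicited_pdus=0):
    """List of (direction, message) pairs for one SCSI Write."""
    seq = [("I->T", "SCSI Write Command")]
    sent = 0
    # Immediate/unsolicited data travels with (or right after) the
    # command, before any R2T arrives -- removing the round trip the
    # slide's optimization describes.
    while sent < min(unsolicited_pdus, total_pdus):
        seq.append(("I->T", "Data-Out (unsolicited)"))
        sent += 1
    while sent < total_pdus:
        seq.append(("T->I", "Ready to Transmit"))
        seq.append(("I->T", "Data-Out"))
        sent += 1
    seq.append(("T->I", "Status"))
    return seq
```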
ISCSI ENCAPSULATION
[Frame layout: Ethernet Header | IP | TCP | iSCSI | Data | CRC]
– iSCSI: delivery of iSCSI Protocol Data Units (PDUs) for SCSI functionality (initiator, target, data read/write, etc.)
– TCP: reliable data transport and delivery (windows, ACKs, ordering, etc.), plus demultiplexing (port numbers)
– IP: routing capability so packets can find their way through the network
– Ethernet: physical network capability (Cat 6a, MAC, etc.)
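The layering can be demonstrated by prepending each header in turn. This is a minimal sketch with placeholder header contents; only the common header sizes (14/20/20/48 bytes) and the well-known iSCSI port 3260 are real values, everything else is simplified.

```python
import struct

def encapsulate(scsi_data: bytes) -> bytes:
    """Wrap SCSI data in iSCSI/TCP/IP/Ethernet layers (sizes only --
    header fields beyond the ones packed here are left as zeros)."""
    bhs = struct.pack("!B47x", 0x01)          # 48-byte iSCSI Basic Header Segment (opcode byte only)
    tcp = struct.pack("!HH16x", 51066, 3260)  # 20-byte TCP header; dest port 3260 = iSCSI
    ip  = struct.pack("!B19x", 0x45)          # 20-byte IPv4 header (version/IHL byte only)
    eth = bytes(14)                           # 14-byte Ethernet header (MACs + EtherType)
    return eth + ip + tcp + bhs + scsi_data   # Ethernet FCS is appended by the NIC

frame = encapsulate(b"\x00" * 100)
```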
FCOE: ANOTHER OPTION FOR FC
FC: large and well managed installed base – Leverage FC expertise / investment – Other convergence options not incremental for existing FC
Data Center solution for I/O consolidation
Leverage Ethernet infrastructure and skill set
FCoE allows an Ethernet-based SAN to be introduced into an FC-based Data Center
without breaking existing administrative tools and workflows
FCOE EXTENDS FC ON A SINGLE NETWORK
[Diagram: server with a Converged Network Adapter (network driver + FC driver) → lossless Ethernet network → FCoE switch → FC network → FC storage]
– Server sees storage traffic as FC
– SAN sees host as FC
FCOE FRAMES
FC frames encapsulated in Layer 2 Ethernet frames
– No TCP; lossless Ethernet (DCB) required
– No IP routing
1:1 frame encapsulation
– FC frame never segmented across multiple Ethernet frames
Requires mini-jumbo (2.5 KB) or jumbo (9 KB) Ethernet frames
– Max FC data field (payload): 2112 bytes
– Max FCoE frame size: ~2180 bytes (a 2240-byte or larger MTU is commonly configured)
[Frame layout: Ethernet Header | FCoE Header | FC Frame (FC Header | FC Payload | CRC | EOF) | FCS]
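The mini-jumbo requirement falls out of simple arithmetic on the frame layout. The sketch below uses the commonly quoted FC-BB-5 component sizes; treat the exact byte counts as illustrative.

```python
# Commonly quoted sizes for a maximum-size FCoE frame (illustrative).
ETH_HEADER   = 14 + 4            # Ethernet header + 802.1Q VLAN tag
FCOE_HEADER  = 14                # version + reserved bits + encoded SOF
FC_FRAME_MAX = 24 + 2112 + 4     # FC header + max data field + FC CRC
EOF_PAD      = 4                 # encoded EOF + padding
ETH_FCS      = 4                 # Ethernet frame check sequence

fcoe_frame_max = ETH_HEADER + FCOE_HEADER + FC_FRAME_MAX + EOF_PAD + ETH_FCS
# 2180 bytes: larger than the standard 1500-byte Ethernet payload,
# hence the mini-jumbo (2.5 KB) MTU requirement.
```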
FCOE INITIALIZATION
• Native FC link: Optical fiber has 2 endpoints (simple) – Discovery: Who’s at the other end? – Liveness: Is the other end still there?
• FCoE virtual link: Ethernet LAN or VLAN, 3+ endpoints possible – Discovery: Choice of FCoE switches – Liveness: FCoE virtual link may span multiple Ethernet links
• Single link check isn’t enough - where’s the problem?
• FCoE configuration: Do mini jumbo (or larger) frames work? • FIP: FCoE Initialization Protocol
– Discover endpoints, create and initialize virtual link with FCoE switch – Mini jumbo frame support: Large frame is part of discovery – Periodic LKA (Link Keep Alive) messages after initialization
ETHERNET IS MORE THAN A CABLE
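The liveness problem above is typically solved with a timeout on the periodic keep-alives. A toy sketch: the 8-second advertisement period is the FC-BB-5 default, but the 2.5× timeout multiplier here is an assumption for illustration, not a quoted spec value.

```python
# Toy FIP Link Keep-Alive (LKA) liveness check. FKA_ADV_PERIOD_S
# matches the FC-BB-5 default; TIMEOUT_FACTOR is an illustrative
# assumption for how many missed keep-alives to tolerate.
FKA_ADV_PERIOD_S = 8.0
TIMEOUT_FACTOR = 2.5

def virtual_link_alive(last_ka_rx_s: float, now_s: float) -> bool:
    """True while keep-alives have arrived recently enough."""
    return (now_s - last_ka_rx_s) < FKA_ADV_PERIOD_S * TIMEOUT_FACTOR
```

An FCoE switch applying this rule would tear down the virtual link roughly 20 seconds after the last keep-alive.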
FCOE AND ETHERNET STANDARDS
Fibre Channel over Ethernet (FCoE)
– Developed by INCITS T11 Fibre Channel (FC) Interfaces Technical Committee
– Enables FC traffic over Ethernet – FC-BB-5 standard: June 2009 – FC-BB-6 standard: October 2014
Data Center Bridging (DCB) Ethernet
• Developed by IEEE Data Center Bridging (DCB) Task Group
• Drops frames as rarely as FC • Commonly referred to as
Lossless Ethernet • DCB: Required for FCoE • DCB: Enhancement for iSCSI
Two complementary standards
Participants: Brocade, Cisco, EMC, Emulex, HP, IBM, Intel, QLogic, others
FC-BB-6 – NEW FCOE FEATURES Direct connection of servers to storage
– PT2PT [point to point]: Single cable – VN2VN [VN_Port to VN_Port]: Single Ethernet LAN or VLAN
Better support for FC fabric scaling (switch count) – Distribute logical FC fabric switch functionality – Enables every DCB Ethernet switch to participate in FCoE
More from Connectrix: SAN & IP Storage Networking Technologies &
Best Practice Update Mon 4:30pm and Wed 8:30am
LOSSLESS ETHERNET (DCB) IEEE 802.1 Data Center Bridging (DCB)
Link level enhancements: 1. Enhanced Transmission Selection (ETS) 2. Priority Flow Control (PFC) 3. Data Center Bridging Exchange Protocol (DCBX)
DCB: network portion that must be lossless – Generally limited to data center distances per link – Can use long-distance optics, but uncommon in practice
DCB Ethernet provides the Lossless Infrastructure that enables FCoE. DCB also improves iSCSI.
ENHANCED TRANSMISSION SELECTION • Management framework for link bandwidth
• Priority configuration and bandwidth reservation – HPC & storage traffic:
higher priority, reserved bandwidth
• Low latency for high priority traffic
– Unused bandwidth available to other traffic
DCB PART 1: IEEE 802.1QAZ [ETS]
[Chart: offered traffic vs. link utilization on a 10 GbE link across t1–t3. HPC and storage classes hold their ~3 Gb/s reservations; when HPC offers only 2 Gb/s, LAN traffic expands (e.g., from 4 to 5 Gb/s) to absorb the unused bandwidth]
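The bandwidth-sharing behavior can be reproduced with a small weighted-allocation model. This is a sketch of the idea, not the 802.1Qaz scheduler itself; the class names, weights, and demands are illustrative.

```python
def ets_allocate(link_gbps, demand, weight):
    """Toy ETS model: each class gets a weighted share of the link, and
    bandwidth a class does not need is re-offered to classes that still
    have unmet demand (the redistribution seen in the chart)."""
    alloc = {c: 0.0 for c in demand}
    unmet = {c for c in demand if demand[c] > 0}
    remaining = float(link_gbps)
    while unmet and remaining > 1e-9:
        total_w = sum(weight[c] for c in unmet)
        spent = 0.0
        for c in sorted(unmet):
            share = remaining * weight[c] / total_w  # weighted share of leftover bandwidth
            take = min(share, demand[c] - alloc[c])  # never exceed offered traffic
            alloc[c] += take
            spent += take
        unmet = {c for c in unmet if demand[c] - alloc[c] > 1e-9}
        remaining -= spent
        if spent < 1e-9:
            break
    return alloc

# t3-like scenario: HPC offers only 2 Gb/s, so LAN traffic picks up
# the slack on a 10 Gb/s link.
shares = ets_allocate(10, {"hpc": 2, "storage": 3, "lan": 6},
                          {"hpc": 3, "storage": 3, "lan": 4})
```

HPC and storage receive their full offered load (2 and 3 Gb/s), and LAN grows past its 4 Gb/s weighted share to 5 Gb/s.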
[Diagram: PFC operating per priority on a link between Switch A and Switch B]
PAUSE AND PRIORITY FLOW CONTROL • PAUSE can produce lossless Ethernet behavior
– Original 802.3x PAUSE affects all traffic: rarely implemented
• New PAUSE: Priority Flow Control (PFC) – Pause per priority level – No effect on other traffic – Creates lossless virtual lanes
• Per priority flow control – Enable/disable per priority – Only for traffic that needs it
DCB PART 2: IEEE 802.1QBB & 802.3BD [PFC]
DCB CAPABILITY EXCHANGE
• Ethernet Link configuration (single link)
– Extends Link Layer Discovery Protocol (LLDP)
• Reliably enables lossless behavior (DCB) – e.g., exchange Ethernet priority values for FCoE and FIP
• FCoE virtual links should not be instantiated without DCBX
DCB PART 3: IEEE 802.1QAZ [DCBX]
[Diagram: DCBX runs on each link among the server, DCB Ethernet switches, FCoE/FC switches, and the FC SAN]
ETHERNET SPANNING TREES Reminder: FCoE is Ethernet only, no IP routing
– Ethernet (layer 2) is bridged, not routed Spanning Tree Protocol (STP): Prevents (deadly) loops
– Elects a Root Switch, disables redundant paths Causes problems in large layer 2 networks (for both FCoE and iSCSI)
– No network multipathing – Inefficient link utilization
[Diagram: multi-switch Ethernet topology with an elected Root Switch; redundant links are disabled]
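Root election in the loop-prevention step above reduces to a simple minimum: the lowest bridge ID (priority first, then MAC address as tie-break) wins. A minimal sketch with made-up bridge IDs:

```python
# Bridge IDs are (priority, MAC); 802.1D elects the numerically lowest.
# The values below are made up for illustration.
switches = [
    (32768, "00:1b:2c:aa:00:05"),
    (4096,  "00:1b:2c:aa:00:07"),   # admin lowered priority to force root
    (32768, "00:1b:2c:aa:00:01"),
]
root = min(switches)  # tuple comparison: priority first, MAC as tie-break
```

Lowering a switch's priority (as an administrator would) deterministically wins the election regardless of MAC ordering.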
ETHERNET MULTIPATHING: SPBM AND TRILL
SPBM = Shortest Path Bridging-MAC [IEEE 802.1aq]
TRILL = Transparent Interconnection of Lots of Links [IETF RFC 6325]
Layer 2 routing for Ethernet switches – Encapsulate Ethernet traffic, use IS-IS routing protocol – Block Spanning Tree Protocol
Transparent to NICs
All links active
AGENDA
• Network Convergence • Protocols & Standards • Solution Evolution • RDMA-based Fabrics • Flash: NVMe and Fabrics • Conclusion
FCOE AND ISCSI
FCoE:
– FC expertise / installed base
– FC management
– Layer 2 Ethernet
– Use FCIP for distance
iSCSI:
– No FC expertise needed
– Supports distance connectivity (L3 IP routing)
– Strong virtualization affinity
Shared Ethernet leverage:
– Ethernet/IP expertise
– 10 Gigabit Ethernet
– Lossless Ethernet
ISCSI DEPLOYMENT
• 10 Gb iSCSI solutions
– Traditional Ethernet (recover from dropped packets using TCP), or
– Lossless Ethernet (DCB) environment (TCP still used)
• iSCSI: natively routable (IP)
– Can use VLAN(s) to isolate traffic
• iSCSI: smaller-scale solutions
– Larger SANs: usually FC (e.g., for robustness, management)
[Diagram: Ethernet iSCSI SAN]
SOME ISCSI BEST PRACTICES
• Use a separate VLAN for iSCSI – Provides direct visibility and control of iSCSI traffic
• Avoid mixing Ethernet speeds (e.g., 1 and 10 Gb/sec) in network – Congestion can occur at switch where speed changes
• DCB (lossless) Ethernet helps iSCSI, but is not a panacea – E.g., Still shouldn’t mix different Ethernet speeds (bandwidths)
TOP TIPS FROM E-LAB
CONVERGENCE: SERVER PHASE
Converged Switch at top of rack or end of row
– Tightly controlled solution
– Server 10 GbE adapters: CNA or NIC
iSCSI and FCoE via Converged Switch
[Diagram: rack-mount servers with 10 GbE CNAs to a Converged Switch; the switch connects via Ethernet to the LAN and iSCSI storage, and via FC attach to the Fibre Channel SAN]
CONVERGENCE: NETWORK PHASE
Converged Switches move out of the rack
FCoE: multi-hop, may be end-to-end
Maintains existing SAN/network management; overlapping admin domains may compel cultural adjustments
[Diagram: rack-mount servers with 10 GbE CNAs into an Ethernet network (IP, FCoE) with Converged Switches, reaching the Ethernet LAN, Fibre Channel SAN, and storage]
CONVERGENCE: RESULTS
Two paths to a Converged Network
– iSCSI: purely Ethernet
– FCoE: mix FC and Ethernet (or all Ethernet)
 ▪ FC compatibility now and in the future
Choose (one or both) based on scalability, management, and skill set
[Diagram: rack-mount servers with 10 GbE CNAs to a Converged Switch; Ethernet LAN plus Fibre Channel & FCoE attach to the FC & FCoE SAN and iSCSI/FCoE storage]
AGENDA
• Network Convergence • Protocols & Standards • Solution Evolution • RDMA-based Fabrics • Flash: NVMe and Fabrics • Conclusion
DMA = DIRECT MEMORY ACCESS
Improvement on programmed I/O
Used by FC HBAs and FCoE CNAs
[Diagram: with programmed I/O, the CPU moves data between the I/O adapter and RAM; with DMA, the I/O adapter transfers data to/from RAM directly]
REMOTE DIRECT MEMORY ACCESS (RDMA)
Connect DMA over a network or other interconnect
– Requires RDMA I/O hardware
[Diagram: two servers (CPU/I/O/RAM each); RDMA moves data directly between the servers' RAM via the I/O adapters]
RDMA AND STORAGE
Storage systems often have CPU/RAM/etc. of their own
RDMA can optimize message and data transfer
[Diagram: server (CPU/I/O/RAM) connected to a storage system (CPU/I/O/RAM) via RDMA]
RDMA AND ISCSI
Storage systems often have CPU/RAM/etc.; RDMA can optimize message and data transfer
[Diagram: server and storage system connected via RDMA; iSCSI protocol stack shown: SCSI / iSCSI / TCP / IP / Link]
RDMA AND ISCSI: ISER
iSER: iSCSI Extensions for RDMA
iSER optimizes (and extends) iSCSI for RDMA
[Diagram: server and storage system connected via RDMA; protocol stack: SCSI / iSCSI / iSER / RDMA]
ISER (ISCSI EXTENSIONS FOR RDMA) Based on and extends iSCSI
– Standardized in 2007 (IETF RFC 5046) – New standard matching implementations: RFC 7145 (2014)
Supports multiple RDMA protocols – InfiniBand (iSER: preferred block storage protocol)
▪ Requires InfiniBand adapters and cables – RoCE: RDMA over Converged [DCB] Ethernet
▪ InfiniBand protocol stack, rehosted on DCB Ethernet ▪ RoCE Requires DCB Ethernet (like FCoE) ▪ RoCEv2 adds IP routing (aka Routable RoCE) via UDP encapsulation
– iWARP: RDMA over TCP/IP ▪ DCB Ethernet helps (like iSCSI), but is not required
AGENDA
• Network Convergence • Protocols & Standards • Solution Evolution • RDMA-based Fabrics • Flash: NVMe and Fabrics • Conclusion
HOT DATA MIGRATING TOWARD CPU
NEXT-GEN NON VOLATILE MEMORY (NVM)
NVM EXPRESS – ARCHITECTED FOR NVM
• Standardized high-performance software interface for PCIe SSDs
– Standard register, feature, and command sets: replaced proprietary PCIe solutions
– Architected for current and future low-latency non-volatile memory technologies
– Designed to scale from client to enterprise systems
• Developed by an open industry consortium
NVM EXPRESS TECHNICAL OVERVIEW
• Supports deep queues (64K commands per queue, up to 64K queues)
• Streamlined, simple command set (13 required commands)
• Optional features to address specific environments
– Data center: end-to-end data protection, reservations, etc.
– Client: autonomous power state transitions, etc.
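Those queue limits multiply out to an enormous command space. A quick check of the spec's upper bounds (real devices advertise far fewer queues than this):

```python
# NVMe spec upper bounds: queue 0 is the admin queue, leaving up to
# 65535 I/O queues, each holding up to 65536 entries.
MAX_IO_QUEUES   = 64 * 1024 - 1   # 65535 I/O queues
MAX_QUEUE_DEPTH = 64 * 1024       # 65536 commands per queue

max_outstanding = MAX_IO_QUEUES * MAX_QUEUE_DEPTH
# Contrast: legacy AHCI/SATA offers a single queue of 32 commands.
```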
NVM EXPRESS - LATENCY • Simplicity leads to speed: more than 200 µs faster (lower latency) than 12 Gb SAS
NVMe: lowest latency of any standard storage interface.
NVM EXPRESS IN FABRIC ENVIRONMENTS
• NVM Express (NVMe) Flash Appliance
• Hundreds or more SSDs – too many for PCI Express
Goal: Best SSD performance and latency over fabrics: Ethernet, InfiniBand™, Fibre Channel, etc.
• Concern: remote SSD attach over a fabric uses SCSI-based protocols today, requiring protocol translation(s)
INTRODUCING NVME OVER FABRICS
Extend efficiency of NVMe over front and back-end fabrics. Goal: Remote NVMe equivalent to local NVMe (max ~10 µs added latency)
ARCHITECTURAL APPROACH
• The NVM Express (NVMe) Workgroup has started the definition of NVMe over Fabrics
• A flexible transport abstraction layer will enable consistent definition of NVMe over different fabric types
• First fabric definition: RDMA family – for Ethernet (iWARP and RoCE) and InfiniBand™
• More fabric definitions planned – e.g., Fibre Channel, Intel® Omni-Scale
[Diagram: NVMe host software and NVMe controller joined through host-side and controller-side transport abstractions; transports include PCIe, RDMA fabrics, and future fabrics]
AGENDA
• Network Convergence • Protocols & Standards • Solution Evolution • RDMA-based Fabrics • NVMe, NVMe over Fabrics • Conclusion
CONCLUSION Converged data centers
– FCoE: Compatible with continued use of FC – iSCSI solutions work well for all IP/Ethernet networks
Ethernet solutions: FCoE, iSCSI – Standards enable integration into existing data centers – Ethernet roadmap: 25, 40, 50 and 100 Gbit/sec speeds
Native Fibre Channel roadmap: 32GFC and 128GFC speeds – 32GFC/128GFC comparable to 25/100 Gbit/sec Ethernet
Fabrics: Emerging Ethernet technologies (RoCE, iWARP) – iSER: iSCSI Extensions for RDMA – NVM Express (NVMe) over Fabrics, e.g., RDMA-based Fabrics
RELATED SESSIONS AND RESOURCES
SAN & IP Storage Networking Technologies & Best Practice Update
– Monday, 4:30pm and Wednesday, 8:30am
Birds of a Feather – FC SAN: Should I stay or should I go…
– Wednesday, 1:30pm
EMC Support Matrix – http://elabnavigator.emc.com
EMC FCoE Introduction whitepaper – http://www.emc.com/collateral/hardware/white-papers/h5916-intro-to-fcoe-wp.pdf
Storage Networking Blog by Erik Smith (Connectrix) – http://www.brasstacksblog.typepad.com
NVM Express Information – Visit http://www.nvmexpress.org
Q&A