Harry Yudenfriend, IBM Fellow (harryy@us.ibm.com)
March 3, 2015
SHARE Session 16896: IBM z13 and DS8870 I/O Innovations for z Systems
© 2015 IBM Corporation 2
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries:
BladeCenter*, Bluemix, CICS*, COGNOS*, DB2*, DFSMS, DFSMSdfp, DFSMSdss, DFSMShsm, DS8000*, HiperSockets, HyperSwap, IBM*, IBM (logo)*, InfiniBand*
* Registered trademarks of IBM Corporation
Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
This information provides only general descriptions of the types and portions of workloads that are eligible for execution on Specialty Engines (e.g., zIIPs, zAAPs, and IFLs) ("SEs"). IBM authorizes customers to use IBM SEs only to execute the processing of Eligible Workloads of specific Programs expressly authorized by IBM as specified in the "Authorized Use Table for IBM Machines" provided at www.ibm.com/systems/support/machine_warranties/machine_code/aut.html ("AUT"). No other workload processing is authorized for execution on an SE. IBM offers SEs at a lower price than General Processors/Central Processors because customers are authorized to use SEs only to process certain types and/or amounts of workloads as specified by IBM in the AUT.
The following are trademarks or registered trademarks of other companies:
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
OpenStack is a trademark of OpenStack LLC. The OpenStack trademark policy is available on the OpenStack website.
TEALEAF is a registered trademark of Tealeaf, an IBM Company.
Windows Server and the Windows logo are trademarks of the Microsoft group of countries.
Worklight is a trademark or registered trademark of Worklight, an IBM Company.
UNIX is a registered trademark of The Open Group in the United States and other countries.
* Other product and service names might be trademarks of IBM or other companies.
Easy Tier*, ECKD, FlashSystem, FICON*, GDPS*, IMS, MQSeries*, NetView*, OMEGAMON*, RACF*, System Storage*, Tivoli*, WebSphere*, z13, zEnterprise*, z/OS*, z Systems, z/VM*, z/VSE*
z13 I/O Subsystem Enhancements with the DS8870

GOALS
• Performance
– Substantial throughput and latency improvements for database log writes
– Measurable I/O latency reduction for transactions
• Batch Window Reduction
– Higher I/O throughput and lower latency with the same hardware footprint, cabling infrastructure and architectural addressing limits
• Scale
– More devices per channel, larger devices
– More logical channel subsystems and LPARs
– Higher I/O throughput and lower latency
• Resilience
– Reduce the impact to production work when I/O components fail
– Reduction in false repair actions
– Fast identification of the source of SAN errors
– Self-diagnosing
– Reduce the chance for human error
– Simplify migration to new machines for FCP users

Supporting Technologies
• Managed File Transfer Acceleration
• zHyperWrite for DB2
• New Tooling for I/O Resilience
• FICON Express16S
• Forward Error Correction Codes
• Read Diagnostic Parameters
• FICON Dynamic Routing
• Fabric I/O Priority
• zHPF Extended Distance II
• SPoF Elimination w/Storage
• 32K UA/Channel, 6 LCSS, 4 SS
Managed File Transfer
IBM Sterling Connect:Direct and z/OS Enhancements for Managed File Transfer
• Lowering the cost of moving data
– Send the data over a DASD bridge vs. the Ethernet network
– Connect:Direct with DS8000 zDDB and new z/OS services
– Client PoC complete
• IBM goal is to improve data transfer between z/OS images and distributed platforms
• IBM expects a 2x improvement in CPU cost and a significant reduction in elapsed time for Connect:Direct vs. TCP/IP
– Based on z/OS and Connect:Direct performance testing
• z/OS to z/OS transfers GA'd March 2014; z/OS to AIX transfers GA'd June 2014

Callouts: 2x CPU improvement; 30+% elapsed time reduction.
(Diagram: z/OS data set D81HMY1.HMY.FOOBAR exchanged with /hmy/foobar via FICON/SCSI through the DS8870 to AIX over FICON.)
zHyperWrite: DB2 Log Write Acceleration
DB2 Log Throughput is Important
• Improved DB2 log performance:
– Allows more work to get done with the same hardware footprint
– Better able to handle workload spikes, including work generated by mobile
– Enhances System z resilience
– Reduces costs
• Faster transactions are good for business
SAP/DB2 Transactional Latency on z/OS
• How do we make transactions run faster on z Systems and z/OS? A banking workload running on z/OS breaks down as:
– DB2 server time: 5%
– Lock/latch + page latch: 2-4%
– Sync I/O: 60-65% (this is the write to the DB2 log)
– Dispatcher latency: 20-25%
– TCP/IP: 4-6%
• Lowering the DB2 log write latency will accelerate transaction execution and reduce lock hold times
• Ways to get there:
1. Faster CPU
2. Software scaling, reducing contention, faster I/O
3. Faster I/O technologies such as zHPF, 16 Gb/s links, zHyperWrite, zHPF ED II, etc.
4. Run at lower utilizations; address dispatcher queueing delays
5. RoCE Express with SMC-R
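The leverage of faster log writes can be sanity-checked with Amdahl's law applied to the latency breakdown above. A minimal sketch, assuming the midpoints of the quoted ranges (the exact shares and the 40% log-write improvement are illustrative, not measured here):

```python
# Amdahl-style sketch: how much a cut in one latency component shrinks
# the whole transaction. Shares are midpoints of the slide's ranges.
shares = {
    "db2_server": 0.05,   # DB2 server time: 5%
    "lock_latch": 0.03,   # lock/latch + page latch: midpoint of 2-4%
    "sync_io":    0.625,  # sync I/O: midpoint of 60-65% (DB2 log write)
    "dispatcher": 0.225,  # dispatcher latency: midpoint of 20-25%
    "tcpip":      0.05,   # TCP/IP: midpoint of 4-6%
}

def overall_reduction(component_share, component_reduction):
    """Fractional reduction in total transaction time when one
    component's latency shrinks by component_reduction."""
    return component_share * component_reduction

# e.g., a 40% cut in sync I/O latency (a zHyperWrite-class improvement)
print(f"{overall_reduction(shares['sync_io'], 0.40):.1%}")  # -> 25.0%
```

Because sync I/O dominates the breakdown, even a modest percentage cut in log write latency moves total transaction latency far more than the same cut applied to any other component.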
DB2 Log Throughput is Important
• Mitigate the latency impact of synchronous replication technologies such as Metro Mirror (PPRC)
• Client value from a reduction in DB2 log write latency:
– Improved DB2 transactional latency
– Log throughput improvement
– Additional headroom for growth
– Improved resilience for workload spikes
– Cost savings from workload consolidation
Control Unit Based Synchronous Data Replication
(Diagram: z Systems hosts write the database and DB2 log volume DSN2 to the primary, which is synchronously replicated over FCP via PPRC to multi-target secondaries.)
• Up to 330 microseconds of added latency, plus 10 microseconds per km of distance
• Typically, one millisecond I/O service time for a log write operation
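The two callouts above combine into a simple overhead model. A sketch, assuming the slide's figures (330 us fixed cost, 10 us/km); the distances are illustrative:

```python
# Synchronous PPRC overhead model from the slide: a fixed cost of up to
# 330 us plus 10 us of round-trip delay per km of distance.
def pprc_added_latency_us(distance_km, fixed_us=330.0, per_km_us=10.0):
    return fixed_us + per_km_us * distance_km

base_log_write_us = 1000.0  # "typically one millisecond" log write
for km in (0, 10, 50, 100):
    total = base_log_write_us + pprc_added_latency_us(km)
    print(f"{km:>3} km: ~{total:.0f} us mirrored log write")
```

At metro distances the replication overhead approaches the unmirrored service time itself, which is the motivation for zHyperWrite on the next slide.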
IBM zHyperWrite - Hybrid Data Replication
(Diagram: z Systems hosts write the DB2 log DSN2 directly to the primary and to the multi-target secondaries in parallel, while PPRC continues to replicate the database volumes.)
• Up to three write operations executed in parallel
• Total elapsed time is the MAX response time of the up to three operations running in parallel
• PPRC is still used for other (non-log-write) I/O operations
Design characteristics:
1. Requires GDPS or TPC-R HyperSwap to initialize tables for PPRC relationships
2. zHyperWrite coexists transparently with HyperSwap
3. Supports Multi-Target PPRC
Optimal performance for the long distance leg is provided by zHPF Extended Distance II, available with z13.
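The MAX-of-parallel-writes point can be illustrated with a small sketch. The microsecond values are invented for illustration; only the max-vs-sum structure comes from the slide:

```python
# With classic PPRC the log write completes only after the secondary
# acknowledges, so the mirroring overhead adds to the service time.
def serial_pprc_elapsed_us(primary_us, mirror_overhead_us):
    return primary_us + mirror_overhead_us

# With zHyperWrite the up-to-three writes run in parallel, so the total
# elapsed time is the slowest single write, not a sum.
def zhyperwrite_elapsed_us(write_times_us):
    return max(write_times_us)

print(serial_pprc_elapsed_us(400, 330))          # -> 730
print(zhyperwrite_elapsed_us([400, 450, 430]))   # -> 450
```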
zHyperWrite Performance (IBM Testing)
(Chart: average commit time in milliseconds vs. number of log CIs created per commit, with log replication at zero distance, with and without zHyperWrite. Measurements by Jeffrey Berger.)
• 40% reduction in commit response time
zHyperWrite with DB2 Batch Updates (Client X)
(Chart: DB2 accounting time in seconds with HyperWrite off vs. on, broken down into CL2 CPU, update commit, lock latch, log write, not accounted, and others; the elapsed components drop about 28%.)
• An ESP client running 2.7 million updates (commit every 3 updates)
• 28% DB2 elapsed time improvement by reducing log write wait during update commit; 26% job elapsed time improvement
• Decrease in CPU time due to the reduction in commit time, which reduces lock hold time, which in turn reduces lock contention wait time
zHyperWrite Improves DB2 Batch Updates (Client Y)
(Chart: time per commit in ms with HyperWrite off vs. on, broken down into lock/latch (DB2+IRLM), database I/O, log write I/O, other read I/O, other write I/O, update commit, global contention, CL2 CPU, and not accounted.)
• A customer running multiple update jobs with 20 updates per commit
• Non-controlled environment
• 43% reduction in update commit time and 40% DB2 elapsed time reduction
zHyperWrite Results – Local Distance (0 KM)

Log Write Size   I/O Latency w/o      I/O Latency with     Latency       Projected
                 zHyperWrite (ms)     zHyperWrite (ms)     Improvement   Throughput*
8K FICON         .692                 .385                 44%           179%
8K zHPF          .573                 .325                 43%           175%
16K FICON        .726                 .423                 41%           169%
16K zHPF         .601                 .358                 40%           167%
32K FICON        .843                 .513                 39%           164%
32K zHPF         .746                 .454                 39%           164%
64K FICON        1.18                 .683                 47%           189%
64K zHPF         .873                 .576                 34%           152%
128K FICON       1.46                 1.14                 22%           128%
128K zHPF        1.14                 .843                 26%           135%
256K FICON       2.12                 1.79                 15%           118%
256K zHPF        1.66                 1.38                 16%           119%
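The Projected Throughput column appears consistent with the inverse latency ratio (single-stream throughput scales with 1/latency). A sketch checking the 8K rows; small differences from the table are rounding:

```python
# Single-stream throughput scales inversely with latency, so the
# projected throughput ratio is latency_without / latency_with.
def projected_throughput(lat_without_ms, lat_with_ms):
    return lat_without_ms / lat_with_ms

print(f"{projected_throughput(0.692, 0.385):.0%}")  # 8K FICON -> 180% (table: 179%)
print(f"{projected_throughput(0.573, 0.325):.0%}")  # 8K zHPF  -> 176% (table: 175%)
```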
zHyperWrite for z/OS, DB2 and DS8870
• New zHyperWrite function for DB2, z/OS and DS8870 with GDPS or TPC-R HyperSwap
• Now GA (December 2014)
• Designed to help accelerate DB2 log writes; benefits include:
– Improved DB2 transactional latency
– Log throughput improvement
– Additional headroom for growth
– Improved resilience for workload spikes
– Potential cost savings from workload consolidation
• Response time reduced up to 43%, throughput up to 180%
– Benefit percentage varies with distance
• Requires:
– zHyperWrite function in z/OS 2.1, with the PTF for APAR OA45662
– DB2 10 and DB2 11 SPE
– IBM DS8870 7.4
System z I/O Exerciser
IBM System z I/O Exerciser Tool
• Problem
– After major client upgrades (processor or storage), FICON connections are not found to be faulty until the production z/OS workload is run
• Solution
– Provide a way to verify the quality of the cable connections before running z/OS production work
• IBM System z I/O Exerciser
– https://www.ibm.com/services/forms/preLogin.do?source=swg-beta-ibmioexzos
– New tool made available March 4, 2014
– Runs in a stand-alone LPAR or z/VM guest machine
– Tests all the FICON devices available to that partition via the IOCDS
• Follow me on Twitter: @HMYudenfriend
Initial Screen (screenshot)
Example Output with Errors (screenshot)
FICON Express16S
FICON Performance History
(Charts: I/O driver benchmark results across processor generations z10, z196/z114, zEC12/zBC12 and z13.
I/Os per second, 4K block size, channel 100% utilized: ESCON 1,200; FICON Express2/Express4 14,000 (31,000 with zHPF); FICON Express8 20,000 (52,000 with zHPF); FICON Express8S 23,000 (92,000 with zHPF); FICON Express16S 23,000 (98,000 with zHPF), a 6.5% increase.
MegaBytes per second, full-duplex, large sequential read/write mix: FICON Express4 at 4 Gbps 350 (520 with zHPF); FICON Express8 at 8 Gbps 620 (770 with zHPF); FICON Express8S at 8 Gbps 620 (1,600 with zHPF); FICON Express16S at 16 Gbps 620 (2,600 with zHPF), a 63% increase.)
*This performance data was measured in a controlled environment running an I/O driver program under z/OS. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.
Leadership with FICON Express16S
• Faster link speed yields latency improvements for large block transfers
– z/OS DB2 writes to the log: latency reduction in large log writes
– Yields improvement in transaction latency, depending on workload
• z/OS Managed File Transfer
– DS8000 zDDB feature for exchanging data through the SAN
– New z/OS IOS FBA access method
– New Connect:Direct exploitation: 16 MB reads/writes
– Lower CPU cost with zDDB adds flexibility for when managed file transfer (MFT) is done
• Reduce the batch window
– ISVs (e.g., 1 MB/SSCH)
– DSS backup/restore
FICON Express16S Performance
z13 with FICON Express16S and the DS8870 16 Gb/s HBAs running a 256K read/write mix will have up to 32% lower I/O latency than the zEC12 with FICON Express8S and 8 Gb/s HBAs in the DS8870.

Multi-Stream 64x4K Read/Write (times in ms):
Config                  PEND    DISC    CONN
zEC12 FEx8S, 8Gb HBA    0.224   0.055   1.186
z13 FEx8S, 8Gb HBA      0.143   0.000   1.075
z13 FEx16S, 8Gb HBA     0.120   0.000   1.019
z13 FEx8S, 16Gb HBA     0.109   0.000   0.921
z13 FEx16S, 16Gb HBA    0.083   0.000   0.918
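The 32% headline can be reproduced from the PEND/DISC/CONN components. A sketch using the two end-point configurations from the chart:

```python
# Sum the response-time components (ms) and compare the end points.
configs = {
    "zEC12 FEx8S, 8Gb HBA": (0.224, 0.055, 1.186),  # PEND, DISC, CONN
    "z13 FEx16S, 16Gb HBA": (0.083, 0.000, 0.918),
}
old = sum(configs["zEC12 FEx8S, 8Gb HBA"])
new = sum(configs["z13 FEx16S, 16Gb HBA"])
print(f"{(old - new) / old:.0%} lower I/O latency")  # -> 32% lower I/O latency
```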
FICON Express16S Performance
z13 with FICON Express16S and the DS8870 16 Gb/s HBAs running single stream 4K reads will have up to 21% lower I/O latency than the zEC12 with FICON Express8S and 8 Gb/s HBAs in the DS8870.

Single Channel 4K, 1 Device: response time (msec)
Operation     z13 FEx16S 16G HBA    zEC12 FEx8S 8G HBA
zHPF Read     0.122                 0.155
zHPF Write    0.143                 0.180
FICON Read    0.185                 0.209
FICON Write   0.215                 0.214
FICON Express16S Performance
z13 with FICON Express16S and the DS8870 16 Gb/s HBAs running 32 streams of 4K reads will have up to 54% lower I/O latency than the zEC12 with FICON Express8S and 8 Gb/s HBAs in the DS8870.

Multi-Stream 4K Read (times in ms):
Config                  PEND    CONN
zEC12 FEx8S, 8Gb HBA    0.209   0.124
z13 FEx8S, 8Gb HBA      0.181   0.121
z13 FEx16S, 8Gb HBA     0.162   0.115
z13 FEx8S, 16Gb HBA     0.113   0.060
z13 FEx16S, 16Gb HBA    0.092   0.060
32x4K Write Latency, 1 Stream (less is better)
(Chart: PEND + CONN time for zHPF writes on zEC12 FEx8S with an 8Gb HBA as the baseline, and on z13 FEx8S with an 8Gb HBA, z13 FEx16S with an 8Gb HBA, and z13 FEx16S with a 16Gb HBA, with reductions of 14%, 15% and 23% relative to the baseline.)
DB2 log write latency for large commits can be reduced up to 23%. When combined with zHyperWrite, total log write latency can be reduced up to 66%, or two thirds.
FICON Express16S - IMS Latency (LSPR IMS workload)
(Chart: WADS and DASD response times for zEC12 FEx8S attached to an 8Gb HBA vs. z13 FEx16S attached to an 8Gb HBA, with WADS attached to a 16Gb HBA: WADS response time -9%, DASD response time -10%.)
FICON Express16S Performance
(Controlled measurement environment; results may vary)
• The more constrained the channels are, the more 16 Gb/sec links help.
• Both the channels and the HBA need to be 16 Gb/sec to get the most benefit.
http://www.redbooks.ibm.com/abstracts/redp5135.html?Open
(Chart: DB2 LOAD utility, 8 partitions with parallelism; I/O throughput in MB/sec (more is better) for 1, 2, 3, 4 and 8 channels, comparing zEC12 with DS8870, z13 FEx16S with an 8G HBA, and z13 FEx16S with a 16G HBA; gains labeled +72%, +48%, +25%, +46% and +52%.)
Strategy
• Improve the client experience when transitioning to faster FC link technologies by:
– Eliminating/reducing errors that will occur with new link technologies
– Differentiating between failures that occur because of faulty optics and failures that occur because of dirty or faulty cables or links
– Gathering history data that will allow validation of future Predictive Failure Analysis algorithms for future IT analytics
– Extending z/OS SAN health checks to surface FC switch components (e.g., inter-switch links, HBAs) that are degrading, and surfacing the information to the z/OS client
– Providing additional tooling to surface poor quality links without needing to run the z/OS production workload (see the I/O Exerciser tool)
Prevent I/O Errors with Forward Error Correction (FEC) Codes
• New standard for transmission of data on 16 Gb/s links
• The T11.org FC-FS-3 standard defines the use of 64b/66b encoding
– Efficiency improved to 97% vs. 80% with 8b/10b encoding
• FEC codes provide error correction on top of 64b/66b encoding
– Improves reliability by reducing bit errors (adds the equivalent of 2.5 dB of signal strength)
– Up to 11 bit errors per 2112 bits can be corrected
– IBM is leading the new standards required to enable FEC for optical links
(Diagram: FICON and FCP channels connected through switches (with CUP) and ISLs to DASD and tape.)
• Proprietary Brocade fabrics use FEC today to improve the reliability of ISL connections in the FC fabric for 16 Gb/s links
• z13 and DS8870 will extend the use of FEC to the fabric N_Ports for complete end-to-end coverage for new 16 Gb/s FC links
Forward Error Correction Codes - Value
FEC improves the bit error rate by the same amount *as if* the signal were 2.5 dB stronger, relative to the fixed amount of receiver noise.
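The encoding-efficiency figures quoted above are straightforward arithmetic. A sketch; the 11-errors-per-2112-bits figure is the slide's, the rest follows from the encodings:

```python
# 8b/10b sends 10 line bits per 8 data bits; 64b/66b sends 66 per 64.
def efficiency(data_bits, line_bits):
    return data_bits / line_bits

print(f"8b/10b : {efficiency(8, 10):.0%}")   # -> 80%
print(f"64b/66b: {efficiency(64, 66):.1%}")  # -> 97.0%

# FEC corrects up to 11 bit errors in each 2112-bit block:
print(f"correctable: {11 / 2112:.2%} of bits per block")  # -> 0.52%
```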
FEC Video Demo http://youtu.be/OGuzeSdnEp8
Read Diagnostic Parameters
• Automatically differentiate between errors caused by dirty links and errors caused by failing optical components
• Automatically identify the failing link
• New health checks for:
– End-to-end link speeds
– Link speeds across CHPIDs to a control unit
• z/OS commands to display optical signal strength and other metrics without having to manually insert light meters
• Keep a history of diagnostic parameters for future IT analytics
Improved Fault Isolation with Read Diagnostic Parameters
• After a link error is detected (e.g., IFCC, CC3, reset event, link incident report), use the link data returned by Read Diagnostic Parameters to differentiate between errors due to failures in the optics and errors due to dirty or faulty links
• New System z channel subsystem function
– Periodic polling from the channel to the end points for the logical paths established
– Improved fault isolation; key metrics displayable on the operator console
– Reduces the number of useless repair actions (RAs)
(Diagram: FICON channels issue Read Optical Signal Power ELS + Read LESB diagnostic reads to switch ports and control units; a unit check with SNS=HEALTH CHECK or an LIR (bit error rate, link failure) triggers diagnosis. FICON switches are responsible for detecting internal ISL failures, signal strength degradation and link error counts.)
• z13 GA2: new z/OS link health check to report on inconsistent link speeds, high error counters, and metrics that are out of spec
New Display Matrix Commands

D M=DEV(8010,(72)),LINKINFO=CURR
IEE583I 16.15.59 DISPLAY M 628
DEVICE 08010 STATUS=ONLINE
Link Information (Current):
                         Entry   Exit    Control
Description      Chan    Port    Port    Unit
Identifier       pchid   ddaa    ddaa    intid
Capable Speed    16G     16G     16G     8G
Operating Speed  16G     16G     8G      8G
Tx Bias (mA)     20      17      22      25
Tx Power (dBm)   -1.5    -1.7    -2.4    -1.9
Rx Power (dBm)   -1.2    -0.9    -2.0    -1.3
Temperature (C)  40      38      25      29
Voltage (V)      3.25    2.93    1.74    1.96
New Display Matrix Commands

D M=DEV(8010,(72)),LINKINFO=CURR,COMPARE
IEE583I 16.15.59 DISPLAY M 628
DEVICE 08010 STATUS=ONLINE
Link Information Comparison:
Channel Information: CHPID=nn, PCHID=nnnn
Description      IPL        Prev       Curr
Date             030.2015   042.2015   042.2015
Time             07:01:44   hh:mm:ss   13:29:10
Capable Speed    16G        16G        16G
Operating Speed  16G        16G        8G
Tx Bias (mA)     20         17         22
Tx Power (dBm)   -1.5       -1.7       -2.4
Rx Power (dBm)   -1.2       -0.9       -2.0
Temperature (C)  40         38         25
Voltage (V)      3.25       2.93       1.74
Switch Entry Port Information: Link=ddaa
...repeat above lines...
FICON Dynamic Routing
Static Routing (Port Based Routing, PBR)
(Diagram: ISLs assigned to channels at fabric login time; the same ISL is used regardless of destination.)
• Port Based Routing (PBR) assigns the ISL (route) statically, first come first served, at fabric login (FLOGI) time
• The ISL is assigned round-robin as ports log in
• The switch has no idea how the port that is logging in will use the ISL and whether it will cause bottlenecks
• This can result in some ISLs being overloaded while others are under-utilized
• Allow/Prohibit manual controls are too complicated to manage, and routing tables change after every POR
• The System z channel selection algorithm (zEC12) will move work away from congested ISLs
Static Routing (Device Based Routing, DBR)
(Diagram: ISLs assigned based on source and destination port.)
• Device Based Routing (DBR) assigns the ISL (route) statically based on a hash of the source and destination port; this is the Cisco default routing
• Spreads load across ISLs much better than PBR, but there is no guarantee that the same ISL won't be assigned
• The System z channel selection algorithm (zEC12) will move work away from congested ISLs
• In some configurations, ISLs may go unused (e.g., four PPRC ports in a fabric with 8 ISLs)
Dynamic Routing (Exchange Based Routing, OxID)
(Diagram: the ISL is assigned at I/O request time; I/Os for the same source and destination port use different ISLs.)
• Dynamic Routing (Brocade EBR or Cisco OxID) dynamically changes the routing between the channel and control unit based on the Fibre Channel exchange ID; each I/O operation has a unique exchange ID
• Client value:
– Reduces cost by allowing sharing of ISLs between FICON and FCP (PPRC or distributed)
– I/O traffic is better balanced between all available ISLs
– Improves utilization of switch and ISL hardware: ~37.5% bandwidth increase
– Easier to manage
– Easier to do capacity planning for ISL bandwidth requirements
– Predictable, repeatable I/O performance
– Positions FICON for future technology improvements, such as workload-based routing
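The three routing policies can be contrasted with a small sketch. The hash functions and port numbers are purely illustrative (real switches use vendor-specific hashes); the structural difference, whether the exchange ID participates in route selection, is the point of the slides:

```python
import zlib

N_ISLS = 8  # illustrative fabric with 8 inter-switch links

def pbr_route(login_order):
    # PBR: round-robin ISL assignment at fabric login, fixed thereafter,
    # regardless of destination.
    return login_order % N_ISLS

def dbr_route(src_port, dst_port):
    # DBR: static hash of the source and destination port only.
    return zlib.crc32(f"{src_port}:{dst_port}".encode()) % N_ISLS

def ebr_route(src_port, dst_port, exchange_id):
    # EBR/OxID: the FC exchange ID participates, so successive I/Os
    # between the same pair spread across the available ISLs.
    return zlib.crc32(f"{src_port}:{dst_port}:{exchange_id}".encode()) % N_ISLS

# Five successive I/O exchanges between the same source and destination:
print([dbr_route(0x61, 0xA3) for _ in range(5)])         # same ISL every time
print([ebr_route(0x61, 0xA3, xid) for xid in range(5)])  # spread over ISLs
```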
Fabric Priority
Work Load Manager and I/O Priorities Today
(Diagram: host UCBs in z/OS software feed the channel subsystem and FICON channels, through the switches, to the CU ports and the DS8000 I/O Priority Manager with PAV assignment.)
System z Fabric I/O Priority
(Diagram: I/O requests flow from z/OS UCBs through FICON channels and switches to the control unit.)
• WLM assigns the priority based on goals
• z/OS calculates a sysplex-wide priority range from all attached switches
• The channel passes the priority in the frames for the I/O (the fabric uses the priority for writes)
• The CU echoes the priority back on reads
• The CU also uses the priority for writes that require replication (e.g., PPRC FCP-based copy services activity)
System z Fabric I/O Priority
• Will be used by the operating system to alleviate or prevent congestion caused by slow-drain devices
• When a switch or a set of ISLs fails, fabric priority will add resilience by allowing WLM to manage I/O based on client-specified goals
Example Results
Three service classes (H, M, L), 2 devices in each service class, a mix of 128K and 4K write I/Os.
In terms of I/O activity rate we see:
• 18% spread from high to low
• 12.7% from medium to low
• 4% from high to medium
For I/O service times:
• 17% difference from high to low
• 5% from high to medium
• 12% from medium to low
zHPF Extended Distance II
zHPF Evolution
• 2009 (z10 processor, DS8300 with R4.1 or above): single-domain, single-track I/O; reads and update writes; Media Manager exploitation; z/OS R8 and above
• 2010: multi-track, but <= 64K
• 2011 (z196 processor, FICON Express8S, DS8700/DS8800 with R6.2): multi-track any size; Extended Distance I; >64K transfers; format writes; multi-domain I/O; QSAM/BSAM exploitation; Incorrect Length Facility; List Pre-fetch Optimizer; ISV exploitation; EXCP support (z/OS R11 EXCPVR)
• 100% of DB2 I/O is now converted to zHPF
Pre-zHPF Extended Distance II – Experimental Results
256K writes, typical of DB2 utilities, ISV products, etc.; the channel is 100 m from the primary, and the primary is 100 km from the secondary over PPRC.
I/O service time (ms):
• CCW, no PPRC (FICON channel direct to the device at 100 km): 1.83
• CCW with PPRC: 1.86
• zHPF with PPRC: 2.25
• zHPF, no PPRC (zHPF channel direct to the device at 100 km): 5.80
HyperSwap at Distance
Standard IDCAMS utility copying a DB2 table space; FICON Express8S; 100 km distance; job elapsed time measured.

C H A N N E L  P A T H  A C T I V I T Y
IODF = FE CR-DATE: 02/20/2013 CR-TIME: 19.41.50 ACT: ACTIVATE MODE: LPAR
                CHANNEL PATH UTILIZATION(%) ... WRITE(MB/SEC)   FICON OPERATIONS    ZHPF OPERATIONS
ID TYPE G SHR   PART  TOTAL ...                 PART   TOTAL    RATE  ACTIVE DEFER  RATE  ACTIVE
97 FC_S 13 Y    0.66  0.67  ...                 116.21 116.21   7.3   1.1    0.0    395.3 2.0     zHPF PPRC   (06:23)
97 FC_S 13 Y    9.18  9.21  ...                 40.67  40.67    502.2 1.0    0.0    0.0   0.0     CCW PPRC    (18:02)
97 FC_S 13 Y    9.64  9.68  ...                 41.66  41.66    514.2 1.0    0.0    0.0   0.0     CCW NOPPRC  (17:35)
97 FC_S 13 Y    0.29  0.29  ...                 48.36  48.36    4.8   1.0    0.0    164.5 2.0     zHPF NOPPRC (15:15)

D E V I C E  A C T I V I T Y
TOTAL SAMPLES = 60 IODF = FE CR-DATE: 02/20/2013 CR-TIME: 19.41.50 ACT: ACTIVATE
STORAGE  DEV   DEVICE NUMBER  VOLUME  PAV  LCU   ACTIVITY  AVG   AVG   AVG   AVG   AVG   AVG   AVG   AVG
GROUP    NUM   TYPE   OF CYL  SERIAL             RATE      RESP  IOSQ  CMR   DB    INT   PEND  DISC  CONN
                                                           TIME  TIME  DLY   DLY   DLY   TIME  TIME  TIME
SCALE04  9E00  3390A  30051   SK9E00  1.0H 00FB  397.118   2.25  .000  .000  .000  .017  .107  .002  2.15   zHPF PPRC
SCALE04  9E00  3390A  30051   SK9E00  1.0H 00FB  502.942   1.86  .000  .000  .000  .025  .118  1.31  .434   CCW PPRC
SCALE04  9F00  3390A  30051   SK9F00  1.0H 00FC  511.417   1.83  .000  1.02  .000  .020  1.11  .000  .722   CCW NOPPRC
SCALE04  9F00  3390A  30051   SK9F00  1.0H 00FC  165.469   5.80  .000  1.05  .000  .025  1.14  .000  4.66   zHPF NOPPRC

zHPF Extended Distance II will halve the I/O service time at distance for large write operations:
• Shrink the batch window
• Reduce lock hold time
• Meet SLAs
• Provide a premier DR environment
Pre-zHPF Extended Distance II
DB2 utilities (256K writes):
• Pre-HyperSwap: the channel writes to the primary with zHPF; PPRC (FCP) uses a pre-deposit write and streams the tracks to the secondary.
• Post-HyperSwap: the channel writes to the (former) secondary at long distance with zHPF, which requires at least four interlocked exchanges at long distance. At 10 km, ~400 usec is added to each I/O (a 50% penalty).
Note: DB2 utility writes in V11 are going to 512K, so the disparity will be worse.
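The ~400 usec figure follows from round-trip arithmetic. A sketch, taking the slide's working number of 10 us of round-trip delay per km (about 5 us/km each way in fiber); values are in microseconds:

```python
# Each interlocked exchange costs one fabric round trip.
ROUND_TRIP_US_PER_KM = 10.0

def interlock_penalty_us(distance_km, exchanges):
    return exchanges * ROUND_TRIP_US_PER_KM * distance_km

print(interlock_penalty_us(10, 4))  # four exchanges at 10 km -> 400.0
print(interlock_penalty_us(10, 1))  # a single round trip (ED II) -> 100.0
```

Collapsing four interlocked exchanges into one round trip, as zHPF Extended Distance II does on the next slide, removes most of this distance penalty.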
zHPF Extended Distance II
zHPF Extended Distance II will execute most write operations in one round trip.
• Pre-HyperSwap: the channel writes to the primary with zHPF (command, data, status); PPRC (FCP) uses a pre-deposit write and streams the tracks to the secondary.
• Post-HyperSwap: the channel writes to the (former) secondary at distance with zHPF; the command, data and status flow in a single round trip.
Pre-HyperSwap Effect of the zHPF ED II Protocol (Local Distance)
(Chart: I/O service time for the non-ED II vs. ED II protocol, for FEx16S zHPF and FICON 128x4K writes and VSAM REPRO copies, with reductions of about 6-7%.)
The zHPF Extended Distance II feature also has a measurable improvement in latency at zero distance: a 512K write at zero distance yields a 7.4% reduction in I/O service time.
Storage SPoF Elimination
Multi-target Mirroring
• Allows a single volume to be the source for more than one PPRC relationship
• Provides incremental resynchronization functionality between target devices
• Use cases include:
– Synchronous replication within a data center combined with another metro-distance synchronous relationship
– Adding another synchronous replication for migration without interrupting existing replication
– Multi-target Metro Global Mirror as well as cascading, for greater flexibility and simplified operational scenarios
– Combining with cascading relationships for 4-site topologies and migration scenarios
• TPC-R (2Q2015) and GDPS (1Q2015) support for Multi-target Metro Mirror
(Diagram: H1 replicating via Metro Mirror to both H2 and H3.)
Fourth Subchannel Set – TPC-R
• Complements multi-target PPRC by simplifying the configuration changes needed to define a 3rd copy of data in large configurations
• zHyperWrite is designed to also work with multi-target PPRC, day 1
• Maintains HyperSwap readiness after the primary or a secondary fails
• Device number assignment needs to be simplified; a logical volume keeps the same device number in each subchannel set (e.g., 0.0414, 1.0414, 2.0414, 3.0414):
– SS 0: primary devices
– SS 1: secondary devices
– SS 2: primary + secondary + tertiary aliases
– SS 3: tertiary copy
Scalability
System z I/O Configuration Scale
• z13 scale:
– 6 Logical Channel Subsystems
– 4 subchannel sets per LCSS
– 32K unit addresses per channel
– FICON/zHPF/FCP at 16 Gb/s with enterprise-class QoS
Miscellaneous New Function
© 2015 IBM Corporation 60
10GbE RoCE Express – Adapter Virtualization Overview
• On z13, the 10GbE RoCE Express feature becomes shareable among multiple Logical Partitions (LPARs) or z/VM guest virtual machines, following the same virtualization model as zEDC (the compression adapter)
• Up to 16 physical 10GbE RoCE Express adapters (PCHIDs) are supported per CPC (no change from zEC12)
• Up to 31 Function IDs (FIDs), each with a corresponding unique Virtual Function ID (VF), can be configured for each physical adapter (PCHID) in the IOCDS (using HCD or IOCP)
• Each LPAR or z/VM guest sharing the adapter consumes at least one of the available FIDs (and its corresponding Virtual Function ID); a Function cannot be shared, but can be reconfigured between LPARs
• Adapter virtualization is transparent to application software
• Both RoCE Express ports are enabled by z/OS
• z/OS support is available in z/OS V2R2 (base) and on z/OS V2R1 via APAR/PTF
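The sharing rules above (at most 31 FIDs per PCHID, no concurrent sharing of a FID, but reconfiguration between LPARs allowed) can be modeled in a short sketch. This is an illustrative model, not IBM firmware behavior; the class and method names are invented.

```python
# Illustrative model of RoCE Express FID allocation limits (hypothetical code).
MAX_PCHIDS_PER_CPC = 16
MAX_FIDS_PER_PCHID = 31

class RoceAdapter:
    def __init__(self, pchid):
        self.pchid = pchid
        self.fid_owner = {}  # FID -> owning LPAR or z/VM guest

    def assign(self, fid, owner):
        if len(self.fid_owner) >= MAX_FIDS_PER_PCHID:
            raise RuntimeError("all 31 FIDs on this PCHID are configured")
        if fid in self.fid_owner:
            # A Function cannot be shared by two partitions at once.
            raise RuntimeError("FID already in use; deconfigure it first")
        self.fid_owner[fid] = owner

    def reconfigure(self, fid, new_owner):
        # A FID can be moved between LPARs, just not used by two at once.
        self.fid_owner[fid] = new_owner

adapter = RoceAdapter(pchid=0x100)
adapter.assign(fid=1, owner="LP08")
adapter.reconfigure(fid=1, new_owner="LP12")
print(adapter.fid_owner)  # {1: 'LP12'}
```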
[Diagram: a CPC with two 10GbE RoCE Express adapters (PCHID 100 and PCHID 12C, each with ports 1 and 2) in separate I/O drawers, shared through PR/SM; LPARs LP04, LP06, LP08, and LP12, including z/VM guests, each use their own FID and interfaces on networks 'NET1' and 'NET2']
WWPN Preservation for FCP
HMC SAN Discovery Tools
• Tools allow the system administrator to query the SAN and remote ports
• They are accessed via the Channel Problem Determination panel on the SE
• To use the tools, the system must be IML'ed and the partition(s) of interest activated
• Operating system(s) need not be booted
• Tools can be used concurrently with normal traffic running on the PCHIDs
• Data from the tools can be exported as CSV files
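Since the tools export CSV, the data can be post-processed with ordinary scripting. The column names below are hypothetical placeholders; the slide does not show the actual export format, so this is only a sketch of the kind of filtering such an export enables.

```python
# Sketch of post-processing a (hypothetical) SAN discovery CSV export.
# Column names are invented; the real export layout is not shown on the slide.
import csv
import io

sample = """pchid,remote_port_wwpn,link_speed
0100,50:05:07:63:0a:10:41:42,16Gbps
012c,50:05:07:63:0a:10:41:43,8Gbps
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Flag ports not running at the full 16 Gbps link speed.
slow = [r["pchid"] for r in rows if r["link_speed"] != "16Gbps"]
print(slow)  # ['012c']
```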
[Screenshots: select the PCHID and the partition, choose the new option, then click OK to continue to the next panel]
z13 Summary
z13 provides the next generation of mainframe I/O: a new, resilient I/O infrastructure that addresses skills, complexity, cost, and availability.

16 Gbps FICON and 16 Gbps FCP: Faster links improve I/O latency. For DB2 log writes, 16 Gbps zHPF improves DB2 log write times by up to 32%, improving DB2 transactional latency. Clients can expect up to a 32% reduction in elapsed times for I/O-bound batch jobs.

Forward Error Correction Codes: Faster link-speed technologies are more sensitive to the quality of the cabling infrastructure; many System z clients have encountered production workload impacts after deploying 8 Gbps technology. IBM is leading a new industry standard to provide FEC for optical connections, able to correct up to 11 bit errors in a block of 2112 bits, the same benefit as if the optical signal strength were increased 2x, yielding substantially fewer I/O link errors. This technology allows System z I/O to operate at higher speeds, over longer distances, with reduced power and higher throughput, while retaining the reliability and robustness that FICON has traditionally been known for.
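A rough calculation shows why correcting up to 11 bit errors per 2112-bit block matters. The sketch below assumes independent bit errors (a simplification of real optical links) and uses a made-up raw bit-error rate; it is illustrative arithmetic, not a measured characterization.

```python
# Probability that a 2112-bit block is uncorrectable, i.e. contains more
# than 11 bit errors, assuming independent errors (binomial tail).
from math import comb

def block_failure_prob(ber, block_bits=2112, correctable=11, max_k=64):
    # Terms beyond max_k errors are negligible at these error rates.
    return sum(comb(block_bits, k) * ber**k * (1 - ber)**(block_bits - k)
               for k in range(correctable + 1, max_k))

raw = 1e-5  # hypothetical raw bit-error rate on a marginal link
with_fec = block_failure_prob(raw)
without_fec = 1 - (1 - raw) ** 2112  # P(block has at least one error)

print(with_fec)     # astronomically small: FEC absorbs the errors
print(without_fec)  # roughly 2% of blocks would contain an error
```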
zHPF Extended Distance II: Clients using multi-site configurations can expect up to a 50% I/O service time improvement when writing data remotely (remote site recovery). This capability is especially important for GDPS HyperSwap configurations where the secondary DASD subsystem is in another site.

FICON Dynamic Routing: A new System z host feature that allows clients to use Brocade Exchange Based Routing (EBR) or Cisco Open Exchange ID Routing (OxID) across cascaded FICON Directors. This simplifies configuration and capacity planning, provides persistent and repeatable performance, and is more resilient after hardware failures by allowing ISL links to be driven to higher utilizations before encountering queuing delays. Configuration planning is further simplified, and hardware costs reduced, by allowing FICON and FCP (PPRC) to share the same switch infrastructure without creating separate virtual switches and adding ISLs.

Fabric Priority: With SAN Fabric Priority, important work gets done first when SAN hardware failures cause traffic congestion. This is achieved by extending the z/OS WLM policy into the SAN fabric, leveraging capabilities of the SAN vendors. z/OS and System z will be the first platform to exploit this industry feature.

Scale: Six logical channel subsystems (LCSSs) allow for up to 85 client-usable LPARs. All FICON channels supported on z13 (FE8, FE8S, FE16S) support up to 32K devices per channel.

Resilience: A fourth subchannel set for each LCSS facilitates elimination of single points of failure for storage after a disk failure, by enabling exploitation of IBM's DS8870 Multi-target Metro Mirror storage replication with GDPS and TPC-R HyperSwap.

Read Diagnostic Parameters (GA2): Integrated instrumentation that allows clients to find potential trouble spots in the SAN without manually inserting light meters around the machine room. This helps reduce false repair actions (no defect found, NDF). z/OS will also automatically differentiate errors caused by faulty components from those caused by dirty optical connections.
For More Information…
• Enhancing Value to Existing and Future Workloads with IBM z13, REDP-5135: http://www.redbooks.ibm.com/abstracts/redp5135.html?Open
• Get More Out of Your IT Infrastructure With IBM z13 I/O Enhancements, REDP-5134: http://www.redbooks.ibm.com/abstracts/redp5134.html?Open
• DS8870 and z13 FEC Demo: https://www.youtube.com/watch?v=gOPdJ9ewjRg
• IBM DS8870 and z Systems: https://www.youtube.com/watch?v=OGuzeSdnEp8&feature=youtu.be
• Presentation by TJ Harris (Global Mirror Architect for DS8000)
Thank You!
Resilience Summary
(1) I/O Exerciser Tool
(2) Increased bandwidth and lower latency with 16 Gbps FICON for workload peaks
(3) Lower DB2 log latency and higher throughput with zHyperWrite for DB2 log write acceleration
(4) FEC for automatic correction of up to 11 bit errors per 2112-bit block
(5) Policy-based SAN alerts to warn about ISL degradations
(6) RDP FC command for rapid fault isolation
(7) Efficient use of ISL bandwidth, and higher utilizations after a failure with minimal service time degradation, with FICON Dynamic Routing
(8) FICON multi-hop to automatically re-route around ISL failures
(9) WLM client policy for fabric priority when contention occurs
(10) EC12 channel path selection algorithms dynamically adjust and send more of the work to better paths
(11) zHPF Extended Distance II for improved write execution at distance
(12) Added flexibility for Managed File Transfer any time of the day
(13) DASD SPoF elimination with Multi-target PPRC and the 4th subchannel set
[Diagram: z13 and System z hosts with ECKD and SCSI storage, illustrating the features above]
Single Stream Performance
4K Read Latency, 1 Stream (less is better)
[Chart: PEND and CONN time for zEC12 FEx8S zHPF Read 8Gb HBA (baseline), z13 FEx8S zHPF Read 8Gb HBA (-19%), z13 FEx16S zHPF Read 8Gb HBA (-10%), and z13 FEx16S zHPF Read 16Gb HBA (-7%)]
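The charts in this section break I/O response time into its z/OS components (PEND, CONN, and for some workloads DISC) and annotate each z13 configuration with its relative change versus the zEC12 baseline. The component values below are made-up placeholders, not the measured data behind the charts; the sketch only shows how such a percentage is computed.

```python
# How a headline latency delta is computed from response-time components.
# The component values here are invented, not the measured chart data.
def pct_change(baseline_components, new_components):
    base, new = sum(baseline_components), sum(new_components)
    return round(100.0 * (new - base) / base)

zec12 = {"PEND": 0.040, "CONN": 0.120}  # hypothetical baseline components
z13   = {"PEND": 0.030, "CONN": 0.099}  # hypothetical z13 components

print(pct_change(zec12.values(), z13.values()))  # -19
```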
4K Write Latency, 1 Stream (less is better)
[Chart: PEND and CONN time for zEC12 FEx8S zHPF Write 8Gb HBA (baseline), z13 FEx8S zHPF Write 8Gb HBA (-20%), z13 FEx16S zHPF Write 8Gb HBA (-3%), and z13 FEx16S zHPF Write 16Gb HBA (-1%)]
16x4K Read Latency, 1 Stream (less is better)
[Chart: PEND and CONN time for zEC12 FEx8S zHPF Read 8Gb HBA (baseline), z13 FEx8S zHPF Read 8Gb HBA (-19%), z13 FEx16S zHPF Read 8Gb HBA (-1%), and z13 FEx16S zHPF Read 16Gb HBA (-1%)]
16x4K Write Latency, 1 Stream (less is better)
[Chart: PEND, DISC, and CONN time for zEC12 FEx8S zHPF Write 8Gb HBA, z13 FEx8S zHPF Write 8Gb HBA, z13 FEx16S zHPF Write 8Gb HBA, and z13 FEx16S zHPF Write 16Gb HBA; chart annotations: -9%, 2%]
32x4K Read Latency, 1 Stream (less is better)
[Chart: PEND and CONN time for zEC12 FEx8S zHPF Read 8Gb HBA (baseline), z13 FEx8S zHPF Read 8Gb HBA (-16%), z13 FEx16S zHPF Read 8Gb HBA (1%), and z13 FEx16S zHPF Read 16Gb HBA (1%)]
32x4K Write Latency, 1 Stream (less is better)
[Chart: PEND and CONN time for zEC12 FEx8S zHPF Write 8Gb HBA (baseline), z13 FEx8S zHPF Write 8Gb HBA (-23%), z13 FEx16S zHPF Write 8Gb HBA (-14%), and z13 FEx16S zHPF Write 16Gb HBA (-15%)]
Multi-Stream Performance
4K Read Latency, 8 Streams (less is better)
[Chart: PEND and CONN time for zEC12 FEx8S zHPF Read 8Gb HBA (baseline), z13 FEx8S zHPF Read 8Gb HBA (-39%), z13 FEx16S zHPF Read 8Gb HBA (-21%), z13 FEx8S zHPF Read 16Gb HBA (-17%), and z13 FEx16S zHPF Read 16Gb HBA (-12%)]
4K Write Latency, 8 Streams (less is better)
[Chart: PEND and CONN time for zEC12 FEx8S zHPF Write 8Gb HBA (baseline), z13 FEx8S zHPF Write 8Gb HBA (-34%), z13 FEx16S zHPF Write 8Gb HBA (-31%), z13 FEx8S zHPF Write 16Gb HBA (-2%), and z13 FEx16S zHPF Write 16Gb HBA (-3%)]
128x4K Read Latency, 8 Streams (less is better)
[Chart: PEND and CONN time for zEC12 FEx8S zHPF Read 8Gb HBA (baseline), z13 FEx8S zHPF Read 8Gb HBA (-60%), z13 FEx16S zHPF Read 8Gb HBA (-58%), z13 FEx8S zHPF Read 16Gb HBA (-44%), and z13 FEx16S zHPF Read 16Gb HBA (-44%)]
128x4K Write Latency, 8 Streams (less is better)
[Chart: PEND, DISC, and CONN time for zEC12 FEx8S zHPF Write 8Gb HBA (baseline), z13 FEx8S zHPF Write 8Gb HBA (-58%), z13 FEx16S zHPF Write 8Gb HBA (-57%), z13 FEx8S zHPF Write 16Gb HBA (-5%), and z13 FEx16S zHPF Write 16Gb HBA (-3%)]
128x4K Read/Write Latency, 8 Streams (less is better)
[Chart: PEND, DISC, and CONN time for zEC12 FEx8S zHPF Read/Write 8Gb HBA (baseline), z13 FEx8S zHPF Read/Write 8Gb HBA (-29%), z13 FEx16S zHPF Read/Write 8Gb HBA (-27%), z13 FEx8S zHPF Read/Write 16Gb HBA (-12%), and z13 FEx16S zHPF Read/Write 16Gb HBA (-19%)]