Allyn Walsh Power Strategic Initiatives Team [email protected]/hosted_files/discover2016/11/510 -...
© 2016 IBM Corporation
IBM i Availability Update
Allyn Walsh
Power Strategic Initiatives Team
Agenda
Availability Enhancements
– Scheduled downtime
– Backups and Maintenance
– Unscheduled downtime
– Recovery
Single points of failure
High Availability Solutions
Availability Strategy
Balanced
Systems
Growth
Eliminate System Outages
Reduce Frequency & Duration of Outages
HA Clustering Solutions
Availability Enhancements
Scheduled downtime
Backups
Save Performance
Online Backups
Usability/Automation
Maintenance
PTFs
OS Upgrades
Hardware / firmware
Unscheduled downtime
Recovery time
Abnormal IPL
Transactions up to point of failure
Restoring lost or corrupted data
Single points of failure
Processor & Memory
Power & Service Proc
DASD
Other I/O
High availability solutions
Support for Key Environments
Usability
Currency, Standards, Compliance
Scheduled downtime: Backups
Save Performance
Performance increase with each generation of tape hardware technologies
– Couple with multiple concurrent or parallel backups for maximum throughput
7.2 Faster IFS save times using SAV command with new ASYNCBRING parameter
– Up to 60% faster save times in some cases (depends on directory structure,
number/size of objects, and other factors – Note: performance could degrade in
some situations)
– To enable ASYNCBRING in 6.1 or 7.1, see IBM Technote N1011242:
• http://www-01.ibm.com/support/docview.wss?uid=nas8N1011242
– See BRMS developerWorks wiki for BRMS support information:
• IBM Backup, Recovery and Media Services (BRMS) for i >Backup >Backup of
integrated file system objects >Specifying ASYNCBRING for IFS backups
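As a rough illustration of the 7.2 option above, the following Python sketch only assembles the CL text for an IFS save with ASYNCBRING enabled. The helper name build_sav_command, the tape device path, and the directory are assumptions made for this example; nothing is executed, and the resulting string would still have to be run on the IBM i system by whatever means you normally use.

```python
def build_sav_command(device_path: str, ifs_path: str, async_bring: bool = True) -> str:
    """Return CL text for an IFS save (hypothetical helper; nothing is executed)."""
    flag = "*YES" if async_bring else "*NO"
    return (
        f"SAV DEV('{device_path}') "
        f"OBJ(('{ifs_path}')) "
        f"ASYNCBRING({flag})"
    )


if __name__ == "__main__":
    # Device and directory below are placeholders, not values from the slides.
    print(build_sav_command("/QSYS.LIB/TAP01.DEVD", "/home/*"))
```

Because the slide warns that performance can degrade in some situations, keeping the flag as a parameter makes it easy to time the same save with and without it.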
http://www-03.ibm.com/systems/power/software/i/management/performance/resources.html
Updated April 2016
BRMS command - PRTRPTBRM *CTLGRPSTAT
• Excellent for:
• Monitoring / Analyzing ongoing Backup Performance
• Sizing New Tape / ProtecTIER Environments
• See BRMS developerWorks Wiki for BRMS support information
• IBM Backup, Recovery and Media Services (BRMS) for i >Backup >Save Items >Backup control
group status – PRTRPTBRM
• https://ibm.biz/BdF6rr
• Additional enhancements – as of 3/2016 BRMS PTFs
• http://ibmurl.hursley.ibm.com/NPMJ
• Multiple values on the From system (FROMSYS) parameter (centralized reporting)
• TSM (USEADSM) parameter
• Use save files (USESAVF) parameter
Scheduled downtime: Backups
Online “point in time” backups
6.1 Save-While-Active (SWA) synchronized across libraries and IFS objects
– Single checkpoint to ensure library and IFS data saved in consistent state
– STRSAVSYNC command in 6.1
Full System Copy Services Manager (FSCS) from IBM STG Lab Services
– Enables automated, full system backup without ending user jobs
– Supports SVC/V7000, DS8000 and XIV external storage
– See Full System Copy Services Manager section of IBM i Advanced Copy
Services developerWorks Wiki
• http://ibmurl.hursley.ibm.com/NP77
Save IASP mirror copy (FlashCopy or detached geographic mirror)
– For PowerHA environments
– FlashCopy with ICSM (IASP Copy Services Manager) from IBM STG Lab Services
enables automatically creating “point in time” copies of an IASP on DS8000 or
SVC/V7000/V3700.
– See ICSM - FlashCopy section of IBM i Advanced Copy Services developerWorks
Wiki
FlashCopy overview
FlashCopy is a point-in-time copy of external storage logical volumes that can be established very quickly and with minimal or no disruption or resource use on the production LPAR
IASP based or Full System solution
FlashCopy options – full copy, no copy
Space Efficient FlashCopy volumes can reduce FlashCopy storage by 70-80%
Use with 6.1 Quiesce to eliminate the IASP vary off or LPAR shutdown; use journaling for object and data integrity of the FlashCopy
Automate with IASP Copy Services Manager for PowerHA on i or with Full System Copy Services Manager (FSCS) from IBM STG Lab Services
Integration with BRMS
Ideal for off-line backup solutions
DS8000, SVC, Storwize, V840/V9000
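The 70-80% figure above can be turned into a quick sizing estimate. This is a minimal sketch that assumes the reduction is measured against a full-size copy of the source volumes; the function name and the 10,000 GB source size are made up for illustration.

```python
def space_efficient_capacity_gb(source_gb: float, reduction_pct: int) -> float:
    """Estimate FlashCopy target capacity after a percentage reduction.

    Assumption: the reduction is relative to a full copy of the source.
    """
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return source_gb * (100 - reduction_pct) / 100


if __name__ == "__main__":
    source = 10_000  # hypothetical production capacity in GB
    best = space_efficient_capacity_gb(source, 80)
    worst = space_efficient_capacity_gb(source, 70)
    print(f"Estimated target capacity: {best:.0f} to {worst:.0f} GB")
```

Actual consumption depends on how much production data changes while the copy exists, so treat the result as a planning range only.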
[Diagram: Prod LPAR (*SYSBAS, IASP) → FlashCopy → LPAR-2 (backup) (*SYSBAS, IASP) → Tape Backup]
PowerHA FlashCopy options
• IBM i can leverage FlashCopy to:
Create a copy of an IASP for backup
Create a full system copy for backup
• Recommendation is:
• Vary off the IASP or power down the system before taking the Flash
• Known as a “cold Flash”, this is the best way to guarantee complete data integrity
• IBM i 6.1 added support for the “Quiesce” of IASP
• Known as a “warm Flash”
• Suspends transactions & operations to ensure that as much in-flight data as possible is written to disk
• Places transactions at database boundaries if possible
• Best when used with applications running commitment control
• Requires a ‘recovery vary-on’ of the IASP.
• 7.1 and later supports Quiesce with VIOS storage pools
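The ordering of a “warm Flash” can be sketched as suspend, copy, then resume. The Python below is an outline under several assumptions: the slide does not name the CL command, and CHGASPACT with OPTION(*SUSPEND) and OPTION(*RESUME) is used here on the understanding that it is the IBM i quiesce command, so verify it for your release. The callables run_cl and take_flashcopy are hypothetical hooks standing in for however you run CL and trigger the storage-side copy.

```python
from typing import Callable, List


def warm_flash(iasp: str,
               run_cl: Callable[[str], None],
               take_flashcopy: Callable[[], None],
               suspend_timeout_s: int = 30) -> None:
    """Quiesce the IASP, take the copy, and resume even if the copy fails."""
    # Assumed quiesce command; not named on the slide.
    run_cl(f"CHGASPACT ASPDEV({iasp}) OPTION(*SUSPEND) SSPTIMO({suspend_timeout_s})")
    try:
        take_flashcopy()
    finally:
        # Resume in all cases so production work is not left suspended.
        run_cl(f"CHGASPACT ASPDEV({iasp}) OPTION(*RESUME)")


if __name__ == "__main__":
    log: List[str] = []
    warm_flash("IASP1", log.append, lambda: log.append("<storage FlashCopy taken>"))
    print("\n".join(log))
```

The try/finally is the design point: the quiesce window should be as short as possible and must end even when the storage step raises an error.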
www.ibm.com/systems/services/labservices
What is Full System FlashCopy?
• A set of tools provided by IBM PowerHA Lab Services for years
• Originally marketed as Full System FlashCopy Toolkit (FSFC Toolkit)
• Rebranded in 2014 as Full System Copy Services Manager (FSCSM)
• Rebranded in 2016 as a feature under Full System Manager
• Full System FlashCopy (FSFC)
• Leverages FlashCopy features on IBM storage subsystems
• Presents a full system “copy” of a production system (LIC and O/S)
to another set of hardware resources
• Intended to be a BACKUP SOLUTION
• Automates the end-to-end processes required to achieve a full
system backup of production system(s)
Who are the target customers for Full System FlashCopy?
• Customers unable (or unwilling) to implement PowerHA requirement for
independent auxiliary storage pools (IASPs) for any reason
• Time constraints
• Resource limitations
• Budgetary restraints
• Application-specific technical obstacles
• Customers with limited down-time windows on production systems
• Customers desiring easier recovery options for full system outages
• Customers needing alternative options to software-based strategies
• Stopping replication, allowing changes to catch up, performing backups primarily
on application data only, resuming replication
• Customers wanting a temporary production “test” environment to use until the
next backup is performed
(Slide callout: “Most common reason”)
Features and Benefits of Full System FlashCopy
• Provides full system backup of production partition with minimal (or no)
interruption to production workload
• Provides better RPO of full system data (more frequent backups)
• Simplifies recovery process with full system backup on one (or few)
tape(s)
• Supported with wide variety of IBM external storage
• DS8000 Family
• Spectrum/Storwize (SVC-based) - V3700, V5000, V7000, V9000, Flash
Systems
• XIV
• Can manage multiple environments from one “controlling” partition
• Automates end-to-end process via single command
• Easy, centralized management of many steps on many systems
• Extensive logging for time-tracking and error resolution
• Integrates well with BRMS environments (but not required)
Scheduled downtime: Backups
Backup Usability/Automation
7.1 and 7.2 BRMS enhancements
– Improved functions for managing backups, media, backup history, and recoveries
• See News section of BRMS developerWorks Wiki
http://ibmurl.hursley.ibm.com/2KIS
7.1 BRMS Enterprise Function
– Monitor backup operations for all your BRMS systems from a central site
– Provides common spot for recovery reports, dashboard for all systems, ability to get status on a control group run... all from a central hub.
– 7.2 enhancements (not PTFed back to IBM i 7.1)
• Auto refresh of Hub’s node status (dashboard) via new Q1ABRMENT subsystem
• PRTRPTBRM support
• Failed Control Group View to debug problems faster
– See BRMS Enterprise section of BRMS developerWorks Wiki for details
• https://ibm.biz/BdDYg5
– See BRMS Enterprise Enhancements Redpaper
• http://www.redbooks.ibm.com/redpieces/abstracts/redp4926.html
Enterprise
IBM Navigator for i enhancements
Backup enhancements (see next page)
Recovery enhancements
Media services enhancements
BRMS network enhancements
Install enhancements
Maintenance enhancements
Miscellaneous enhancements
http://ibmurl.hursley.ibm.com/NPML
Scheduled downtime: Backups
Backup Usability/Automation
7.2 Select (SELECT) parameter on SAVLIB/SAVOBJ commands to refine which objects to include or omit
7.2 Spooled files can now be saved with the Save Changed Objects (SAVCHGOBJ) command
7.2 TCP/IP configuration information automatically saved with QUSRSYS
7.1 Removed limit that prevented saving database files with more than 16 MB of descriptive information
– NOTE: Find all the system limit changes for any OS version in the IBM i Knowledge Center
NOTE:
Save Storage (SAVSTG) command no longer available on 7.2
Scheduled downtime: Backups
Backup Usability/Automation
Tape Virtualization (address tape errors and tape handling issues)
– IBM i virtual tape: outstanding large-file save performance
– IBM TS7600 ProtecTIER® Deduplication Family: great for remote replication
What does ProtecTIER do?
[Diagram: an IBM i system in Ohio performs local saves to virtual tape with de-dup on a ProtecTIER, which holds the virtual tapes on disk. IP replication sends the data to a second ProtecTIER and IBM i in New York; bandwidth is minimized since data is de-dup’d before sending. Optional duplication to physical tape (TS3500) at the local or remote site.]
[Inset: “What is DeDuplication?” A stream of repeated data blocks (A, B, C) is reduced so that each distinct block is stored once.]
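The “What is DeDuplication?” idea can be shown with a toy block store. This is a generic hash-based sketch, not ProtecTIER's actual algorithm; the block contents, the SHA-256 choice, and the function names are all assumptions for illustration.

```python
import hashlib
from typing import Dict, List, Tuple


def deduplicate(blocks: List[bytes]) -> Tuple[Dict[str, bytes], List[str]]:
    """Store each distinct block once and keep a recipe of digests."""
    store: Dict[str, bytes] = {}
    recipe: List[str] = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # only the first copy is kept
        recipe.append(digest)             # order is preserved for restore
    return store, recipe


def rebuild(store: Dict[str, bytes], recipe: List[str]) -> List[bytes]:
    """Reassemble the original stream from the store and the recipe."""
    return [store[digest] for digest in recipe]


if __name__ == "__main__":
    stream = [b"C", b"A", b"B", b"C", b"A", b"A", b"B", b"B", b"A"]
    store, recipe = deduplicate(stream)
    print(f"{len(stream)} blocks written, {len(store)} blocks stored")
    assert rebuild(store, recipe) == stream
```

The same property explains the replication benefit on the slide: only blocks the remote side does not already hold need to cross the network.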
Where does ProtecTIER fit? – Generally and on IBM i
– Small servers can’t optimize a tape drive (writing, waiting, waiting, waiting)
– Small backups don’t fill a tape
– Tapes are hard to manage
– Offsite shipments are costly and a bother
– Virtual tape can provide multiple virtual drives
– Virtual tape can make virtual volumes of any size
– Virtual tape keeps all the volumes inside the device
– Virtual tape can transmit them to a remote site
Very interesting for IBM i customers
Nice with VIOS for IBM i
Backups: Best practices / technologies
High Speed tapes – upgrade to latest generation for maximum performance
Multiple tapes – when a single fast tape drive is not enough
– Concurrent saves and/or Parallel saves
Real time backups of active data – when users won’t give it up
– Save-While-Active (SWA)
– Saving journal receivers – complex and slow recovery, but cheap
Off-line backups from a point-in-time copy – move it off to tape at your leisure
– Snapshots (for example, FlashCopy with DS8000, SVC, V7000, V3700)
– Logical replication read-only access of target copy
Automation – because humans cannot keep up with all of the options
– BRMS – Backup Recovery and Media Services – does it all, including FlashCopy
– IBM STG Lab Services tools (IASP or Full System Copy Services Manager)
Tape Virtualization – address tape errors and tape handling issues
– IBM i virtual tape
– IBM TS7600 ProtecTIER® Deduplication Family
Scheduled downtime: PTFs
7.2 More immediate apply PTF opportunities
• Conditional Immediate PTFs
• Allows an immediate apply PTF that supersedes a delayed PTF to be applied immediately if the superseded PTF has already been applied
• Prior to 7.2, immediate PTFs could not supersede delayed PTFs
– Once a delayed PTF was created, a snowball effect was triggered, since all future superseding PTFs had to be delayed even if the changes in the PTF could be applied immediately
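The conditional-immediate rule described above can be expressed as a small decision function. This is a simplified model for illustration only: the Ptf fields, the PTF identifiers, and the function are hypothetical and do not reflect how the operating system stores PTF metadata.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Ptf:
    ptf_id: str
    immediate: bool                    # its own changes need no IPL
    supersedes: Optional[str] = None   # ID of the PTF it replaces, if any
    supersedes_delayed: bool = False   # True if the replaced PTF is a delayed PTF


def can_apply_immediately(ptf: Ptf, applied: Set[str], is_7_2_or_later: bool) -> bool:
    """Model of the rule on the slide; not an IBM API."""
    if not ptf.immediate:
        return False
    if ptf.supersedes is None or not ptf.supersedes_delayed:
        return True
    if not is_7_2_or_later:
        # Before 7.2 a PTF superseding a delayed PTF had to be delayed too.
        return False
    # 7.2: immediate only if the superseded delayed PTF is already applied.
    return ptf.supersedes in applied


if __name__ == "__main__":
    fix = Ptf("SI00002", immediate=True, supersedes="SI00001", supersedes_delayed=True)
    print(can_apply_immediately(fix, {"SI00001"}, is_7_2_or_later=True))
    print(can_apply_immediately(fix, set(), is_7_2_or_later=True))
```

On a real system the DSPPTFAPYI command listed next on the slide is the way to find out whether selected PTFs can be applied immediately.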
7.1/7.2 Display PTF Apply Information (DSPPTFAPYI) command
(7.1 requires PTF SI52034 included in TR8)
• Shows whether selected PTFs can be applied immediately
• PTF save files and PTF groups must exist in or be copied into *SERVICE before running command
6.1/7.1/7.2 PTF apply time improvements
• Improvements to Long Running PTF apply exit programs
• Improvements to LIC PTF apply (benefits smaller partitions/systems)
• Automatic double IPL for PTFs requiring extra IPL for installation
6.1 Networked virtual optical for OS upgrades, PTF install, or LP install
Scheduled downtime: OS upgrades
Technology Refreshes (7.1 and beyond)
– Semi-annual technology refreshes provide new functions and I/O support
– Simpler to install on a current release and less disruptive (PTF apply vs. OS upgrade)
– Allows many years between disruptive major OS upgrades
Independent ASPs / PowerHA
– Upgrade target LPAR to new release without disruption to production
– Vary off IASP from old release, vary on IASP to new release (minimum disruption/outage)
FlashCopy
– Eliminate downtime for pre/post-upgrade backups
– Create cloned image to test upgrade process, calculate timings (repeatable over and over)
– Create cloned image for rapid back-out if upgrade process fails / takes too long
Central site distribution media (“DLO media”)
– Saves time/steps in the upgrade process
– See “Distributing software using central site distribution” topic in the Knowledge Center
http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rzai4/rzai4centsitedist.htm
Image catalog and network install using virtual optical storage & NFS
– Eliminates need to handle physical media during install process
– See “Preparing to upgrade or replace software with virtual optical storage using the Network File System” topic in the Knowledge Center
http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rzahc/rzahcpreparingtoupgradevirtoptnfs.htm
Install time improvements with faster POWER processors
Scheduled downtime: Hardware maintenance
Concurrent repair of fans/power supplies
Concurrent repair of PCIe adapters
Concurrent repair of disks configured for redundancy
7.1 Concurrent disk move/remove
POWER6/7 hot-add HSL-2 and 12X I/O loop adapters
POWER8 hot-pluggable optical modules for I/O drawer attachment
Concurrent system firmware updates (between releases)
7.1 TR4 Live Partition Mobility
– Migrate running workloads between systems to enable continuous availability during
planned server hardware maintenance / outages
– Combine with Power Enterprise Pools with Mobile and Elastic COD for maximum
flexibility and economic efficiency
7.1 Concurrent disk move/remove
Concurrent Remove of Disk Units
– Logically remove disk unit(s) without having to IPL or take an outage (physical remove requires IPL to clean-up hardware resources)
– Does not require restricted state
– Can be paused and restarted
POWER5 and up, with 7.1
See section 8 in: IBM i 7.1 Technical Overview, SG24-7858
http://www.redbooks.ibm.com/redpieces/abstracts/sg247858.html?Open
Screen 1: Work with Disk Configuration
Select one of the following:
  1. Display disk configuration
  2. Add units to ASPs
  3. Work with ASP threshold
  . . .
  10. Stop hot spare
  11. Work with encryption
  12. Work with removing units from configuration
Screen 2: Work with Removing Units From Configuration
Select one of the following:
  1. Display disk configuration
  2. Display status of remove operation
  3. Remove units from configuration
  4. Pause the remove operation
  5. Resume the remove operation
  6. Cancel the remove operation
  7. Cancel the remove operation and balance data in the ASP
Screen 3: status of remove operation
  Size     %Used    Status
  195754   33.56%   Unprotected
  55924    40.77%   Configured
  18643    40.78%   Configured
  23308    40.78%   Configured
  27965    40.77%   Configured
  32623    40.77%   Configured
  37287    2.88%    Removing
Screen 4: removed unit
  Serial Number   Type   Model   Resource Name   Capacity   Status
  YL4RUT3ERVR7    6B22   050     DD006           37287      Non-configured
Live partition mobility
Move a running partition from one Power7/8 server to another with no application downtime
– Reduce planned downtime by moving workloads to another server during system maintenance
– Rebalance processing power across servers when and where you need it
[Diagram: movement to a different server with no loss of service, over a virtualized SAN and network infrastructure]
Live Partition Mobility requires the purchase of the optional PowerVM Enterprise Edition
Partition mobility: Active and Inactive LPARs
Active Partition Mobility
Active Partition Migration is the actual movement of a running LPAR from one physical machine to another without disrupting the operation of the OS and applications running in that LPAR.
Applicability
– Workload consolidation (e.g. many to one)
– Workload balancing (e.g. move to larger system)
– Planned CEC outages for maintenance/upgrades
– Impending CEC outages (e.g. hardware warning received)
– Ability to move from Power7 servers to Power8 servers without an outage
Inactive Partition Mobility
Inactive Partition Migration transfers a partition that is logically ‘powered off’ (not running) from one system to another.
Suspended Partition Mobility
Suspended Partition Migration transfers a partition that is suspended from one system to another.
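A migration of this kind is typically validated first and then performed. The sketch below only builds the command text for the HMC command-line interface; the slides do not show any command, so the use of migrlpar with -o v (validate) and -o m (migrate) is an assumption drawn from the HMC CLI, and the server and partition names are placeholders.

```python
import shlex

_OPERATIONS = {"validate": "v", "migrate": "m"}


def migrlpar_command(operation: str, source: str, target: str, lpar: str) -> str:
    """Build HMC CLI text for a partition migration (nothing is executed)."""
    if operation not in _OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    args = ["migrlpar", "-o", _OPERATIONS[operation],
            "-m", source, "-t", target, "-p", lpar]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Placeholder managed-system and partition names.
    for step in ("validate", "migrate"):
        print(migrlpar_command(step, "SRV-A", "SRV-B", "PROD1"))
```

Running the validate step ahead of a maintenance window surfaces configuration problems before the actual move is attempted.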
Unscheduled Downtime: Recovery Time
IPL performance improvements
7.1/7.2 Access Path recovery and Journal synchronization
– Significant improvement in some environments (including IASP vary on), particularly where large Access Paths are rebuilt unexpectedly instead of recovered from the journal
• 7.1 Cumulative PTF package C4283710 and/or DB2 PTF Group SF99701 Level 32
• 7.2 Cumulative PTF package C4276720 and/or DB2 PTF Group SF99702 Level 3
7.2 Main Store Dump (MSD) improvements
– Storage Management Subset Directory Recovery (SRC C6004250) on systems with very large (>4GB) permanent directories
– Smart Dump for User-Initiated MSDs and XPF-related crashes
• Prior to 7.2, these were always Full Dumps requiring all main store to be dumped
• Smart dumps require only a subset (around 10%) of main storage to be copied to disk, which greatly reduces the time needed for this MSD step
– New message on Copy Status screen indicates when Smart (subset) dump vs. Full dump is being copied (factor into IPL recovery time/decision making)
“Best Practices for Managing Time Needed for Main Storage Dump (MSD)” Technote
– Latest recommendations, enhancements, and PTFs to help manage and reduce time required for a MSD
http://www-01.ibm.com/support/docview.wss?uid=nas8N1020270
Main Store Dump Smart-Dump Indicator
Prior to 7.1, when copying a Main Store Dump to disk, SRC C6xx4404 is displayed where xx is the percent completed for copying the dump
In 7.1 (and 6.1.1 with PTF MF58168), the SRC is later changed to indicate what type of dump is being copied (xx still displays the percent complete):
Full dump:
– SRC C6xx1404: Copying a compressed full dump
– SRC C6xx2404: Copying an uncompressed full dump
Subset dump ("smart dump"): much shorter/faster than a full dump!
– SRC C6xx3404: Copying an uncompressed subset dump
– SRC C6xx4404: Copying a compressed subset dump
In 7.2, a message is displayed on the MSD copy status screen while the copy is in progress indicating the type of dump being copied and the percent complete
– To see this information, press Enter on the MSD summary screen ("Main Storage Dump Occurred" screen), select "Work with current main storage dump (MSD)" on the MSD Manager screen, and then press F11=Copy status
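The 7.1 SRC scheme listed above lends itself to a small decoder, which could be handy when reading console logs. This sketch assumes the two xx characters are decimal digits giving the percent complete; the slide does not state the encoding, and the function and type names are made up for the example.

```python
import re
from typing import NamedTuple

# Mapping taken from the SRC list on the slide (7.1, or 6.1.1 with the PTF).
_DUMP_TYPES = {
    "1": "compressed full dump",
    "2": "uncompressed full dump",
    "3": "uncompressed subset dump",
    "4": "compressed subset dump",
}

_SRC_PATTERN = re.compile(r"^C6(\d{2})([1-4])404$")


class DumpProgress(NamedTuple):
    dump_type: str
    percent: int


def decode_msd_src(src: str) -> DumpProgress:
    """Decode a C6xxN404 copy-progress SRC (assumes xx is decimal percent)."""
    match = _SRC_PATTERN.match(src.strip().upper())
    if match is None:
        raise ValueError(f"not a main store dump copy SRC: {src!r}")
    return DumpProgress(_DUMP_TYPES[match.group(2)], int(match.group(1)))


if __name__ == "__main__":
    progress = decode_msd_src("C6423404")
    print(f"{progress.dump_type}, {progress.percent}% copied")
```

Note that before 7.1 the C6xx4404 form was shown for every dump, so the type is only meaningful on releases that use the newer codes.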
Unscheduled Downtime: Recovery Time
7.1 TR4 Remote Restart
– Recover and reboot partition on another server after an unplanned
server failure
– Similar prerequisites as for LPM (VIOS, external storage, etc.)
• Combine with Power Enterprise Pools for maximum flexibility/efficiency
– Requires either
• IBM Systems Director VMControl
– or –
• HMC V8R8.1.0 – (Simplified Remote Restart with HMC 8.820)
– Includes new command to initiate a Remote Restart operation
(manually or automate using CLI/APIs)
Unscheduled downtime: Recovery time
Restore Enhancements
7.2 Defer restore and journaling of dependent objects
– Also available as 7.1 PTF SI50939
http://www-912.ibm.com/systems/electronic/support/a_dir/as4ptf.nsf/ALLPTFS/SI50939
7.2 Option to not start journaling for restored objects
– New STRJRN parameter on restore command
7.1 New ALWOBJDIF(*COMPATIBLE) value on RSTLIB/RSTOBJ commands
– Essentially behaves like ALWOBJDIF(*ALL) in combination with
ALWOBJDIF(*FILE) for database files (i.e., a single value that does what most
people want to happen)
7.1 Fast Restore of Single Object
6.1 Restore logical and physical files in different libraries in any order
7.1 Fast restore of single object
Save operations track position (physical location on tape) of each object
– New field returned for each object in Save OUTFILE or OUTPUT
– You must retain this positioning information for use during restore
New restore POSITION parameter
– RSTLIB, RSTOBJ, and RST commands and APIs
– POSITION(*START) is default and gives existing behavior
– POSITION (hexadecimal value) allows you to pass the position for a single object to
be restored
– Requires that sequence number SEQNBR also be specified
Also supported for parallel restores
Very significant performance improvements can result
– For example: Restoring last object from 1.1 million IFS object save went from about
22 minutes to less than 3 minutes.
Backup, Recovery and Media Services (BRMS) uses this new support
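For sites not using BRMS, the retained positioning information has to be looked up and passed back on the restore. The sketch below assumes the save output has already been read into dictionaries; the key names, the sample values, and the helper names are hypothetical, and only the command text is built. POSITION and SEQNBR are the parameters named on the slide.

```python
from typing import Dict, Iterable


def find_saved_object(records: Iterable[Dict[str, str]],
                      library: str, name: str) -> Dict[str, str]:
    """Return the retained save record for one object."""
    for record in records:
        if record["library"] == library and record["object"] == name:
            return record
    raise KeyError(f"{library}/{name} not found in save output")


def build_rstobj(record: Dict[str, str], device: str) -> str:
    """Build RSTOBJ text that passes the saved position (nothing is executed)."""
    return (
        f"RSTOBJ OBJ({record['object']}) SAVLIB({record['library']}) "
        f"DEV({device}) SEQNBR({record['seqnbr']}) POSITION({record['position']})"
    )


if __name__ == "__main__":
    # Made-up rows standing in for data kept from a save OUTFILE.
    saved = [
        {"library": "APPLIB", "object": "ORDERS", "seqnbr": "12", "position": "0000A1B2"},
        {"library": "APPLIB", "object": "INVOICE", "seqnbr": "12", "position": "0000C3D4"},
    ]
    print(build_rstobj(find_saved_object(saved, "APPLIB", "INVOICE"), "TAP01"))
```

The exact width and format of the hexadecimal position value should be taken unchanged from the save output rather than typed by hand.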
Unscheduled downtime: Recovery time
Journal management enhancements (since 6.1)
Journal libraries and automatically journal new objects in the library
– 7.1 Object name filtering for automatic journaling (select/omit objects based on name)
User can end journaling and then start journaling a file without closing file
STRJRNxx/ENDJRNxx commands to start or end journaling all objects in a library
DSPRCYAP/EDTRCYAP screens show which access paths are eligible for SMAPP protection but are not currently being protected
User control over frequency of forcing changed objects to disk (journal recovery count)
– Choose faster runtime processing vs. faster IPL/vary on recovery after abnormal shutdown
See “Journal management > What's new” topic in the Knowledge Center
– http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzaki/rzakiwhatsnew.htm
– http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rzaki/rzakiwhatsnew.htm
– http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_61/rzaki/rzakiwhatsnew.htm
Remote journal enhancements
7.2 Remote journal over secure sockets (SSL) support
7.1 Automatic restart of remote journaling if ended due to a recoverable comm error
7.1 Ability from source side to view number of retransmissions occurring for a remote journal connection
– Measurements of how far behind source system is at sending data to target (6.1 included measurements of how far behind target system is at receiving data from source)
7.1 Filtering and not sending journal entries that are not absolutely needed to the target
7.1 Improved WAN performance via larger buffer size (greater of 256KB or TCP send buffer size on source or TCP receive buffer size on target)
– Also available via PTFs v5r4m0 MF46358, v5r4m5 MF46359, v6r1m0 MF46360
6.1 Use of data port services with up to 4 comm lines for greater resiliency
6.1 Validity checking to verify data received by target matches data sent from source
System Hardware RAS Strategy
First Failure Data Capture
Availability: Recover, Contain Errors, Spare, De-allocate
Reliability: Quality of Parts, Thermal Control, Quantity of Parts, Design and Testing
Serviceability: Hot Repair, LPAR Mobility, Light Path Diagnostics, Color Coding
Unscheduled downtime: Single points of failure
Processor & Memory
Instruction Retry (POWER6)
Alternate processor recovery (POWER6)
Active Memory Mirroring for Hypervisor (POWER7 795/780/770)
Dynamic Predictive DIMM deallocation and substitution with
spare/COD memory (POWER8)
See whitepaper "POWER7 System RAS: Key Aspects of Power Systems
Reliability, Availability, and Serviceability"
– http://www.ibm.com/systems/power/hardware/whitepapers/ras7.html
Find POWER8 Technical Overviews at IBM Power Systems Redbooks portal
– http://www.redbooks.ibm.com/portals/power
Unscheduled downtime: Single points of failure
Storage and I/O (since 6.1)
Multipath disk units
– 7.1 Display the level of protection for multipath disk units
Redundant VIOS partitions
Hot spare for RAID
Hot spare for Mirroring
Dual SAS RAID adapter support
– 7.1 Dual SAS RAID adapter with cache and no batteries (uses supercapacitor technology instead of batteries)
7.1 Ethernet Link Aggregation (EtherChannel)
IBM PowerHA SystemMirror for i
Hardware based replication solutions (disk level)
Supports both:
– IBM i replication – any storage
– External storage replication – DS8000, SVC, Storwize models including
V9000 Flash
Integrated – Can manage IBM i and external storage HA from one IBM i
GUI or command line
Reliable – Using IBM replication technologies
Efficient – Deeply integrated with lower levels of the OS
Automated – Minimal IT management required
Versatile – Solutions for any storage, any distance
PowerHA IASP replication technologies
[Diagram: PowerHA IASP replication topologies, each showing PROD, HA and/or DR nodes, the network, and external storage where applicable]
IBM i Replication
– Synchronous Geographic Mirroring
– Asynchronous Geographic Mirroring
External Storage Replication
– LUN level switching (1 site, shared storage)
– Metro Mirror
– Global Mirror
– LUN switch + Global Mirror
– Metro Global Mirror
Site categories shown: 1 site replication, 2 site replication, 2 site HA + DR, 3 site replication (DS8K only)
A PowerHA technology for every storage type
[Support matrix of PowerHA technologies against storage types]
Storage types: Internal SAS/SSD, DS8000, SVC/Storwize, XIV, DS5000, Other Storage
Technologies: Geographic Mirroring, Metro Mirror, Global Mirror, Metro Global Mirror, LUN switching, FlashCopy, HyperSwap
Enhancements since 7.1 GA
Complete list, including release date and PTF numbers, is available at www.ibm.com/developerworks/ibmi/techupdates/ha
Examples:
SVC Split Cluster with PowerHA LUN level switching
Metro Mirror, Global Mirror, FlashCopy and LUN level switching for SVC and Storwize storage servers
PowerHA GUI support
Global Mirror target FlashCopy
Reverse FlashCopy support for remote mirror copy and no-copy relationships
WRKCADMRE command
CFGGEOMIR command
CFGDEVASP command
PowerHA support for live partition mobility…
© 2016 IBM Corporation44
PowerHA 7.2 enhancements
New Express Edition with HyperSwap
‘Instantly’ switch between DS8000 servers
Auto-switch for DS8000 failure
Manual switch for planned maintenance
LPM and HyperSwap coordination
SYSBAS Replication Enhancements
Replicate object authority and ownership
70% increase for Admin Domain limit
Reduced Downtime
Better monitoring of vary-on time
Reduced UID/GID synch time during vary-on
Management Improvements
One partition can save multiple production environments
© 2016 IBM Corporation45
PowerHA Express Edition - Full system HyperSwap
First release (7.2) provides support for DS8000 HyperSwap in full system
replication environments
HyperSwap by itself is a hardware availability solution
– ‘Zero’ downtime switch for storage planned and unplanned outages
– Single partition solution, although can be combined with partition
mobility
– Not a disaster recovery solution
– No protection against software planned or unplanned outages
Once configured, HyperSwap switch will occur automatically in the case of
a DS8K failure, or can be triggered manually before a planned outage
[Diagram: Prod partition with SYSBAS replicated by Metro Mirror to a second SYSBAS; a second panel shows the same configuration combined with LPM]
HyperSwap IBM i 7.2 (& 7.3) GA - April 15, 2016 PowerHA Enterprise Edition
© 2016 IBM Corporation47
PowerHA Tools for IBM i
Complement and extend PowerHA and IBM Storage capabilities for HA/DR
Helps reduce business risk and improve resiliency for critical applications
Simplifies set up and automation of HA/DR and backup solutions
Reduces cost of maintaining and regular testing of an HA/DR environment
Facilitates flexible deployment options for single or multi-site protection
Assures consistent deployment using best practices and experienced consultants
PowerHA Tools for IBM i is a service offering from IBM Systems Lab Services
© 2016 IBM Corporation48
PowerHA Tools for IBM i
Table columns: Tool, Capability, Benefit, storage support (DS8000, Storwize, Internal)

Smart Assist for PowerHA on IBM i
– Capability: Provides operator commands and scripts to supplement PowerHA installation and ongoing operations for IASP enabled applications.
– Benefit: Simplifies deployment and ongoing management of high availability for critical IBM i applications.
– Storage support (DS8000, Storwize, Internal): Yes, Yes, Yes

IASP Copy Services Manager (Automated recovery with faster IASP-level vary on, no system IPL)

Flashcopy
– Capability: Automates Flashcopy of IASP for daily off-line backup with seamless BRMS integration.
– Benefit: Increases application availability by reducing or eliminating backup window for routine daily backups.
– Storage support: Yes, Yes

LUN-level Switching
– Capability: Simplifies deployment and automates switching of an IASP between IBM i cluster nodes in one data center.
– Benefit: Enables a business continuity manager to provide a simple, single site HA solution.
– Storage support: Yes*

Metro Mirror or Global Mirror
– Capability: Simplifies initial deployment and automates ongoing server and storage management of two-site Metro Mirror or Global Mirror HA or DR solutions. Requires IASP enabled applications.
– Benefit: Enables a business continuity manager to provide seamless operation of integrated server and storage operations for two-site high availability or disaster recovery.
– Storage support: Yes

Metro Global Mirror (MGM)
– Capability: Extends PowerHA functionality to provide three-site server/storage replication solution combining Metro Mirror for HA with Global Mirror for DR. Requires IASP enabled applications and IBM Tivoli Productivity Center – Replication (TPC-R).
– Benefit: Enables a business continuity manager to further lower business risk and maximize business resilience for highly critical business applications that require three-site HA/DR protection.
– Storage support: Yes

Full System Copy Services Manager (Automated recovery, requires full system IPL on target LPAR)
Storage columns in this part of the table include XIV

Flashcopy
– Capability: Automates full system Flashcopy for daily off-line backup with integrated support for BRMS without IASP-enabled applications.
– Benefit: Increases application availability by reducing or eliminating backup window for routine daily backups. Enables an entry solution while planning IASP enablement.
– Storage support: Yes, Yes, Yes

Metro Mirror or Global Mirror
– Capability: Simplifies initial deployment and automates ongoing server and storage management of two-site Metro Mirror or Global Mirror HA or DR solutions without IASP-enabled applications.
– Benefit: Enables a business continuity manager to provide seamless operation of integrated server and storage operations for HA or DR. Enables an entry solution while planning IASP enablement.
– Storage support: Yes

*DS8000 support available with PowerHA Tools for IBM i 6.1 or earlier, included in PowerHA SystemMirror 7.1
IBM Lab Services Offerings for PowerHA for i
Table columns: PowerHA Service Offering, Description

IBM i High Availability Architecture and Design Workshop
– An experienced IBM i consultant will conduct a planning and design workshop to review solutions and alternatives to meet HA/DR and backup/recovery requirements. The consultant will provide an architecture and implementation plan to meet these requirements.

PowerHA for IBM i Bandwidth Analysis
– An experienced IBM i consultant will review network bandwidth requirements for implementing storage data replication. IBM will review I/O data patterns and provide a bandwidth estimate to build into the business and project plan for clients deploying PowerHA for IBM i.

IBM i Independent Auxiliary Storage Pool (IASP) Workshop
– An experienced IBM i consultant will provide jumpstart services for migrating applications into an IASP. Training includes enabling applications for IASPs, clustering techniques, plus managing PowerHA and HA/DR solution options with IASPs.

PowerHA for IBM i Implementation Services
– An experienced IBM consultant will provide services to implement an HA/DR solution for IBM Power Systems servers with IBM Storage. Depending on specific business requirements, the end-to-end solution implementation may include a combination of PowerHA for IBM i and/or PowerHA Tools for IBM i, plus appropriate storage software such as Metro Mirror, Global Mirror and/or Flashcopy.

For more information on PowerHA Tools for IBM i offerings and services,
contact: Mark Even [email protected] 507-253-1313
www.ibm.com/systems/services/labservices [email protected]
Additional resources
IBM i 7.2 wiki
– https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/IBM%20i%20Technology%20Updates/page/IBM%20i%207.2%20-%20Base%20Enhancements
“What's new” topic in the Knowledge Center
– http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzahg/rzahgicoverview.htm
– http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rzahg/rzahgicoverview.htm
– http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_61/rzahg/rzahgicoverview.htm
IBM i 7.2 Availability
– https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzahg/rzahgavailability.htm
PowerHA resources
IBM PowerHA web site
– http://www-03.ibm.com/systems/power/software/availability
IBM PowerHA SystemMirror for i wiki
– http://www.ibm.com/developerworks/ibmi/ha/
IBM STG Lab Services
– http://www-03.ibm.com/systems/services/labservices
Redbooks at http://www.redbooks.ibm.com/portals/power
– Implementing PowerHA for IBM i - SG24-7405-00
– IBM i 6.1 Independent ASPs - SG24-7811-00
– PowerHA SystemMirror for IBM i Cookbook – SG24-7994-00
– IBM i and IBM Storwize Family: A Practical Guide to Usage Scenarios –
SG24-8197-00