SUSE2012 Template v3: 6/20/12 - Novell · *Source: Forrester Research, Inc., The State Of Business...
DOWNTIME IS INEVITABLE. BE PREPARED.
Introductory video
ChalkTalk: Toward zero downtime (~4min)
Forrester: more and more systems are considered critical
Critical workloads… are you prepared for downtime?
*Source: Forrester Research, Inc., The State Of Business Technology Resiliency, Q2 2014.
Business Critical Workloads
‒ SAP applications, databases, transactional workloads, and more
‒ Workloads that may impact a large number of users
High Density Virtualization
‒ Popular technology that improves server utilization
‒ Downtime of the guests and hosts impacts your IT services
Planned downtime
What is planned downtime, and what causes it?
‒ Scheduled software patches and updates that require a system reboot
‒ Scheduled hardware maintenance
‒ Data migration
What can IT do to mitigate it?
‒ Schedule a service window to minimize business impact (increasingly difficult in a globalized, mobile era)
‒ Optimize the process
Scheduled Downtime
Unplanned downtime
What is unplanned downtime, and what causes it?
‒ It comes as a surprise, with little or no warning
‒ Hardware failure, software bug, malicious attack or operational mistake
‒ An environmental failure such as a natural disaster
What can IT do to mitigate it?
‒ Reliable systems, HA/Geo clustering, proactive patch management
‒ Improve processes through best practices and training
Unplanned downtime: top causes
Source: Forrester Research, Inc.
How can SUSE help?
How to choose the right technologies?
Four steps to move toward zero downtime:
1 Prevent hardware downtime
‒ Reliability, Availability and Serviceability (RAS) features
2 Maximize service availability
‒ Clustering and Geo Clustering technology
‒ Live kernel patching
3 Avoid human mistakes
‒ Automated patching and security compliance
4 Quickly recover a working system state
‒ System snapshot and rollback
[Overview: RAS | High Availability | Live Kernel Patching | System Rollback]
Reliability, Availability, Serviceability
Interaction of hardware and operating system → a traditional UNIX capability
SUSE leads in RAS capabilities on Linux, from x86-64 to IBM System z:
‒ CPU error handling (Machine Check Exceptions, MCE) on Intel x86-64
‒ Memory hot-add on recent Intel Xeon architectures
‒ Memory error handling on Intel x86-64
‒ Integrated open source RAID and multipath tools
‒ “Best guest” integration strategy: the benefits also apply to VMs
‒ Btrfs filesystem: online grow/shrink, rebalance, etc.
[Overview: RAS | High Availability | Live Kernel Patching | System Rollback]
SUSE High Availability
Fighting Murphy's Law
• Service failover at any distance, from local to geo
• Up to 99.9999% availability
• Rolling updates for less planned downtime
• Easy setup, administration and management
• Virtualization-agnostic
• Leading open source High Availability
20+ years of experience and leadership
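To put the “up to 99.9999%” figure in perspective: for an availability target A over a period T, the downtime budget is (1 - A) × T. The sketch below, assuming a 365-day year, turns a percentage into allowed downtime per year; six nines leave roughly 31.5 seconds per year, versus about 8.8 hours at 99.9%. It is plain arithmetic, not a measurement of any SUSE product.

```python
# Downtime budget implied by an availability target.
# Assumption: a 365-day year; leap years and maintenance exclusions are ignored.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def max_downtime_seconds(availability_percent: float) -> float:
    """Return the yearly downtime allowed by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999, 99.9999):
        seconds = max_downtime_seconds(target)
        print(f"{target}% availability -> {seconds:,.1f} s/year ({seconds / 3600:.2f} h)")
```

Each additional nine cuts the budget by a factor of ten, which is why the last nines are the hardest to reach.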
SUSE High Availability Features
• Service Availability 24/7
• Data Replication
• Node Recovery
• Cluster File System
• Unlimited Geo Clustering
• Virtualization-Ready
• Network Load-Balancer
• Free Resource Agents
• Clustered Samba
• Broad Platform Support
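SUSE High Availability
Cluster Example
[Diagram: cluster nodes (Kernel) running Corosync, Pacemaker and DLM, with cLVM2 + OCFS2 on shared storage; resources shown: Xen VM1, Xen VM2 and a LAMP service (Apache, IP, ext3); clients connect over the network links.]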
Video: High Availability Demo
Simple HA web service demo (~3min)
Cluster Test Drive
Simulate and validate the cluster setup before an actual failover
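The slide does not show how Cluster Test Drive works internally. As a purely hypothetical illustration of the idea (rehearse a node failure on a model of the cluster and check the outcome before anything fails for real), the sketch below keeps nodes and resources in memory and moves a failed node's resources to the least-loaded surviving node. The class name and the placement rule are assumptions made for this example; Pacemaker's real placement is driven by scores and constraints.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical in-memory cluster model: node name -> resources it runs."""

    hosted: dict[str, set[str]] = field(default_factory=dict)
    online: set[str] = field(default_factory=set)

    def add_node(self, node: str) -> None:
        self.hosted[node] = set()
        self.online.add(node)

    def start(self, resource: str, node: str) -> None:
        self.hosted[node].add(resource)

    def fail_node(self, node: str) -> None:
        """Simulate a node failure and recover its resources on other nodes."""
        self.online.discard(node)
        orphans, self.hosted[node] = self.hosted[node], set()
        for resource in sorted(orphans):
            # Assumed rule: pick the least-loaded survivor, ties broken by name.
            survivors = sorted(self.online, key=lambda n: (len(self.hosted[n]), n))
            if not survivors:
                raise RuntimeError(f"no surviving node can run {resource}")
            self.hosted[survivors[0]].add(resource)

    def placement(self) -> dict[str, str]:
        """Map each placed resource to the node that runs it."""
        return {res: node for node, resources in self.hosted.items() for res in resources}

    def validate(self, expected: set[str]) -> bool:
        """True if the placed resources match `expected` and all run on online nodes."""
        placed = self.placement()
        return set(placed) == expected and all(n in self.online for n in placed.values())
```

For example, with apache on node1, vm1 on node2 and vm2 on node3, calling `fail_node("node1")` moves apache to node2 (both survivors carry one resource, so the name breaks the tie), and `validate({"apache", "vm1", "vm2"})` returns True.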
SUSE High Availability
Unlimited Geo Clustering
• Cluster failover between different sites
‒ Provides disaster resilience in case of a site failure
‒ Each site is a self-contained, autonomous cluster
‒ Supports manual and automatic switchover/failover
• Extends Metro Cluster capabilities
‒ No distance limit between data centers
‒ No unified storage or network needed
• Storage replicated as active/passive
‒ Leverages the data replication included with SUSE (DRBD)
‒ Integrates third-party solutions via scripts
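The site-level, active/passive behavior above can be illustrated with a small model: each site is autonomous, at most one site runs the service at a time, an operator can switch over deliberately, and a site failure either promotes a surviving site automatically or leaves the decision to an operator. This is a hypothetical sketch; the class and site names are invented, and it deliberately ignores how the real product arbitrates between sites and protects against split-brain.

```python
from typing import Optional


class GeoService:
    """Hypothetical model of one service shared by autonomous sites (active/passive)."""

    def __init__(self, sites: list[str], active: str) -> None:
        self.up = {site: True for site in sites}
        self.active: Optional[str] = active

    def switchover(self, target: str) -> None:
        """Planned, manual move of the service, e.g. before site maintenance."""
        if not self.up.get(target, False):
            raise ValueError(f"site {target!r} is not available")
        self.active = target

    def site_failed(self, site: str, automatic: bool = True) -> Optional[str]:
        """Record a site failure and return the site now running the service (or None)."""
        self.up[site] = False
        if site != self.active:
            return self.active  # a passive site failed; the service is unaffected
        survivors = [s for s, ok in self.up.items() if ok]
        # Automatic failover promotes the first surviving site; manual mode waits for an operator.
        self.active = survivors[0] if automatic and survivors else None
        return self.active
```

With two sites and automatic failover, losing the active site hands the service to the other one; in manual mode `site_failed` returns None until an operator calls `switchover`.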
SUSE High Availability
Node Recovery Framework
[Diagram: full backup to an existing backup facility; create rescue media (PXE, USB, CD/DVD); boot the rescue system; restore.]
Automated Recovery:
● Partitioning
● SW RAID / LVM
● Formatting
● Restore Data
● Install Boot loader
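The automated recovery steps only make sense in the listed order, because each depends on the one before it (a filesystem cannot be formatted before its partition, RAID or LVM volume exists). A minimal sketch, assuming a hypothetical `execute` callback that performs one step and reports success, runs them in order and stops at the first failure; the step names are invented labels, not the framework's identifiers.

```python
from typing import Callable

# Hypothetical step labels mirroring the slide's automated recovery list, in order.
RECOVERY_STEPS: list[str] = [
    "partitioning",
    "sw_raid_lvm",
    "formatting",
    "restore_data",
    "install_boot_loader",
]


def run_recovery(execute: Callable[[str], bool]) -> list[str]:
    """Run each recovery step in order; stop at the first step that fails.

    `execute` is an assumed callback that performs one step and returns True on
    success. The return value lists the steps that completed.
    """
    completed: list[str] = []
    for step in RECOVERY_STEPS:
        if not execute(step):
            break  # later steps depend on this one, so do not continue
        completed.append(step)
    return completed
```

A caller can compare the returned list with `RECOVERY_STEPS` to see exactly where a restore stopped.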
[Overview: RAS | High Availability | Live Kernel Patching | System Rollback]
It is people
System Rollback
Goal: return to a known-good system state
Reduce operational downtime and the impact of human mistakes
‒ Patch installation, system upgrades
‒ System admin tasks
‒ Higher data integrity and availability
New on SUSE Linux Enterprise 12:
• Extended system integration (zypper, YaST)
• Support for Service Pack rollback
• Support for kernel upgrades → full system rollback
System Rollback
Components
Grub2: boot loader integration for full system rollback
Snapper: GUI and CLI tool for easy snapshot/rollback
Btrfs: default filesystem with fault tolerance, repair, and easy management features
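Snapper can bracket a change with a “pre” and a “post” snapshot so that the system can later be returned to its state before the change. The sketch below imitates that pattern with an in-memory dictionary standing in for the system. It is an analogy only: Btrfs snapshots are copy-on-write rather than full copies, and the class and method names are hypothetical, not Snapper's API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToySystem:
    """Hypothetical stand-in for a root filesystem: package name -> version."""

    packages: dict[str, str] = field(default_factory=dict)
    snapshots: dict[int, tuple[str, dict[str, str]]] = field(default_factory=dict)

    def snapshot(self, description: str) -> int:
        """Store a full copy of the current state (real Btrfs snapshots share blocks instead)."""
        snap_id = len(self.snapshots) + 1
        self.snapshots[snap_id] = (description, copy.deepcopy(self.packages))
        return snap_id

    def apply_update(self, changes: dict[str, str]) -> tuple[int, int]:
        """Bracket an update with pre/post snapshots and return their IDs."""
        pre = self.snapshot("pre-update")
        self.packages.update(changes)
        post = self.snapshot("post-update")
        return pre, post

    def rollback(self, snap_id: int) -> None:
        """Return the system to the state recorded in a snapshot."""
        _, state = self.snapshots[snap_id]
        self.packages = copy.deepcopy(state)
```

Rolling back to the pre-update snapshot undoes the change, while the post-update snapshot stays available in case the update should be restored after all.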
Avoid human mistakes: YaST and AutoYaST
The most efficient single-system management framework, with a consistent UI
SUSE Manager
Support the full lifecycle
‒ Open source one-to-many system management
‒ Reduce errors through proactive and automated patching
‒ Complete life cycle management, compliance and security framework
SUSE Manager
How Does SUSE Manager Work?
[Diagram: SUSE Customer Center; SUSE Manager Server with management, monitoring and provisioning, an API layer, a web UI and custom content; SUSE Manager Proxy; firewall; managed systems; integrations: IT applications, MS SCOM, Android app, CLI.]
SUSE Manager
Package and Patch Management
SUSE Manager
Security and compliance
SUSE Manager
A recent use case: “ShellShock bug”
CVE Search
Quick view on impacts
Quick way to react
Easy reporting for compliance
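The “quick view on impacts” step boils down to matching an advisory (CVE, package, first fixed version) against the package inventory of every managed system. The sketch below does that with hypothetical inventory data and a made-up advisory ID. It assumes purely numeric dotted versions, whereas real RPM version and release comparison follows more rules.

```python
from dataclasses import dataclass

Version = tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a purely numeric dotted version such as '4.2.47' (simplifying assumption)."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Advisory:
    cve: str       # advisory identifier
    package: str   # affected package name
    fixed_in: str  # first version that contains the fix


def affected_systems(inventory: dict[str, dict[str, str]], advisory: Advisory) -> list[str]:
    """Return systems that have the package installed in a version older than the fix."""
    fixed = parse_version(advisory.fixed_in)
    hits: list[str] = []
    for system, packages in sorted(inventory.items()):
        installed = packages.get(advisory.package)
        if installed is not None and parse_version(installed) < fixed:
            hits.append(system)
    return hits
```

Given an inventory where web01 has bash 4.2.1, db01 has bash 4.2.5 and app01 has no bash, an advisory fixed in 4.2.5 flags only web01; the same list is the starting point for patching and for the compliance report.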
[Overview: RAS | High Availability | Live Kernel Patching | System Rollback]
SUSE® Linux Enterprise Live Patching
Technology “kGraft”
• Live kernel Patching
• Designed and developed by SUSE Labs
• Ease of use: builds on well-known update processes
• Currently being integrated upstream (kernel community)
• Technical differentiators
‒ Works with zero execution interruption
‒ Unlike other approaches, which stop the whole system (for milliseconds to seconds) while patching
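The “zero execution interruption” claim rests on not stopping the system while code is replaced. As SUSE has described kGraft, calls to a patched function are redirected, and each task is switched from the old to the new code individually once it reaches a safe point, so that within one pass through the kernel a task sees either only old or only new code. The toy model below imitates just that per-task consistency idea in user-space Python; it is not kernel code, and none of its names come from kGraft.

```python
from typing import Any, Callable


class LivePatch:
    """Toy model of per-task lazy migration from an old to a new function version."""

    def __init__(self, old: Callable[..., Any], new: Callable[..., Any], tasks: list[str]) -> None:
        self.old = old
        self.new = new
        self.migrated = {task: False for task in tasks}

    def call(self, task: str, *args: Any) -> Any:
        """Dispatch a call: each task consistently sees one version until it migrates."""
        impl = self.new if self.migrated[task] else self.old
        return impl(*args)

    def safe_point(self, task: str) -> None:
        """Mark a task as switched over (in the kernel, e.g. when it returns to user space)."""
        self.migrated[task] = True

    def complete(self) -> bool:
        """The patch is fully applied once every task has migrated."""
        return all(self.migrated.values())
```

No task is ever paused: tasks that have not yet reached a safe point keep running the old code, and the patch counts as complete only when the last one has switched.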
SUSE® Linux Enterprise Live Patching
Use Cases
• Mission-critical systems
‒ Improve general availability
‒ Fix security vulnerabilities
‒ Long-running tasks (simulations, ...)
‒ Services that take long to restore (SAP HANA, large databases, ...)
‒ Run until the next “maintenance window”
‒ Help to meet SLAs
• Help with deployment challenges
‒ No need to update all 10,000+ systems in one shot; systems can keep running until a specific state is reached
• See also: https://www.suse.com/communities/conversations/need-kgraft-2/
To recap...
Reduce Planned Downtime with SUSE
Summary
Four steps to move toward zero downtime:
1 Prevent hardware downtime
‒ Reliability, Availability and Serviceability (RAS) features
2 Maximize service availability
‒ Clustering and Geo Clustering technology
‒ Live kernel patching
3 Avoid human mistakes
‒ Automated patching and security compliance
4 Quickly recover a working system state
‒ System snapshot and rollback
Thank you.
Questions?
Corporate Headquarters
Maxfeldstrasse 5
90409 Nuremberg
Germany
+49 911 740 53 0 (Worldwide)
www.suse.com
Join us on: www.opensuse.org
Backup Slides
Leadership
SUSE® Linux Enterprise High Availability Extension
• Long track record
• Up-to-date Open Source High Availability stack
• Geo cluster support
• Superior Cluster File System
• Integrated Data Replication
• Full System z support
• Deep OS integration
• Ready for Virtualization
Easy Setup: Bootstrap
SUSE® Linux Enterprise High Availability Extension
• Bootstrapping a cluster is really easy:
‒ node1 # sleha-init -i bond0 -t ocfs2 -p /dev/sdb
‒ nodeN # sleha-join -c 192.168.2.1
• Connect to the Hawk web console for cluster management
Blackbox monitoring
• Remote monitoring of resources
‒ No HA components needed
‒ Reuse of Nagios plugins
• Improved handling of virtual guests
‒ Monitor virtual services from the hypervisor
‒ Improve protection of VMs as cluster workload
‒ Guests remain unaltered: monitoring is external
• Extends Pacemaker to include the concept of “container” resources
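Because the remote monitoring reuses Nagios plugins, a check that follows the Nagios plugin conventions could in principle be plugged in: one line of status text, optional performance data after a “|”, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The hypothetical check below classifies a response time against warning and critical thresholds. How the measurement is taken and how the cluster is configured to call the plugin are left out and would follow the product documentation.

```python
from typing import Optional

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(response_ms: Optional[float], warn_ms: float, crit_ms: float) -> tuple[int, str]:
    """Map a measured response time to a Nagios exit code and status line."""
    if response_ms is None:
        return UNKNOWN, "UNKNOWN - no measurement available"
    if response_ms >= crit_ms:
        code = CRITICAL
    elif response_ms >= warn_ms:
        code = WARNING
    else:
        code = OK
    # Performance data in the form label=value[UOM];warn;crit
    perfdata = f"response_time={response_ms:.0f}ms;{warn_ms:.0f};{crit_ms:.0f}"
    return code, f"{STATUS_NAMES[code]} - response time {response_ms:.0f} ms | {perfdata}"
```

A real plugin would print the message and then call `sys.exit(code)` so the monitoring side sees the status. For example, `evaluate(120.0, warn_ms=200.0, crit_ms=500.0)` returns code 0 with the line `OK - response time 120 ms | response_time=120ms;200;500`.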
Unpublished Work of SUSE. All Rights Reserved.This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General DisclaimerThis document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.