SUSE2012 Template v3: 6/20/12 - Novell · *Source: Forrester Research, Inc., The State Of Business...

Gábor Nyers

Sales Engineer @SUSE

[email protected]@suse.com

DOWNTIME IS INEVITABLE. BE PREPARED.

Introductory video

ChalkTalk: Toward zero downtime (~4min)

Forrester: more and more systems are considered critical

Critical workloads… are you prepared for downtime?

*Source: Forrester Research, Inc., The State Of Business Technology Resiliency, Q2 2014.

Business Critical Workloads

‒ SAP applications, databases, transactional workloads, and more

‒ Workloads that may impact a large number of users

High Density Virtualization

‒ Popular technology that improves server utilization

‒ Downtime of the guests and hosts impacts your IT services

Planned downtime

What is/causes planned downtime?

‒ Scheduled software patching and updates that require a system reboot

‒ Scheduled hardware maintenance

‒ Data migration

What can IT do to mitigate it?

‒ Schedule a service window to minimize business impact (increasingly hard in a globalized, mobile era)

‒ Optimize the process

Scheduled Downtime

Unplanned downtime

What is/causes unplanned downtime?

‒ It’s a surprise: little or no warning

‒ Hardware failure, software bug, malicious attack or operational mistake

‒ An environmental failure such as a natural disaster

What can IT do to mitigate it?

‒ Reliable systems, HA/Geo clustering, proactive patch management

‒ Improve processes through best practices and training

Unplanned downtime: top causes

Source: Forrester Research, Inc.

How can SUSE help?

How to choose the right technologies?

Four steps to go toward zero downtime:

1 Prevent hardware downtime

‒ Reliability, Availability and Serviceability features

2 Maximize service availability

‒ Clustering & Geo Clustering Technology

‒ Live kernel patching

3 Avoid human mistakes

‒ Automated patching & security compliance

4 Quickly recover working system state

‒ System snapshot & rollback

System Rollback

Live Kernel Patching

High Availability

RAS

Reliability, Availability, Serviceability

Interaction of hardware and operating system → Traditional UNIX capability

SUSE leads in RAS capabilities on Linux, from x86-64 to IBM System z:

‒ CPU error handling (MCE) on Intel x86-64

‒ Hot-Add memory on recent Intel Xeon Architecture

‒ Memory error handling on Intel x86-64

‒ Integrated open source RAID and multipath tools

‒ “Best guest” integration strategy also benefits VMs

‒ Btrfs filesystem: on-line grow/shrink, re-balance, etc.

RAS

System Rollback

Live Kernel Patching

High Availability

• Service failover at any distance – from local to geo

• Up to 99.9999% availability

• Rolling updates for less planned downtime

• Easy setup, administration, management

• Virtualization agnostic

• Leading open source High Availability

Fighting Murphy's Law

20+ years experience and leadership

SUSE High Availability
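
To put the "up to 99.9999% availability" figure in perspective, the short sketch below converts an availability percentage into a yearly downtime budget. It is plain arithmetic, not a SUSE tool: the function name `allowed_downtime_seconds` and the 365-day year are assumptions made for illustration.

```python
# Assumption: a non-leap year of 365 days, chosen only to keep the arithmetic simple.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the yearly downtime budget, in seconds, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    # Print the budget for "three nines" through "six nines".
    for percent in (99.9, 99.99, 99.999, 99.9999):
        seconds = allowed_downtime_seconds(percent)
        print(f"{percent}% -> {seconds:,.1f} s/year ({seconds / 60:,.2f} min)")
```

At 99.9999%, the budget works out to roughly 31.5 seconds of downtime per year, which is why failover and rolling updates matter more than fast manual repair at that level.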

SUSE High Availability Features

• Service Availability 24/7

• Data Replication

• Node Recovery

• Cluster File System

• Unlimited Geo Clustering

• Virtualization-Ready

• Network Load-Balancer

• Free Resource Agents

• Clustered Samba

• Broad Platform Support


Cluster Example

[Diagram labels: Xen VM1, Xen VM2, LAMP, Apache, IP, ext3, Kernel, Corosync, Pacemaker, DLM, cLVM2+OCFS2, Network Links, Clients, Storage]

Video: High Availability Demo

Simple HA web service demo (~3min)

Cluster Test Drive

Simulate and validate cluster setup before actual failover

• Cluster fail-over between different sites

‒ Provides disaster resilience in case of site failure

‒ Each site is a self-contained, autonomous cluster

‒ Supports manual and automatic switch-/fail-over

• Extends Metro Cluster capabilities

‒ No distance limit between data centers

‒ No unified storage / network needed

• Storage replicated as active / passive

‒ Leverages the data replication included with SUSE (DRBD)

‒ Integrates third-party solutions via scripts


Unlimited Geo Clustering

Node Recovery Framework

[Diagram labels: Full Backup, Existing backup facility, Create rescue media (PXE, USB, CD/DVD), Boot rescue system, Restore]

Automated Recovery:

● Partitioning

● SW RAID / LVM

● Formatting

● Restore Data

● Install boot loader

RAS

High Availability

Live Kernel Patching

System Rollback

It is people

Goal: Go back to well-known system state

Reduce operational downtime, human mistakes

‒ Patch installation, system upgrades

‒ System admin tasks

‒ Higher data integrity and availability

New on SUSE Linux Enterprise 12:

• Extended system integration (zypper, YaST)

• Support for Service Pack rollback

• Support for Kernel Upgrade → Full system rollback

System Rollback

System Rollback

Components

Grub2: boot loader integration for full system rollback

Snapper: GUI and CLI tool for easy snapshot/rollback

Btrfs: default filesystem with fault tolerance, repair, and easy management features
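
The snapshot-then-rollback workflow behind these components can be sketched as a dry run. The example below only builds and prints `snapper` command lines (using the `create` and `rollback` subcommands) and executes nothing. The helper names, the description text and the snapshot number 42 are hypothetical, and a real rollback on a Btrfs root would additionally need a reboot into the restored snapshot.

```python
import shlex
from typing import List


def pre_snapshot_cmd(description: str) -> List[str]:
    """Build the command that takes a 'pre' snapshot and prints its number."""
    return ["snapper", "create", "--type", "pre", "--print-number",
            "--description", description]


def post_snapshot_cmd(pre_number: int, description: str) -> List[str]:
    """Build the command that takes the matching 'post' snapshot after a change."""
    return ["snapper", "create", "--type", "post",
            "--pre-number", str(pre_number), "--description", description]


def rollback_cmd(snapshot_number: int) -> List[str]:
    """Build the command that rolls the system back to a given snapshot."""
    return ["snapper", "rollback", str(snapshot_number)]


if __name__ == "__main__":
    # Hypothetical snapshot number; in practice it is the number printed
    # by the 'pre' snapshot command.
    pre_number = 42
    plan = [
        pre_snapshot_cmd("before patch"),
        post_snapshot_cmd(pre_number, "after patch"),
        rollback_cmd(pre_number),  # only needed if the change went wrong
    ]
    for command in plan:
        print(shlex.join(command))  # dry run: show the command, do not execute it
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later, once the plan has been reviewed.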

Avoid human mistakes

YaST and AutoYaST

The most efficient single-system management framework, with consistent UI

– Open source one-to-many system management

– Reduce errors by proactive and automated patching

– Complete life cycle management, compliance and security framework

SUSE Manager

Support the full lifecycle

SUSE Manager

How Does SUSE Manager Work?

[Diagram labels: Customer Center, Firewall, SUSE Manager Server, SUSE Manager Proxy, Managed Systems, Web UI, CLI, Android App, MS SCOM, API Layer, IT Application, Custom Content, Management, Monitoring, Provisioning]

SUSE Manager

Package and Patch Management

SUSE Manager

Security and compliance

SUSE Manager

A recent use case: “ShellShock bug”

CVE Search

Quick view on impacts

Quick way to react

Easy reporting for compliance

RAS

High Availability

System Rollback

Live Kernel Patching

SUSE® Linux Enterprise Live Patching

Technology “kGraft”

• Live kernel Patching

• Designed and developed by SUSE Labs

• Ease of use: Builds on well known update processes

Currently being integrated “Upstream” (Kernel community)

• Technical differentiators

‒ Works with zero execution interruption

‒ Unlike other approaches, which stop the whole system (for milliseconds to seconds) while patching

SUSE® Linux Enterprise Live Patching

Use Cases

• Mission Critical systems

‒ Improve general availability

‒ Fix security vulnerability

‒ Long running tasks (simulations, ...)

‒ Long restoring services (SAP HANA, large DBs, ...)

‒ Run until the next “maintenance window”

‒ Help to meet SLAs

• Help with deployment challenges

‒ No need to update all 10000+ systems at one shot, but be able to run until a specific state is reached

• See also: https://www.suse.com/communities/conversations/need-kgraft-2/

To recap...

Reduce Planned Downtime with SUSE

Reduce Unplanned Downtime with SUSE

Summary

Four steps to go towards zero downtime:

1 Prevent hardware downtime

‒ Reliability, Availability and Serviceability features

2 Maximize service availability

‒ Clustering & Geo Clustering Technology

‒ Live kernel patching

3 Avoid human mistakes

‒ Automated patching & security compliance

4 Quickly recover working system state

‒ System snapshot & rollback

Thank you.

Questions?

Corporate Headquarters

Maxfeldstrasse 5

90409 Nuremberg

Germany

+49 911 740 53 0 (Worldwide)

www.suse.com

Join us on: www.opensuse.org

Backup Slides

Leadership

SUSE® Linux Enterprise High Availability Extension

• Long track record

• Up-to-date Open Source High Availability stack

• Geo cluster support

• Superior Cluster File System

• Integrated Data Replication

• Full System z support

• Deep OS integration

• Ready for Virtualization

• Bootstrapping a cluster is really easy:

‒ node1 # sleha-init -i bond0 -t ocfs2 -p /dev/sdb

‒ nodeN # sleha-join -c 192.168.2.1

• Connect to the Hawk web console for cluster management

Easy Setup – Bootstrap

SUSE® Linux Enterprise High Availability Extension

• Remote monitoring of resources

‒ no HA components needed

‒ re-use of Nagios plugins

• Improved handling of virtual guests

‒ monitor virtual services from the hypervisor

‒ improve protection of VMs as cluster workload

‒ guests remain unaltered: monitoring is external

• Extends Pacemaker to include the concept of “container” resources

Blackbox monitoring

Unpublished Work of SUSE. All Rights Reserved.

This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.

General Disclaimer

This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
