SUSE Linux Enterprise · SUSE Linux Enterprise High Availability This information is...


Page 1: SUSE Linux Enterprise · SUSE Linux Enterprise High Availability This information is forward-looking and is subject to change at any time. Failure will occur • How to predict &

SUSE® Linux Enterprise High Availability

Kai Dupke

Senior Product Manager

SUSE Linux Enterprise Server

[email protected]

Kristoffer Grönlund

Senior Software Engineer

HA Architect

[email protected]

Page 2:

SUSE High Availability: Easy, Quick, Anywhere

Page 3:

● Overview

● High Availability

● Geo Cluster

● Roadmap


Topics

Page 4:

Challenge

● Murphy’s Law is universal

● Faults will occur

– Hardware crash, flood, fire, power outage, earthquake

● Service outage and loss of data

– You might afford a five-second blip, but can you afford a longer outage?

● Can you afford low availability?

Page 5:

HA or no HA?

Page 6:

Quis custodit custodes? (Who guards the guardians?)

Reboot instead of failing over

• (Virtualized) hardware needs to be available

Re-deployment instead of failing over

• Monitor needs to be always available

Farmed services

• Client needs to handle fail-over ('F5', SMTP)

• Third-party application must support scale-out

• Backend needs to be available

Page 7:

SUSE® Linux Enterprise High Availability

Page 8:

Overview

The most modern and complete open source solution for high-availability Linux clusters

A suite of robust open source technologies that is:

• Easy to use

• Integrated

• Virtualization agnostic

Used with SUSE Linux Enterprise Server, it helps to:

• Maintain business continuity

• Protect data integrity

• Reduce unplanned downtime for mission-critical workloads

Page 9:

• Service Failover

• Cluster File Systems

• Clustered Samba

• Virtualization Agnostic

• Full support for x86, x86_64, POWER, and System z

• Network Load-Balancer

• Data Replication

• Node Recovery

• Hawk Web GUI

• Unlimited Geo Clustering

Features

SUSE unique!

Page 10:

Targets

Quickly and easily install, configure and manage

Continuous access to mission-critical systems and data

Transparent to Virtualization

Meet Service Level Agreements

Increase service availability

Page 11:

Key Use Cases – Mission-Critical Services

Active/active services: OCFS2, databases, Samba file servers

Active/passive service fail-over: traditional databases, SAP setups, regular services

High availability across guests: fine-grained monitoring and HA on top of virtualization

Network load balancing with transparent fail-over

All topologies: local, metro, and geographical area clusters
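As a rough illustration of the active/passive fail-over case, the crm shell commands below group a file system, a virtual IP and a service so that Pacemaker starts them together, in order, on one node and moves them as a unit when that node fails. This is a sketch only: the resource names, the device path, the IP address and the `postgresql` systemd unit are hypothetical placeholders.

```shell
# Hypothetical active/passive database service (all names are placeholders).
# File system holding the data; mounted on exactly one node at a time.
crm configure primitive fs-db Filesystem \
    params device=/dev/disk/by-id/DB-DISK directory=/srv/db fstype=xfs \
    op monitor interval=20s timeout=40s
# Floating IP address that clients connect to.
crm configure primitive vip-db IPaddr2 \
    params ip=192.168.1.50 \
    op monitor interval=10s timeout=20s
# The service itself, managed through its systemd unit.
crm configure primitive db systemd:postgresql \
    op monitor interval=30s timeout=60s
# A group implies colocation and start order: fs-db, then vip-db, then db.
crm configure group g-db fs-db vip-db db
```

Using a group rather than separate colocation and order constraints keeps the example short; for more complex dependencies, explicit constraints give finer control.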

Page 12:

Sample Use Cases – SAP

Diagram panels: Simple Stack; Enqueue Replication; DRBD Data Sync; HA in Virtual Environments

Page 13:

Pharmaceutical drugs & products

Part of the STADA group

Running highly available SAP

Reference – Ciclum Pharma

“SUSE Linux Enterprise offers the perfect combination of flexibility and reliability.”

“100 percent uptime for SAP since the solution is live.”

“The partnership between SUSE and SAP gave us confidence.”

“SUSE Linux Enterprise High Availability Extension gives us powerful tools.”

– ANTÓNIO DAMAS, IT Manager

Ciclum Farma

Page 14:

Geo Cluster

Page 15:

Cluster fail-over between different locations

• Provide disaster resilience in case of site failure

• Each site is a self-contained, autonomous cluster

• Support manual and automatic switch-over/fail-over

Extends Metro Cluster capabilities

• No distance limit between data centers

• No unified storage / network needed

Storage replicated as active / passive

• Leverage SUSE included data replication (DRBD)

• Integrate third-party solutions via scripts

Geo Cluster – Overview

Page 16:

Local cluster

• Negligible network latency

• Typically synchronous concurrent storage access

Metro area (stretched) cluster

• Network latency < 15 ms (~20 miles)

• Unified / redundant network between sites

• Usually some form of replication at the storage level

Geo clustering

• High network latency, limited bandwidth

• Asynchronous storage replication

Geo Cluster – From Local to Geo

Page 17:

Geo Cluster – Setup

Diagram: Site A (Node 1, Node 2) and Site B (Node 7, Node 8), each a local cluster running boothd; Site C (Arbitrator) also runs boothd.
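A minimal sketch of how the setup above could be expressed in configuration, assuming booth's usual `/etc/booth/booth.conf` format: both sites and the arbitrator share one file listing all booth addresses and the ticket they negotiate, and each site's local cluster ties its resources to that ticket. All IP addresses, the ticket name `ticket-nfs`, the constraint ID and the resource group `g-nfs` are hypothetical placeholders; running boothd itself as a cluster resource at each site is not shown.

```shell
# /etc/booth/booth.conf, identical on Site A, Site B and the arbitrator (Site C).
# Addresses and ticket name are placeholders.
cat > /etc/booth/booth.conf <<'EOF'
transport = UDP
port = 9929
arbitrator = 192.168.203.100
site = 192.168.201.100
site = 192.168.202.100
ticket = "ticket-nfs"
    expire = 600
EOF

# In each site's local cluster: g-nfs may only run where the ticket is held;
# if the site loses the ticket, the group is stopped.
crm configure rsc_ticket nfs-needs-ticket ticket-nfs: g-nfs loss-policy=stop

# Manual switch-over: grant the ticket to one site, then list ticket state.
booth grant -s 192.168.201.100 ticket-nfs
booth list
```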

Page 18:

Service failover at any distance – from local to geo

Up to 99.9999% availability

Rolling updates for less planned downtime

Easy setup, administration, management

Virtualization agnostic

Leading open source High Availability

Fighting Murphy's Law

When will you start?

Summary
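To put “up to 99.9999% availability” into perspective, a small helper can convert an availability percentage into the maximum downtime it allows per 365-day year. The function name `downtime_per_year` is hypothetical and not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: maximum downtime per 365-day year, in seconds,
# for a given availability percentage.
downtime_per_year() {
    awk -v a="$1" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * 365 * 24 * 60 * 60 }'
}

downtime_per_year 99.9      # three nines: 31536.0 s, about 8.8 hours
downtime_per_year 99.9999   # six nines: 31.5 s, about half a minute
```

The calculation is delegated to awk because POSIX sh has no floating-point arithmetic.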

Page 19:

Roadmap

Page 20:

Roadmap

Timeline (2015, 2016, 2017, 2018): SERVICE PACK 1 (SP1), SERVICE PACK 2 (SP2), SERVICE PACK 3 (SP3)

High Availability

• Host-based mirroring optimization

• AWS cloud support

Geo Cluster

• Virtualization for standard workloads

High Availability

• Hawk GUI redesign

• HA for POWER

• md-cluster data mirroring

High Availability

• Azure cloud support

Geo Cluster

• Bootstrap support

• Wizard support

This information is forward-looking and is subject to change at any time.

Page 21:

Recent Improvements – 12 SP1

Hawk 2

• Redesigned and updated interface

• Many new wizards

• Command log

Page 22:

SUSE Linux Enterprise High Availability 12 Service Pack 2

Page 23:

● Hawk 2 now default

● Hawk Batch Mode

● Pacemaker 1.1.15: Event-based Alerts

● Clustered RAID 1 (cluster-md)

● HAProxy 1.6

● AWS fencing agent, tool support

● Power LE

● UEFI support in ReaR

SUSE® Linux Enterprise High Availability Service Pack 2
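Event-based alerts let Pacemaker call an external agent when cluster events occur (node, resource, or fencing events). The sketch below uses the crm shell's `alert` configuration element with the sample file agent shipped with Pacemaker; the agent's source path may differ between versions, and the target paths and the alert ID `file-alert` are hypothetical.

```shell
# Install the sample agent under a local name (repeat on every cluster node).
install -m 0755 /usr/share/pacemaker/alerts/alert_file.sh.sample \
    /usr/local/bin/pcmk_alert_file.sh
# Prepare the log file the agent appends to; it runs as the hacluster user.
touch /var/log/pcmk_alerts.log
chown hacluster:haclient /var/log/pcmk_alerts.log
# Register the alert; the recipient (here a log file) is passed to the agent.
crm configure alert file-alert /usr/local/bin/pcmk_alert_file.sh \
    to /var/log/pcmk_alerts.log
```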

Page 24:

Setup & Management

Page 25:

Easy to bootstrap

node1 # ha-cluster-init -i bond0 -t ocfs2 -p /dev/disk/by-id/...
node[2...N] # ha-cluster-join -c node1

Web interface for cluster management & wizards

Easy Setup – Bootstrap & Wizards

Page 26:

Hawk 2 – Batch Mode

Page 27:

Hawk 2 – History Explorer

Page 28:

crm shell – Cluster Scripts

# crm script run virtual-ip id=admin ip=10.13.37.98
INFO: Virtual IP
INFO: Nodes: alice, bob
OK: Configure cluster resources

# crm cfg show admin
primitive admin IPaddr2 \
    params ip=10.13.37.98 \
    op start timeout=20 interval=0 \
    op stop timeout=20 interval=0 \
    op monitor interval=10 timeout=20
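For comparison, the same primitive can be created by hand with `crm configure`, copying the `show` output above. This is a sketch: the cluster script also performs its own steps (such as reporting the cluster nodes, alice and bob) that a single manual command does not. `crm script list` and `crm script show` help discover which scripts exist and which parameters they take.

```shell
# Discover the available cluster scripts and the parameters of virtual-ip.
crm script list
crm script show virtual-ip

# Create the same IPaddr2 primitive manually.
crm configure primitive admin IPaddr2 \
    params ip=10.13.37.98 \
    op start timeout=20 interval=0 \
    op stop timeout=20 interval=0 \
    op monitor interval=10 timeout=20

# Check where the new resource is running.
crm resource status admin
```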

Page 29:

● Shared-device RAID 1

● Avoid the SAN as a single point of failure (SPOF)

● High performance

See the dedicated talk on cluster-md!

cluster-md
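A minimal sketch of creating a clustered RAID 1 with mdadm, assuming a running cluster with DLM configured (cluster-md relies on it) and two shared disks visible on all nodes. The device paths are hypothetical placeholders.

```shell
# On one node: create a two-way mirror with a clustered write-intent bitmap.
mdadm --create /dev/md0 --bitmap=clustered --metadata=1.2 \
      --raid-devices=2 --level=mirror \
      /dev/disk/by-id/SHARED-DISK-1 /dev/disk/by-id/SHARED-DISK-2

# On each of the other nodes: assemble the same array from the same devices.
mdadm --assemble /dev/md0 \
      /dev/disk/by-id/SHARED-DISK-1 /dev/disk/by-id/SHARED-DISK-2
```

In a production setup the array would normally be assembled by a cluster resource rather than by hand on each node; the manual commands only show which mdadm options make the mirror cluster-aware.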

Page 30:

Outlook

Page 31:

Public Cloud

● AWS / EC2

● Azure

Geo Cluster

● Bootstrap

● Wizards

Interface / Tools

● Hawk 2 – Fencing Topology

● Hawk 2 – Alerts

Upcoming Improvements – 12 SP3

This information is forward-looking and is subject to change at any time.

Page 32:

Future – Beyond the Next Frontier

This information is forward-looking and is subject to change at any time.

Failure will occur

• How to predict & avoid failures?

Virtualization, Containers and Cloud

• Monitor from outside or inside the guests?

Local, Metro, Geo...

• What is the next new cluster scenario?

Scalability

• What is the right cluster size – 2 nodes, 20 nodes, 200 nodes, 2000+ nodes?

Usability

• What makes cluster deployment, operation, and support easier?

Page 33:

Questions

Page 34:

Backup

Page 35:

● Remote monitoring of resources

– no HA components needed

– reuse of Nagios/Icinga plugins

● Improved handling of virtual guests

– monitor virtual services from the hypervisor

– improve protection of VMs as cluster workload

– guests remain unaltered – monitoring is external

● Extends Pacemaker to include the concept of “container” resources

External Remote Monitoring
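One way to express external monitoring with the crm shell, sketched under assumptions: a guest managed by the `VirtualDomain` agent, a resource of class `nagios` that checks the guest's SSH port from the outside, and a group whose `container` meta attribute marks the VM as the container, so that a failed check leads Pacemaker to recover the VM. The monitoring plugins must be installed on the cluster nodes; the guest name `vm1` and its libvirt configuration path are hypothetical.

```shell
# The guest itself, managed from the hypervisor.
crm configure primitive vm1 VirtualDomain \
    params hypervisor="qemu:///system" config="/etc/libvirt/qemu/vm1.xml" \
    op start interval=0 timeout=90 \
    op stop interval=0 timeout=90 \
    op monitor interval=10 timeout=30
# External check of the SSH service inside the guest (Nagios/Icinga plugin).
crm configure primitive vm1-sshd nagios:check_tcp \
    params hostname="vm1" port="22" \
    op start interval=0 timeout=120 \
    op monitor interval=10
# vm1 is the container: if vm1-sshd fails, Pacemaker recovers vm1.
crm configure group g-vm1-and-services vm1 vm1-sshd \
    meta container="vm1"
```

The guest stays unaltered: nothing from the HA stack runs inside it, which matches the external-monitoring approach described above.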

Page 36:

● Core is a traditional cluster (up to 32 nodes)

● Core drives arbitrary number of remote nodes

– Remote nodes can be virtual or physical

● Remote management and monitoring

– Remote agent (pacemaker-remote) needed

– Uses resource agents & system init scripts

– More feature-rich than external monitoring

● Remote nodes can host (almost) all resources

– Exceptions: DLM, cLVM2, OCFS2, GFS2

Scale-out via Remote Nodes
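A minimal sketch of adding one remote node, assuming the pacemaker-remote package is installed on the remote host and the default key location `/etc/pacemaker/authkey` is used. The host name `remote1` is hypothetical. The key must be identical on all cluster nodes and all remote nodes.

```shell
# On one cluster node: generate the shared key, then copy it to every
# other cluster node and every remote node (same path everywhere).
mkdir -p /etc/pacemaker
dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
scp /etc/pacemaker/authkey remote1:/etc/pacemaker/authkey

# On the remote host: start the remote daemon (TCP port 3121 by default).
systemctl enable pacemaker_remote
systemctl start pacemaker_remote

# On a cluster node: integrate the host into the cluster as a remote node.
crm configure primitive remote1 ocf:pacemaker:remote \
    params server=remote1 \
    op monitor interval=30s
```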

Page 37:

Architecture

Page 38:

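Cluster Software Stack

Diagram: three layers, labeled (top to bottom) Resources, Pacemaker and Corosync.

• Resources / Resource Agents: several resources, each managed through resource agents

• Pacemaker / Resource Allocation: on every node a Local Resource Manager and a Cluster Resource Manager; on the Designated Coordinator (DC), the Cluster Resource Manager also runs the Policy Engine and holds the Cluster Information Base (CIB), while the other nodes keep a CIB replica

• Corosync / Messaging / Infrastructure: Corosync on every node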

Page 39:

Linux High Availability Stack

The stack includes:

• Corosync – cluster infrastructure

• Pacemaker – cluster resource manager

• resource-agents – manage and monitor the availability of services

• STONITH – I/O fencing support (also for Xen and VMware VMs)

• Hawk – web console for cluster monitoring and administration

• crm shell – advanced cluster command-line interface

• DRBD – network cluster storage

• cLVM – cluster-aware LVM

• OCFS2, GFS2 – active/active file systems
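To relate the components above to a running system, each layer can be inspected from the command line. A short sketch, assuming a configured and running cluster with the standard tools shipped in these packages:

```shell
# Corosync: status of the communication ring(s) on this node.
corosync-cfgtool -s
# Corosync votequorum: membership and quorum state.
corosync-quorumtool -s
# Pacemaker: one-shot cluster status, including the current DC.
crm_mon -1
# CIB contents as seen through the crm shell.
crm configure show
# Resource agents available from the OCF "heartbeat" provider.
crm ra list ocf heartbeat
# Fencing devices currently registered with the fencing daemon.
stonith_admin --list-registered
```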