SAP High Availability in Azure Using SUSE Linux

[BP-1404]

Agenda

Basic SAP Architecture

SAP HA Architecture in Azure

Pacemaker

Azure Load Balancer

Demo of unplanned failover

SAP Architecture

Basic SAP architecture

[Diagram: Application Server, Database Server, Shared Disk, Central Services, Database Storage]

S/4HANA High Availability Architecture in Azure

Availability Set (99.95% VM SLA) vs. Availability Zones (99.99% VM SLA)
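To put the SLA percentages in perspective, allowed downtime is (100 − SLA)/100 × period length. This is a minimal sketch that assumes a 30-day month (43,200 minutes). The helper name `allowed_downtime_min` is hypothetical.

```shell
#!/usr/bin/env bash
# Allowed downtime (minutes) for an SLA percentage over a period.
# Assumption: the period defaults to a 30-day month = 30 * 24 * 60 = 43200 minutes.
allowed_downtime_min() {
  local sla_percent="$1"
  local period_min="${2:-43200}"
  # (100 - SLA) / 100 is the fraction of the period that may be lost.
  awk -v sla="$sla_percent" -v p="$period_min" \
    'BEGIN { printf "%.2f\n", (100 - sla) / 100 * p }'
}

allowed_downtime_min 99.95   # Availability Set
allowed_downtime_min 99.99   # Availability Zones
```

The two calls print 21.60 and 4.32. In other words, moving from an availability set to availability zones shrinks the monthly downtime budget from about 22 minutes to under 5.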

SUSE High Availability Extension

The Goal of HA

Reduce MTTR (mean time to recovery)
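The reason MTTR is the lever can be seen in the common steady-state availability formula, written in terms of mean time between failures (MTBF) and MTTR. The numbers below are illustrative and hypothetical (MTBF = 1000 h). They show that cutting recovery time from one hour to six minutes moves availability from about three nines to four, without any change to how often failures occur.

```latex
\begin{aligned}
A &= \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} \\
\mathrm{MTTR} = 1\ \mathrm{h}: \quad A &= \frac{1000}{1000 + 1} \approx 99.90\% \\
\mathrm{MTTR} = 0.1\ \mathrm{h}: \quad A &= \frac{1000}{1000 + 0.1} \approx 99.99\%
\end{aligned}
```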

SUSE High Availability Overview

corosync (cluster membership)

pacemaker (crm)

Resource Agents (RAs)

Fencing (stonith)

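[Diagram: two cluster nodes, each running this stack on its own kernel and managing SAP instances and virtual IPs (vIP); shared storage holds the SBD fencing device]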

Resource Agents

Provide the ‘intelligence’ to Pacemaker

A script used to start, stop, and monitor a resource

• Ideally Open Cluster Framework (OCF) compliant

• Well-defined return values

• Mandatory operations (start, stop, monitor, meta-data)

• Return value passed back to Pacemaker

• Many providers of RAs

• Around 140 RAs ship out of the box

• Resource Agents for SAP HANA are included in SLES for SAP Applications
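The bullets above describe an RA as a script with mandatory operations and well-defined return values. Below is a minimal sketch of such a script for a hypothetical "dummy" resource. The constants use the OCF return codes: 0 = success, 1 = generic error, 3 = unimplemented, 7 = not running. Real agents source `ocf-shellfuncs` for these values; here they are inlined so the sketch is self-contained. The state file, the function names, and the agent name are assumptions.

```shell
#!/usr/bin/env bash
# Minimal OCF-style resource agent sketch (hypothetical "dummy_ra").
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Assumption: the "resource" is only a state file; a real agent would
# start/stop/check an actual service such as an SAP instance.
STATE_FILE="${STATE_FILE:-/tmp/dummy_ra.state}"

ra_monitor() {
  # Answer definitively: running (0) or cleanly stopped (7).
  if [ -f "$STATE_FILE" ]; then return $OCF_SUCCESS; fi
  return $OCF_NOT_RUNNING
}

ra_start() {
  # start must be idempotent: succeed if already running.
  ra_monitor && return $OCF_SUCCESS
  touch "$STATE_FILE" || return $OCF_ERR_GENERIC
  return $OCF_SUCCESS
}

ra_stop() {
  # stop must be idempotent too: succeed if already stopped.
  rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
  return $OCF_SUCCESS
}

ra_meta_data() {
  # Pacemaker reads the agent's description and default operations from this XML.
  cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="dummy_ra" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Sketch agent that manages a state file.</longdesc>
  <shortdesc lang="en">Sketch resource agent</shortdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
EOF
  return $OCF_SUCCESS
}

# Dispatch the action Pacemaker passes as the first argument.
ra_main() {
  case "$1" in
    start)     ra_start ;;
    stop)      ra_stop ;;
    monitor)   ra_monitor ;;
    meta-data) ra_meta_data ;;
    *)         return $OCF_ERR_UNIMPLEMENTED ;;
  esac
}

# When executed with an action, the function's return code becomes the exit code.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
  ra_main "$1"
  exit $?
fi
```

Pacemaker interprets only the exit code. That is why `start` and `stop` are written to succeed when the resource is already in the requested state.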

SAP HANA Resource Agents

Why Do We Need Fencing?

To a cluster node, the loss of a peer node is indistinguishable from the loss of communication with that node.

In the former case, is it safe to fail over resources?

And in the latter case?

Split Brain

• When a cluster partitions due to network failure

• Neither side knows if the other is still alive

• Worst-case scenario: each side attempts to fail over the other's resources

• Better scenario: neither side does anything

(But then, why do we have a cluster?)

• Best scenario: one side is able to guarantee that the other is down

• Fencing is about moving from an UNKNOWN state to a KNOWN state
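The split-brain scenarios above reduce to a small decision rule: a node may take over a peer's resources only when the peer's state is KNOWN to be down. This is a conceptual sketch, not Pacemaker's implementation. The function name and the state labels are hypothetical.

```shell
#!/usr/bin/env bash
# Conceptual sketch: what a surviving node may do with a peer's resources,
# given what it knows about that peer. Not Pacemaker's actual logic.
takeover_decision() {
  local peer_state="$1"
  case "$peer_state" in
    alive)   echo "no-action" ;;    # peer reachable: it still owns its resources
    fenced)  echo "take-over" ;;    # peer confirmed down (KNOWN): safe to start its resources
    unknown) echo "fence-first" ;;  # contact lost (UNKNOWN): starting now risks split brain
    *)       echo "invalid-state" >&2; return 1 ;;
  esac
}
```

Fencing turns `unknown` into `fenced`, and that transition is what allows the takeover in the "best scenario".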

SUSE High Availability with SAP Central Services

Enqueue-Replication Versions

[Diagram panels: ENSA1 and ENSA2]

Central Services – Multi SID

Architecture options for SAP on Azure

File system

• Bring-your-own (BYO) SUSE cluster

• Azure NetApp Files (ANF)

• NFS (future)

Azure availability options

• Availability Set

• Availability Zone

Fencing agent

• SBD

• Azure Fencing agent (future)

Floating IP: Two Basic Architectures Possible

[Diagram: left, PAS reaches Hana 1 and Hana 2 through an Azure Load Balancer; right, PAS reaches Hana 1 and Hana 2 through a floating IP]

Let’s look at the first case: a floating IP

[Diagram: PAS connects to the floating IP; Hana 1 replicates to Hana 2 via HANA System Replication; sr_takeover makes Hana 2 the primary]

• The “floating” IP can be moved from one machine to another via API/CLI

• Moving the IP takes approximately 2 minutes
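The move via API/CLI is a two-step operation. First the address is removed from the old node's NIC, then it is attached to the new node's NIC. This is a hedged sketch using the Azure CLI's `az network nic ip-config delete/create`. The resource group, NIC names, ip-config name, and address are hypothetical placeholders. The sketch assumes the address is a secondary IP configuration, because a NIC's primary configuration cannot be removed. It also assumes the guest OS configures the address itself, for example through the IPaddr2 resource. With `DRY_RUN=1` (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: move a secondary private IP between two Azure NICs with the Azure CLI.
# All names and addresses are hypothetical. DRY_RUN=1 prints instead of executing.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

move_floating_ip() {
  local rg="$1" from_nic="$2" to_nic="$3" ipconfig="$4" ip="$5"
  # 1. Release the address from the old primary's NIC.
  run az network nic ip-config delete \
    --resource-group "$rg" --nic-name "$from_nic" --name "$ipconfig"
  # 2. Attach the same address to the new primary's NIC.
  run az network nic ip-config create \
    --resource-group "$rg" --nic-name "$to_nic" --name "$ipconfig" \
    --private-ip-address "$ip"
}
```

For example, `move_floating_ip rg-sap nic-hana1 nic-hana2 ipconfig-vip 10.0.0.13` prints both commands. Each step is a control-plane request to Azure, not a local network change on the node. This helps explain why such a move is slow compared with a health-probe-driven load balancer.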

[Diagram: Azure Load Balancer; a client connects to the load balancer, which forwards traffic to Hana 1 and Hana 2 in its backend pool]

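[Diagram: PAS reaches the floating IP through the Azure Load Balancer, which sends health probes to Hana 1 and Hana 2; the nodes are linked by HANA System Replication and sr_takeover]

The load balancer forwards traffic only to the node that answers its health probe. After a takeover, the cluster therefore only has to move the probe listener to the new primary; no Azure network reconfiguration is needed.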

SOCAT & Virtual IP Network Resource

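# Virtual IP of the HANA primary (SID HN1, instance 03), monitored every 10 s
sudo crm configure primitive rsc_ip_HN1_HDB03 ocf:heartbeat:IPaddr2 \
  meta target-role="Started" is-managed="true" \
  operations \$id="rsc_ip_HN1_HDB03-operations" \
  op monitor interval="10s" timeout="20s" \
  params ip="10.0.0.13"

# socat listens on TCP port 62503 only on the node running this resource;
# the Azure Load Balancer health probe uses that port to find the active node
sudo crm configure primitive rsc_nc_HN1_HDB03 anything \
  params binfile="/usr/bin/socat" \
  cmdline_options="-U TCP-LISTEN:62503,backlog=10,fork,reuseaddr /dev/null" \
  op monitor timeout=20s interval=10 depth=0

# Group both so the IP and the probe listener always run on the same node
sudo crm configure group g_ip_HN1_HDB03 rsc_ip_HN1_HDB03 rsc_nc_HN1_HDB03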
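The socat resource exists so that a TCP health probe succeeds on exactly one node. The rough check is whether a TCP connection to the probe port opens within a timeout. This sketch approximates that check from a shell using bash's `/dev/tcp` redirection and coreutils `timeout`. It is not the load balancer's actual probe implementation, and the function name `probe_port` is hypothetical.

```shell
#!/usr/bin/env bash
# Approximate a TCP health probe: can a connection to host:port be opened in time?
# Requires bash (for /dev/tcp) and coreutils `timeout`.
probe_port() {
  local host="$1" port="$2" secs="${3:-5}"
  # /dev/tcp/HOST/PORT is a bash redirection that opens a TCP socket.
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "healthy"
  else
    echo "unhealthy"
    return 1
  fi
}

# Example (hypothetical host names): only the node running g_ip_HN1_HDB03
# should report "healthy" on port 62503.
# probe_port hana1 62503
# probe_port hana2 62503
```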

Unplanned Failover

Several mechanisms for testing:

• Shut down the VM from the Azure portal

• Find the SBD inquisitor process (ps aux | grep sbd) and kill it

• Stop Pacemaker (service pacemaker stop)
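Finding the inquisitor by eye in `ps aux | grep sbd` output is error-prone. This small filter prints its PID instead. It assumes the process title contains "sbd: inquisitor" and that the PID is the second column, as in `ps aux`. The function name is hypothetical. The `[i]` in the pattern stops awk from matching its own command line.

```shell
#!/usr/bin/env bash
# Print the PID of the SBD inquisitor from `ps aux`-style input on stdin.
# Assumption: process title contains "sbd: inquisitor"; PID is column 2.
find_inquisitor_pid() {
  # "[i]nquisitor" matches "inquisitor" but not the literal text of this awk command.
  awk '/sbd: [i]nquisitor/ { print $2; exit }'
}
```

On a test node only, `sudo kill -9 "$(ps aux | find_inquisitor_pid)"` is expected to stop the watchdog from being serviced. The node should then be reset and its resources should fail over to the peer.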

Takeaways

Read the Documentation

Set up and test your configuration, and keep testing

Understand the operations

Monitoring & Alerts

Resources

Links to documentation

https://documentation.suse.com/sbp/all/

Links to automation

https://github.com/SUSE/ha-sap-terraform-deployments

Training & certification

https://training.suse.com/training/sap/

Azure training & certifications

[TUT-1226]

SAP HA on SUSE: All you need to know

[TUT-1396]

"Day 2" Operations of SAP HANA Cluster using SUSE High Availability on Public Cloud

[HOL-1064]

SAP HANA scale-out with high availability NFS using DRBD

[BP-1351]

SUSE High Availability for SAP HANA: Tales from the real world, tips, tricks, & troubleshooting

[HOL-1225]

High Availability for SAP application servers using ENSA2 enqueue replication
