SAP HANA Disaster Recovery with SUSE High Availability


SAP HANA Disaster Recovery with SUSE High Availability Extension

Cleber Paiva de Souza / Gabriel Cavalcante

{cleber,gabriel}@ssys.com.br

S-SYS Systems and Solutions

S-SYS and SUSE

• S-SYS officially born in Jan/2014
• SUSE partner since the beginning
• Formed by professionals with experience in SUSE products, Linux in general, training and software development
• Acting together with SUSE engineers in pre-sales and project delivery

Fujitsu Brazil Case: SAP HANA Appliances

• Fujitsu offers PRIMERGY RX600 S6 and PRIMEQUEST machines with SAP HANA.
• SAP HANA HA studies took place at the Fujitsu Platform Solution Center (PSC).
• Integration between the S-SYS and Fujitsu teams.
• Knowledge transfer allowed Fujitsu to deliver SAP HANA integrated with SUSE High Availability.

Fujitsu RX600 hardware specification

• Up to 4x powerful Intel® Xeon® E7 family processors
• Expandable to 1 TB of DDR3 RAM with mirroring support
• Robust I/O design and 10 PCI Express slots
• 8x hard drive bays and support for up to 8 TB of local storage
• Integrated Remote Management Controller (iRMC) providing advanced management features
• 4U rack server form factor

Fujitsu RX600 real hardware specification

• 4x Intel® Xeon® E7-4870 2.4 GHz (10 cores × 2 threads × 4 sockets = 80 logical cores)
• 1 TB of RAM
• 2x Fusion-IO PCI cards (1.2 GB in RAID 1)
• 8x 900 GB SAS 10,000 RPM drives in RAID 5
• 6x 10GE network interfaces
• 6x 1GE network interfaces (4 onboard and 2 PCI)

Fujitsu PRIMEQUEST hardware specification

Highlights of the new generation:

• 8-socket server with up to 4 independent hardware partitions and flexible I/O, based on the latest Intel® Xeon® E7-x800 v3
• Maximum performance from the new Intel Haswell-EX processor generation with up to 18 cores
• Increased memory capacity and performance with 192x DDR4 DIMM slots at 1866 MHz
• System self-repair with flexible-IO and reserved-SB functionality
• 12 Gbps RAID controller with 1/2 GB cache
• All parts are redundant and/or hot swappable; enhanced ServerView management
• Improved enterprise RAS feature set

Product facts:

• Up to 8x Intel Xeon E7-x800 v3 (Haswell-EX)
• Up to 12 TB RAM (using 192x 64 GB)
• Up to 24x 2.5" HDDs/SSDs
• Up to 16 internal PCIe slots, plus 48 additional hot-plug PCIe slots in 4x external PCI boxes
• Up to 8x 10GbE internal with 4x IOUF

HANA in SLES HAE Cluster: HANA Single Box – System Replication / Scale-up

Considerations

• We are Linux experts, not SAP experts.
• SLES for SAP = SLES 11 + HA extension + SAP support + SAPHanaSR.
• All tests done on SLES for SAP 11 SP3.
• Two-node clusters only (scale-up / single-box replication).
• AUTOMATED_REGISTER="false"
• By default, SAP HANA instances are not started during boot; the cluster takes care of the services.
• Synchronous system replication.
• SAP HANA SPS 08 release 85.

Definitions

Parameter          Value
Cluster node 1     hana01
Cluster node 2     hana02
SID                HDB
Instance number    00
User key           slehaloc

User        Password
hdbadm      P@ssword1
sapadm      P@ssword1
SYSTEM      P@ssword1
slehaloc    Password1

Problem

• SAP HANA System Replication on SLES for SAP Applications (June/2014) did not provide configurations for IPMI as a STONITH resource.
• The SAPHanaSR Hawk template does not provide IPMI configuration.

Setup SAP HANA


Procedures for setup

1) Install SLES for SAP

2) Configure network interfaces

3) Configure NTP and timezone

4) Setup disk layout

5) Check hostnames and IP addresses

6) Install SAP HANA database

7) Setup HANA

8) Configure SLES HA Extension

9) Testing takeover

10) Stress test

1) Install SLES for SAP

Install SLES for SAP

• Install SUSE as usual:
  – Select the pattern "SAP HANA Server Base".
  – SLES for SAP installs minimal network services.
• Register at SUSE Customer Center (SCC) and apply updates to prevent well-known problems and bugs (see the command sketch below).
  – Update size: ~500 MB
• SAPHanaSR is available only on SLES for SAP:
  – Provides the SAPHanaTopology and SAPHana resource agents
  – Provides Hawk wizard templates
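
A minimal sketch of the update step on the command line, assuming the machine is already registered and can reach its update channels:

hana01:~ # zypper refresh     # refresh all configured repositories
hana01:~ # zypper patch       # apply all needed patches (~500 MB of updates)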

2) Configure network interfaces

Configure network interfaces

• Define how your network communication will work.
• Define interfaces for user access, heartbeat, data replication, SAP remote support, STONITH, IPMI, etc.

Network throughput and redundancy

• Use bonding for aggregation or redundancy (802.3ad, balance-rr, active-backup, etc.); see the sketch below.
• Make use of 10GE network interfaces and 56 Gb/s InfiniBand.
• High availability requires redundant paths for network switches, fibre switches, InfiniBand switches, etc.
• Monitor your environment.
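
A minimal sketch of an active-backup bond on SLES; the interface names, address and bonding mode are placeholders to adapt to the actual network design:

/etc/sysconfig/network/ifcfg-bond0:
  STARTMODE='auto'
  BOOTPROTO='static'
  IPADDR='10.30.1.10/24'
  BONDING_MASTER='yes'
  BONDING_MODULE_OPTS='mode=active-backup miimon=100'
  BONDING_SLAVE0='eth0'
  BONDING_SLAVE1='eth1'

/etc/sysconfig/network/ifcfg-eth0 (and ifcfg-eth1):
  STARTMODE='hotplug'
  BOOTPROTO='none'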

3) Configure NTP and timezone

Configure NTP and timezone

• All nodes must be in time sync (a quick check is sketched below).
• The cluster could fail if clocks are skewed.
• Use the same timezone on all nodes or SAP could misbehave.
• Otherwise, tracing events across the logs can be hard.
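
A minimal sketch of the checks on SLES 11, assuming ntpd is used; the NTP server name is a placeholder:

hana01:~ # grep '^server' /etc/ntp.conf        # e.g. "server ntp.example.com iburst"
hana01:~ # chkconfig ntp on && rcntp restart    # enable and start ntpd
hana01:~ # ntpq -p                              # verify peers and offset on every node
hana01:~ # grep TIMEZONE /etc/sysconfig/clock   # timezone must match on all nodes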

4) Setup disk layout


Setup disk layout

Data throughput and redundancy

• Put /hana/log on Fusion-IO for performance:
  – 7,200 RPM SATA ≈ 100 IOPS
  – 15,000 RPM SAS ≈ 200 IOPS
  – SSD disks ≈ 20,000 IOPS
  – Fusion-IO ≈ 140,000 IOPS
• /hana/data and /hana/shared on some RAID layout (1, 5, 6, etc.); a possible mount layout is sketched below.
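
A possible /etc/fstab layout following this recommendation; the device names and filesystem choice are placeholders and will differ per installation:

/dev/fioa1   /hana/log      xfs   defaults  0 0   # Fusion-IO, low-latency redo log
/dev/md0p1   /hana/data     xfs   defaults  0 0   # RAID 5 over the SAS disks
/dev/md0p2   /hana/shared   xfs   defaults  0 0   # binaries, traces, shared files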

5) Check hostnames and IP addresses

Check hostnames and IP addresses

• The hostname should be defined before starting the SAP HANA installation.
  – SAP HANA stores this information in the sapstart service profiles.
  – Altering the hostname after installation will require changes in files such as /usr/sap/<SID>/HDB<instance_number>/<hostname>/sapprofile.ini
• Check /etc/hosts consistency (a sketch follows below).
  – All nodes must know the other nodes' IP-to-hostname mappings.
• Assign a virtual IP and hostname for the master node in the cluster.
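
A minimal /etc/hosts sketch for this two-node setup; the node addresses, domain and virtual hostname are placeholders, while 10.30.1.1 is the virtual IP used by the cluster configuration later on:

10.30.1.11   hana01.example.com    hana01
10.30.1.12   hana02.example.com    hana02
10.30.1.1    hanavip.example.com   hanavip   # virtual IP managed by the cluster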

6) Install SAP HANA Database

(Slides 26–37: SAP HANA database installation wizard screenshots.)

7) Setup HANA

Setup HANA (part I)

• Create a user for data synchronization on all nodes:

# export PATH="$PATH:/usr/sap/HDB/HDB00/exe"
# hdbsql -u system -i 00 'CREATE USER slehasync PASSWORD Password1'
# hdbsql -u system -i 00 'GRANT DATA ADMIN TO slehasync'
# hdbsql -u system -i 00 'ALTER USER slehasync DISABLE PASSWORD LIFETIME'

• Set the password on all nodes:

# hdbuserstore SET slehaloc localhost:30015 slehasync Password1

Setup HANA (part II)

• Verify user creation on all nodes:

# hdbuserstore list
DATA FILE : /root/.hdb/hana01/SSFS_HDB.DAT
KEY SLEHALOC
  ENV : localhost:30015
  USER: slehasync

• Verify that the query works without asking for a password on all nodes:

# hdbsql -U slehaloc "select * from dummy"
DUMMY
"X"
1 row selected (overall time 2733 usec; server time 115 usec)

Setup HANA (part III)

• Define the primary node as user hdbadm:

hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_enable --name=SITE001
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.

• Verify the node state as user hdbadm:

hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: SITE001

Host Mappings:
~~~~~~~~~~~~~~

done.

Setup HANA (part IV)

• Do the first backup:

hana01:~ # hdbsql -u system -i 00 "BACKUP DATA USING FILE ('backup')"
Password:
0 rows affected (overall time 46.986124 sec; server time 46.984819 sec)

• Verify the replication status:

hana01:~ # hdbsql -U slehaloc 'select distinct REPLICATION_STATUS from SYS.M_SERVICE_REPLICATION'
REPLICATION_STATUS
0 rows selected (overall time 1701 usec; server time 401 usec)

Setup HANA (part V)

• Define the secondary node as user hdbadm:

hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_register --remoteHost=hana01 --remoteInstance=00 --mode=sync --name=SITE002
adding site ...
checking for inactive nameserver ...
nameserver hana02:30001 not responding.
collecting information ...
updating local ini files ...
done.

• Check the secondary node status as user hdbadm:

hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: sync
site id: 2
site name: SITE002
active primary site: 1

Setup HANA (part VI)

• Check the primary node status as user hdbadm:

hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: SITE001

Host Mappings:
~~~~~~~~~~~~~~
hana01 -> [SITE001] hana01
hana01 -> [SITE002] hana02

done.

8) Configure SLES HA Extension

Configure SLES HA Extension

• Install the "High Availability" pattern.
• Install the SAPHanaSR package.
• Run sleha-init on the first node.
• Change /etc/corosync/corosync.conf if necessary (a sketch follows below):
  – udp (multicast) vs. udpu (unicast)
  – Enable a redundant channel and rrp mode.
  – Enable security auth.
• Run sleha-join on the second node.
• Keep STONITH disabled during configuration.
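
A minimal sketch of the installation commands and of the corosync.conf options mentioned above, assuming a dedicated second ring on a 192.168.1.0 network; the network addresses and pattern name are placeholders to verify against the actual channels:

hana01:~ # zypper install -t pattern ha_sles   # "High Availability" pattern
hana01:~ # zypper install SAPHanaSR

/etc/corosync/corosync.conf (excerpt):
totem {
        transport: udpu          # unicast instead of multicast
        rrp_mode: passive        # redundant ring protocol
        secauth: on              # enable authentication
        interface {
                ringnumber: 0
                bindnetaddr: 10.30.1.0
                # member addresses / ports omitted
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.1.0
                # member addresses / ports omitted
        }
}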

HA Configuration

• Default / global properties:

property $id="cib-bootstrap-options" \
        no-quorum-policy="ignore" \
        stonith-action="poweroff"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000" \
        migration-threshold=3 \
        failure-timeout=60
op_defaults $id="op-options" \
        timeout="600"

HA Configuration

• SAPHanaTopology:

primitive rsc_SAPHanaTopology_HDB_HDB00 ocf:suse:SAPHanaTopology \
        params SID="HDB" InstanceNumber="00" \
        op monitor interval="10" timeout="600" \
        op start interval="0" timeout="600" \
        op stop interval="0" timeout="300"
clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
        meta is-managed="true" clone-node-max="1" interleave="true"

HA Configuration

• SAPHana:

primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \
        params SID="HDB" InstanceNumber="00" PREFER_SITE_TAKEOVER="yes" AUTOMATED_REGISTER="true" DUPLICATE_PRIMARY_TIMEOUT="7200" \
        op start interval="0" timeout="3600" \
        op stop interval="0" timeout="3600" \
        op promote interval="0" timeout="3600" \
        op monitor interval="60" role="Master" timeout="700" \
        op monitor interval="61" role="Slave" timeout="700" \
        meta target-role="Started"
ms msl_SAPHana_HDB_HDB00 rsc_SAPHana_HDB_HDB00 \
        meta clone-max="2" clone-node-max="1" interleave="true"
order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00 msl_SAPHana_HDB_HDB00

HA Configuration

• Virtual IP:

primitive rsc_ip_HDB_HDB00 ocf:heartbeat:IPaddr2 \
        params ip="10.30.1.1" iflabel="0" \
        op start interval="0" timeout="20" \
        op stop interval="0" timeout="20" \
        op monitor interval="10" timeout="20"
colocation col_saphana_ip_HDB_HDB00 2000: rsc_ip_HDB_HDB00:Started msl_SAPHana_HDB_HDB00:Master

HA Configuration

• STONITH IPMI:

primitive stonith_ipmi_hana01 stonith:external/ipmi \
        params hostname="hana01" ipaddr="172.16.1.1" userid="admin" passwd="admin" \
        op monitor enabled="true" interval="300" start-delay="5" timeout="20"
location stonith_ipmi_hana01_not_on_hana01 stonith_ipmi_hana01 -inf: hana01

primitive stonith_ipmi_hana02 stonith:external/ipmi \
        params hostname="hana02" ipaddr="172.16.1.2" userid="admin" passwd="admin" \
        op monitor enabled="true" interval="300" start-delay="5" timeout="20"
location stonith_ipmi_hana02_not_on_hana02 stonith_ipmi_hana02 -inf: hana02
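
A sketch of how these snippets can be applied with crmsh, assuming they were collected in a file named cluster.crm (the file name is a placeholder); STONITH is re-enabled only once the configuration is complete:

hana01:~ # crm configure load update cluster.crm            # load the primitives, clone, ms, order, colocation and STONITH definitions
hana01:~ # crm configure property stonith-enabled="true"    # re-enable fencing after configuration
hana01:~ # crm_mon -r1                                      # verify resource status on both nodes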

Hawk template

• Created a custom Hawk template including IPMI as STONITH. Available at http://www.ssys.com.br/susecon/tut20056/hawk-template.tar.gz

(Slides 53–55: Hawk template screenshots.)

9) Testing takeover

Manual takeover

• The secondary becomes primary, as user hdbadm:

hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_takeover
checking local nameserver ...
done.

• Verify the new state as user hdbadm:

hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 2
site name: SITE002

Host Mappings:
~~~~~~~~~~~~~~
hana02 -> [SITE001] hana01
hana02 -> [SITE002] hana02

done.

Cluster takeover

• Set AUTOMATED_REGISTER="true".
• Pay attention to STONITH; prefer shutdown (poweroff) over reboot.
• Pay attention to timeouts (start, stop, migration, etc.); a takeover exercise is sketched below.
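
One way to exercise a cluster-driven takeover is to put the current primary node in standby and watch the secondary being promoted; a sketch assuming hana01 currently holds the master role:

hana01:~ # crm_mon -r1               # confirm hana01 currently runs the master role
hana01:~ # crm node standby hana01   # force the cluster to move the master role away
hana02:~ # crm_mon -r1               # watch hana02 being promoted and the virtual IP following it
hana02:~ # SAPHanaSR-showAttr        # check the replication attributes maintained by SAPHanaSR
hana01:~ # crm node online hana01    # bring hana01 back; with AUTOMATED_REGISTER="true" it re-registers as secondary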

10) Stress test

Stress test

• Detect problems during stress.
• Most of the time, problems are due to timeouts set too low.
• HanaStress (https://github.com/Centiq/HanaStress):

hanastress.py -v --host localhost -i 00 -u SYSTEM -p P@ssword1 -g anarchy --tables 100 --rows 100000 --threads 10

(This will create 100 tables with 100,000 rows of information each, using 10 threads.)

Cleanup after stress test

• Remove database fragmentation:
  – ALTER SYSTEM RECLAIM DATAVOLUME 120 DEFRAGMENT
  – ALTER SYSTEM RECLAIM LOG
• Force flushing log data to disk:
  – ALTER SYSTEM SAVEPOINT
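
A sketch of issuing these statements through hdbsql as the SYSTEM user, as done earlier for the backup (hdbsql prompts for the password; sufficient privileges are assumed):

hana01:~ # hdbsql -u SYSTEM -i 00 'ALTER SYSTEM RECLAIM DATAVOLUME 120 DEFRAGMENT'
hana01:~ # hdbsql -u SYSTEM -i 00 'ALTER SYSTEM RECLAIM LOG'
hana01:~ # hdbsql -u SYSTEM -i 00 'ALTER SYSTEM SAVEPOINT'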

References

• https://www.suse.com/docrep/documents/wvhlogf37z/sap_hana_system_replication_on_sles_for_sap_applications.pdf
• http://scn.sap.com/docs/DOC-60318
• http://scn.sap.com/docs/DOC-60374
• http://scn.sap.com/docs/DOC-60368


Thank you.

Going further

www.ssys.com.br
