Elastix 2.4 Ha Cluster-updated

28
1 Building Elastix-2.4 High Availability Clusters with DRBD and Heartbeat (using a single NIC) Credits Special thanks to Telesoft Integrando Technologies whose earlier documentation on doing DRBD in Elastix was a great reference for this work; http://asterisk.aplitel.info/files/asteriskcluster.pdf Another special thanks to Amjad Jabali for the Elastix 1.6.0 Clustering tutorial that this document is based on. http://ftp.heanet.ie/disk1/sourceforge/e/el/elastix-easy/Documents/Elastix-Clustering.pdf A final special thanks to the Escuela Politécnica del Ejercito (Sangolquí Ecuador) for the help provided by their staff and faculty members in the Elastix Project Overview. If by any way you can contribute to polish this work feel free to write to my e-mail and send me your documentation. Remember Open Source is based on community effort, so any recommendation to improve it is appreciated and valued. This information has been modified and updated by Nick Ross. Please refer to the original document found at: http://www.elastixbrasil.com.br/downloads/Elastix_2.3_HA_Cluster.pdf Changes made to this document will be explained at the very end in Appendix A. Document Last updated September 25th, 2013.

description

Elastix 2.4 Ha Cluster-updated

Transcript of Elastix 2.4 Ha Cluster-updated

Page 1: Elastix 2.4 Ha Cluster-updated

1

Building Elastix-2.4 High Availability Clusters with

DRBD and Heartbeat (using a single NIC)

Credits

Special thanks to Telesoft Integrando Technologies whose earlier documentation on doing DRBD in Elastix was a great reference for this work; http://asterisk.aplitel.info/files/asteriskcluster.pdf Another special thanks to Amjad Jabali for the Elastix 1.6.0 Clustering tutorial that this document is based on. http://ftp.heanet.ie/disk1/sourceforge/e/el/elastix-easy/Documents/Elastix-Clustering.pdf A final special thanks to the Escuela Politécnica del Ejercito (Sangolquí – Ecuador) for the help provided by their staff and faculty members in the Elastix Project Overview. If by any way you can contribute to polish this work feel free to write to my e-mail and send me your documentation. Remember Open Source is based on community effort, so any recommendation to improve it is appreciated and valued.

This information has been modified and updated by Nick Ross.Please refer to the original document found at:

http://www.elastixbrasil.com.br/downloads/Elastix_2.3_HA_Cluster.pdf

Changes made to this document will be explained at the very endin Appendix A.

Document Last updated September 25th, 2013.

Page 2: Elastix 2.4 Ha Cluster-updated

2

INDEX

Operational Overview…………………………………………………..….3 What Is DRBD………………………………………………….…....3 What Heartbeat does………………………………………….…....4 Equipment Overview……………………………………………………….4 DRBD Install and Configuration…………………………………………..5 Heartbeat Configuration………………………………………….…….…10 Credits…………………………………………………………………......12

References……………………………………………………………......12 Appendix A (changes made).....……………………………………......13Appendix B (Flash op panel fix)....…………………………………......14Appendix C (Update Elastix w/ DRBD)...............…………………......16Appendix D (Resize DRBD partition)...............………………….........18Appendix E (DRBD w/ 2 NICs)...............…………………...................23

Appendix F (TFTPBOOT on DRBD)................…………………..........24Appendix G (Three Nodes)....................…………………...................25

Page 3: Elastix 2.4 Ha Cluster-updated

3

Operational Overview

What is DRBD?

Page 4: Elastix 2.4 Ha Cluster-updated

4

failover failbackswitchover.

Equipment Overview

Page 5: Elastix 2.4 Ha Cluster-updated

yum –y update

Page 6: Elastix 2.4 Ha Cluster-updated

6

Primary (p) Partition number (3) Press enter until returned to fdisk command prompt NOTE: if your servers have two different sized hard drives it is imperative that the

third partition is identical in size or they will never synchronize over DRBD. Do this by accepting the default first cylinder and then specifying the Last cylinder with the +sizeM option. Ex. +6048M. Make these same specifications on both servers.

Press “t” to change the partition system ID Press “3” to choose partition number Choose HEX 83 for type Press “w” to save changes

RESTART SERVER

7. Format newly made partition

8. Now we delete the file system from the disk we just created

9. Install DRBD, Heartbeat and dependencies with yum.

Note: If by any chance you experience problems with drbd83, use drbd82 version (64 bit versions).

10. To ensure proper host name to IP resolution it is recommended that you manually update the /etc/hosts file to reflect proper host-to-IP mapping. Add the following:

11. Edit /etc/drbd.conf on Server1.drbd. Modify this sample to meet your particular needs.

192.168.1.243 voipserver.drbd192.168.1.242 voipbackup.drbd

global { usage-count no; }resource r0 {protocol C;startup { wfc-timeout 10; degr-wfc-timeout 30; } #change timers to your needdisk { on-io-error detach; } # or panic, ...net {

mke2fs -j /dev/sda3

dd if=/dev/zero bs=1M count=500 of=/dev/sda3; sync

yum install heartbeat drbd83 kmod-drbd83

Page 7: Elastix 2.4 Ha Cluster-updated

7

Note: The following lines are used to help the servers resolve split brain recovery. Split brain is when two servers are in primary mode and need to know how to resolve who should assume primary/secondary role (discarding or accepting changes made in primaries).

Reference: http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html

12. Replicate this config file (/etc/drbd.conf) to the second server

13. Initialize the meta-data area on disk before starting drbd (! on both server!)

14. Start drbd on both nodes (service drbd start)

15. Verify that both server are secondary

after-sb-0pri discard-least-changes;after-sb-1pri discard-secondary;after-sb-2pri call-pri-lost-after-sb;cram-hmac-alg "sha1";shared-secret "Cent0Sru!3z";}syncer { rate 5M; }on voipserver.drbd {device /dev/drbd0;disk /dev/sda3;address 192.168.1.242:7788;meta-disk internal;}on voipbackup.drbd {device /dev/drbd0;disk /dev/sda3;address 192.168.1.243:7788;meta-disk internal;}}

after-sb-0pri discard-least-changes;after-sb-1pri discard-secondary;after-sb-2pri call-pri-lost-after-sb;

scp /etc/drbd.conf [email protected]:/etc/

drbdadm create-md r0

service drbd start

cat /proc/drbd

Page 8: Elastix 2.4 Ha Cluster-updated

16. As you can see, both nodes are secondary, which is normal. we need to decide which node will act as a primary now (voipserver.drbd) : that will initiate the first 'full sync' between the two nodes:

17. Launch the command and wait until it’s finish synchronizing

18. We can now format /dev/drbd0 and mount it on voipserver.drbd:

19. We can determine the role of a server by executing the following;

The primary server should return;

20. Now we will copy all of the directories we want synchronized between the two

servers to our new partition, remove the original directories and then create symbolic links to replace them on voipserver.drbd

cd /replica

amportal chown

tar -zcvf etc-asterisk.tgz /etc/asterisk

tar -zxvf etc-asterisk.tgz

tar -zcvf var-lib-asterisk.tgz /var/lib/asterisk

tar -zxvf var-lib-asterisk.tgz

tar -zcvf usr-lib-asterisk.tgz /usr/lib/asterisk/

tar -zcvf var-www.tgz /var/www/

tar -zxvf usr-lib-asterisk.tgz

tar -zcvf var-spool-asterisk.tgz /var/spool/asterisk/

tar -zxvf var-spool-asterisk.tgz

tar -zcvf var-lib-mysql.tgz /var/lib/mysql/

tar -zxvf var-lib-mysql.tgz

tar -zcvf var-log-asterisk.tgz /var/log/asterisk/

tar -zxvf var-log-asterisk.tgz

tar -zxvf var-www.tgz

rm -rf /etc/asterisk

rm -rf /var/lib/asterisk

rm -rf /usr/lib/asterisk/

rm -rf /var/spool/asterisk

rm -rf /var/www

drbdadm -- --overwrite-data-of-peer primary r0

watch -n 1 cat /proc/drbd

mkfs.ext3 /dev/drbd0

mkdir /replica

mount /dev/drbd0 /replica

drbdadm role r0

Page 9: Elastix 2.4 Ha Cluster-updated

21. Stop mysqld, asterisk and httpd services on voipserver.drbd

Now switch to the VOIPBACKUP server

22. Verify services are down and proceed to switch manually to the second server:

[[email protected] /]#

[[email protected] /]# [[email protected] /]#

Note: This is used to check if you are replicating information on both servers. You should see all data replicated in the secondary server just like data in the primary. * DO NOT perform this action with the physical terminal logged in. Use SSH. Otherwise, it will fail tounmount the /replica folder for some reason! Also make sure you are not IN the replica folder. Type "cd /" .

23. Verify voipserver.drbd status (Primary/Secondary)

24. Execute ‘df –h’ on the primary to confirm that our /dev/drbd0 partition is mounted and in use.

Note: Executing this same command in voipbackup.drbd while in secondary mode should not display the /dev/drbd0 partition unless it’s assuming primary mode. 25. Now we will remove and link on voipbackup.drbd

rm -rf /var/lib/mysql/

rm -rf /var/log/asterisk/

ln -s /replica/etc/asterisk/ /etc/asterisk

ln -s /replica/var/lib/asterisk/ /var/lib/asterisk

ln -s /replica/usr/lib/asterisk/ /usr/lib/asterisk

ln -s /replica/var/spool/asterisk/ /var/spool/asterisk

ln -s /replica/var/lib/mysql/ /var/lib/mysql

ln -s /replica/var/log/asterisk/ /var/log/asterisk

ln -s /replica/var/www /var/www

cd /

service mysqld restart

service mysqld stop

service asterisk stop

service httpd stop

service elastix-updaterd stop

service elastix-portknock stop

rm -rf /etc/asterisk

rm -rf /var/lib/asterisk

rm -rf /usr/lib/asterisk/

umount /replica ; drbdadm secondary r0

mkdir /replica ; drbdadm primary r0 ; mount /dev/drbd0 /replica

ls /replica/

drbdadm role r0

Page 10: Elastix 2.4 Ha Cluster-updated

10

26. Stop mysqld, asterisk and httpd services on voipbackup.drbd

27. Now switch back to the first server :

[[email protected] /]#

[[email protected] /]#

28. Drbd is working ... let's be sure that it will always be started:

Heartbeat Configuration

29. Remember to stop any boot up services on both servers that should be controlled by heartbeat. These services will be controlled by heartbeat on the server that is in control.

30. Let's configure a simple /etc/ha.d/ha.cf file on voipserver.drbd :

Now switch to the VOIPSERVER server

rm -rf /var/spool/asterisk

rm -rf /var/lib/mysql/

rm -rf /var/log/asterisk/

rm -rf /var/www

ln -s /replica/etc/asterisk/ /etc/asterisk

ln -s /replica/var/lib/asterisk/ /var/lib/asterisk

ln -s /replica/usr/lib/asterisk/ /usr/lib/asterisk

ln -s /replica/var/spool/asterisk/ /var/spool/asterisk

ln -s /replica/var/lib/mysql/ /var/lib/mysql

ln -s /replica/var/log/asterisk/ /var/log/asterisk

ln -s /replica/var/www /var/www

service mysqld restart

service mysqld stop

service asterisk stop

service httpd stop

service elastix-updaterd stop

service elastix-portknock stop

chkconfig drbd on

chkconfig asterisk off

chkconfig mysqld off

chkconfig httpd off

chkconfig elastix-updaterd off

chkconfig elastix-portknock off

service mysqld stop

service asterisk stop

service httpd stop

service elastix-portknock stop

service elastix-updaterd stop

debugfile /var/log/ha-debug

logfile /var/log/ha-log

umount /replica/ ; drbdadm secondary r0

drbdadm primary r0 ; mount /dev/drbd0 /replica

Page 11: Elastix 2.4 Ha Cluster-updated

11

31. Create also the /etc/ha.d/authkeys on voipserver.drbd:

32. Change permissions on the /etc/ha.d/authkeys file on voipserver.drbd:

33. Edit /etc/ha.d/haresources on voipserver.drbd: (It is two lines!!!!!!! Formating is

important). Replace the email addresses with your own, on the second line.

34. Start the heartbeat service on voipserver.drbd :

35. Replicate now the ha.cf, authkeys and haresources to voipbackup.drbd and start

heartbeat

[[email protected] ha.d]# scp /etc/ha.d/ha.cf /etc/ha.d/authkeys /etc/ha.d/haresources [email protected]:/etc/ha.d/

[[email protected] ha.d]# service heartbeat start

36. Configure heartbeat to initialize at boot on both server

37. Verify voipserver.drbd status (Primary/Secondary)

38. Execute ‘df –h’ on the primary to confirm that our /dev/drbd0 partition is mounted and in use.

NOTE: I've set auto_failback to off. This seems more appropriate to me.

use the following command on the current secondary to switch back:

sh /usr/lib/heartbeat/hb_takeover

logfacility local0

keepalive 2

deadtime 30

warntime 10

initdead 120

udpport 694

bcast eth0

auto_failback off

node voipserver.drbd

node voipbackup.drbd

chkconfig --add heartbeat

chkconfig heartbeat on

voipserver.drbd drbddisk::r0 Filesystem::/dev/drbd0::/replica::ext3 IPaddr::192.168.1.245/24/eth0/192.168.1.255 mysqld asterisk httpd elastix-updaterd elastix-portknockvoipserver.drbd MailTo::[email protected],[email protected]::DRBD/HA-ALERT

auth 1

1 sha1 MySecret

chmod 600 /etc/ha.d/authkeys

service heartbeat start

drbdadm role r0

Page 12: Elastix 2.4 Ha Cluster-updated

39. Test your work by creating a SIP extension or anything inside Elastix Web

Interface, then shut down your primary server while making a continuous ping to 192.168.1.245 (floating IP address) verifying it doesn’t lose connectivity. Make another change in the secondary server, turn your primary back on, and all changes should be kept intact.

Special Note: Any changes made to asterisk files should be done via web Interface ONLY. Do not attempt to upgrade Elastix version once finished the cluster or else it will write its own files again discarding links to the /replica directory. Troubleshooting:

Credits MaxiDistribuciones Cía. Ltda. www.grupomaxi.com.ec Escuela Politécnica del Ejército www.espe.edu.ec Telesoft Integrando Technologies Redfone References http://wiki.centos.org/HowTos/Ha-Drbd http://support.red-fone.com/downloads/elastix/Elastix HA Cluster.pdf http://danielaliaman.com/blog/files/phonecube/cluster/AsteriskCluster.pdf http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html Author: Daniel Guevara (Updated to 2.4 by Nick R.) E-mail: [email protected]

12

tcpdump –i eth0:0 –s 1500 –w captura.pcap #capture traffic

mv captura.pcap /var/www/html #move file to web for download

Page 13: Elastix 2.4 Ha Cluster-updated

1

APPENDIX A

I changed a few things within the original 2.3 document. Below is my description andreasoning for all of these changes.-CHANGE 1 - I edited the instructions for only one network card. While its best in practice to use a second networkcard for DRBD, its not necessary or always practical. To implement this, I lowered the rate at which DRBD replicatesfrom 100M to 5M (40 Mbps). This is to avoid creating congestion of VoIP traffic on replication. After the initial sync,very little data will be transferred, so this is likely not to be an issue. Additionally, DRBD now uses the same IP address(192.168.1.242 & .243), rather than a separate subnet. Remember to change the IP addresses to those of your servers.the "/etc/ha.d/haresources" file conatins the IP address for the cluster, which should also be set by you._IF YOU WISH TO USE A SECOND NIC, you can easily do so by going to STEP 11 to edit drbd.conf and changingthe IP addresses in this file to match those of your dedicated NICs. You will also want to go to STEP 30, and changeha.cf to state "bcast eth1" instead of eth0.-CHANGE 2- In STEP 11 after-sb-0pri is set to "discard-least-changes" . I felt this was a better option for asterisk. Inthe event that a split brain occurs, it really doesn't matter who was primary last. What matters is who was the functioningserver for outgoing calls. The functioning server is likely to have had more changes made, and therefore have the mostcurrent configuration. This is important, because the sb2-pri setting refers back to the sb-0pri setting.-CHANGE 3 - Being logged into the physical console/terminal caused Step 22 to fail. I made an entry in the notes.-CHANGE 4 - Step 28 was changed to "chkconfig drbd on" . It previously had "chkconfig drbd83 on" . This commanddid not work for me on 2.4, but my command did work-CHAGE 5 - Step 30 was changed to set "auto_failback" to off. Every time the server fails over or fails back, service isinterrupted. Personally, I'd like to investigate the reason for the failover before I switch back. If the backup server fails,you will still be able to automatically failover to the original primary server, so there is little risk to this. On the other hand,automatically failing BACK to the original server could cause constant interruptions if it had a bouncing NIC or the serverkeeps restarting due to a hardware issue._Manual failback can be completed by the command "sh /usr/lib/heartbeat/hb_takeover" onwhichever server is secondary, or "sh /usr/lib/heartbeat/hb_standby" on whatever server iscurrently the primary.-CHANGE 6 - Step 36 was edited to add "chkconfig heartbeat on" . Heartbeat did not automaticallystart until this command was added.-CHANGE 7 - Step 33 added "elastix-updaterd elastix-portknock" to haresources file. The services were also added toSteps 21,26 and 29. They rely on /var/www resources, so they can't be loaded at boot.-CHANGE 8 - Step 20 & 25 added "rm -rf /var/www" as it was missing from previous documents-CHANGE 9 - Step 33, added a line for email notifications.-CHANGE 10- Step 20, added in "amportal chown". This is our last chance to do this and ensure ownership, as it does not workonce the files are moved to the drbd database. Otherwise, we could be stuck without ownership.-- Appendix and Changes by Nick R.

Page 14: Elastix 2.4 Ha Cluster-updated

APPENDIX B

Flash Operator Panel Fix

The Flash Operator Panel is dependent on a process in order to receive updates made in FreePBX/Elastix.This process is normally loaded on boot, and called within the /etc/init.d/rc.local file. It requires files in/var/www/ to successfully launch. Unfortunately,since we are using DRBD, the clustered data holds /var/wwwand is not available on boot. We must configure heartbeat to launch this process, but first we must makea Linux Standard Base service script. YOU MUST DO THIS ON BOTH THE PRIMARY AND THE SECONDARYSERVERS!-This only matters if you actually use the flash operator panel. If you don't, then there is little reason to do this.-Step 1 - Type the following command on VOIPSERVER (which I'll assume is in the PRIMARY role)nano /etc/init.d/fop_start-Step 2 - In the editor, copy the fop_start script, then save & exit (CTRL+O then CTRL+X). The fop_start script is foundon the following page (sorry, it didn't fit on this page!).-Step 3 - Type the command to set the appropriate permissionschmod 755 /etc/init.d/fop_start-Step 4 - Edit the /etc/ha.d/haresources file ( you can type "nano /etc/ha.d/haresources") and add "fop_start" to the endof the first line. It should now look like this:

Step 5- You can test to make sure the fop_start service is working by typing:service fop_start startservice fop_start statusservice fop_start stop-This should produce a bunch of messages about amportal.conf. It then should end with an "OK" for the stop and thestart command. If it does not, then your current server might be the secondary (in which case a "FAILED" result isto be expected). To check if the current server is the primary, type "drbdadm role r0".-Step 6 - REPEAT STEPS 1-4 ON THE OTHER SERVER (VOIPBACKUP)! Step 5 will result in a "FAILED" messageuntil it assumes the primary role. Running "sh /usr/lib/heartbeat/hb_takeover" will make you assume the primary role.

voipserver.drbd drbddisk::r0 Filesystem::/dev/drbd0::/replica::ext3 IPaddr::192.168.1.245/24/eth0/192.168.1.255 mysqld asterisk httpd elastix-updaterd elastix-portknock fop_startvoipserver.drbd MailTo::[email protected],[email protected]::DRBD/HA-ALERT

Page 15: Elastix 2.4 Ha Cluster-updated

1

#!/bin/bash# description: FOP daemon# process name: fop_start# Author: Nick Ross. /etc/init.d/functions[ -f /usr/sbin/amportal ] || exit 0FOP="/usr/sbin/amportal"FXO="/usr/sbin/fxotune"RETVAL=0getpid() {

pid=`ps -eo pid,comm | grep "op_server.pl" | awk '{ print $1 }'`}start() {

echo -n $"Starting FOP: "getpidif [ -z "$pid" ]; then

$FXO -s >dev/null$FOP start_fop > /dev/nullRETVAL=$?

fiif [ $RETVAL -eq 0 ]; then

touch /var/lock/subsys/fop_startecho_success

elseecho_failure

fiechoreturn $RETVAL

}stop() {

echo -n $"Stopping FOP: "getpidRETVAL=$?if [ -n "$pid" ]; then

$FOP stop_fop > /dev/nullsleep 1getpidif [ -z "$pid" ]; then

rm -f /var/lock/subsys/fop_startecho_success

elseecho_failure

fielse

echo_failurefiechoreturn $RETVAL

}# See how we were called.case "$1" instart)

start;;

stop)stop;;

status)getpidif [ -n "$pid" ]; then

echo "FOP (pid $pid) is running..."else

RETVAL=1echo "FOP is stopped"

fi;;

restart)stopstart;;

*)echo $"Usage: $0 {start|stop|status|restart}"exit 1;;

esacexit $RETVAL

Script for /etc/init.d/fop_start

Page 16: Elastix 2.4 Ha Cluster-updated

1

APPENDIX C

Updating Asterisk With DRBD

Step 1 - Keep data on both servers by shutting down one server. We want to make sureonly one server is on at a time. I shut down "voipbackup", and leave "voipserver" on.-Step 2 - Stop Services on the active machine, and make sure they always stay off (heartbeat manages them)chkconfig asterisk offchkconfig mysqld offchkconfig httpd offservice mysqld restartservice mysqld stopservice asterisk stopservice httpd stopservice elastix-updaterd stopservice elastix-portknock stopservice fop_start stop-Step 3 - Remove links from the /replica folder to the file systemunlink /etc/asteriskunlink /var/lib/asteriskunlink /usr/lib/asteriskunlink /var/spool/asteriskunlink /var/lib/mysqlunlink /var/log/asteriskunlink /var/www/wwwunlink /var/www-Step 4 - Copy all the files in the /replica/ folder back to the root partitionmkdir /etc/asteriskmkdir /var/lib/asteriskmkdir /usr/lib/asteriskmkdir /var/spool/asteriskmkdir /var/lib/mysqlmkdir /var/log/asteriskmkdir /var/wwwcp -arf /replica/etc/asterisk/* /etc/asterisk/cp -arf /replica/var/lib/asterisk/* /var/lib/asterisk/cp -arf /replica/usr/lib/asterisk/* /usr/lib/asterisk/cp -arf /replica/var/spool/asterisk/* /var/spool/asterisk/cp -arf /replica/var/lib/mysql/* /var/lib/mysql/cp -arf /replica/var/log/asterisk/* /var/log/asterisk/yes | cp -arf /replica/var/www/* /var/www/

Page 17: Elastix 2.4 Ha Cluster-updated

1

STEP 5 - Remove everything in the /replica folder.rm -rf /replica/*-STEP 6 - Run the yum update for elastix, then "amportal chown" to ensure proper ownership after the update:yum -y update elastix*amportal chown-STEP 7- Recopy and link everything to the /replica folder.

cd /replica

tar -zcvf etc-asterisk.tgz /etc/asterisk

tar -zxvf etc-asterisk.tgz

tar -zcvf var-lib-asterisk.tgz /var/lib/asterisk

tar -zxvf var-lib-asterisk.tgz

tar -zcvf usr-lib-asterisk.tgz /usr/lib/asterisk/

tar -zcvf var-www.tgz /var/www/

tar -zxvf usr-lib-asterisk.tgz

tar -zcvf var-spool-asterisk.tgz /var/spool/asterisk/

tar -zxvf var-spool-asterisk.tgz

tar -zcvf var-lib-mysql.tgz /var/lib/mysql/

tar -zxvf var-lib-mysql.tgz

tar -zcvf var-log-asterisk.tgz /var/log/asterisk/

tar -zxvf var-log-asterisk.tgz

tar -zxvf var-www.tgz

rm -rf /etc/asterisk

rm -rf /var/lib/asterisk

rm -rf /usr/lib/asterisk/

rm -rf /var/spool/asterisk

rm -rf /var/lib/mysql/

rm -rf /var/log/asterisk/

rm -rf /var/www

ln -s /replica/etc/asterisk/ /etc/asterisk

ln -s /replica/var/lib/asterisk/ /var/lib/asterisk

ln -s /replica/usr/lib/asterisk/ /usr/lib/asterisk

ln -s /replica/var/spool/asterisk/ /var/spool/asterisk

ln -s /replica/var/lib/mysql/ /var/lib/mysql

ln -s /replica/var/log/asterisk/ /var/log/asterisk

ln -s /replica/var/www /var/www

cd /

Step 8 - Shut down the Primary server (voipserver)-Step 9- Start up secondary (voipbackup), run the commandsh /usr/lib/heartbeat/hb_takeover-Step 10 - Repeat Steps 1-7 on voipbackup-Step 11 - Boot up the "voipserver", then once its up for a few minutes and synced,restart "voipbackup". Use the command "service drbd status" to check syncing. If youdid everything right, you will have an updated elastix/asterisk system, and the DRBDdate will reconcile.

Page 18: Elastix 2.4 Ha Cluster-updated

APPENDIX D

RESIZING OR MOVING the DRBD Partition

This is a very long and drawn out process, and is not for the weak of heart. The concept is similar to the previous section,where we move everything off the DRBD partition, and then back on. If you've changed the partition, or want to specify adifferent size than what I have, then make the changes accordingly. Please ensure you perform the steps on the correctserver.-%%%%% ON VOIPBACKUP (#2) %%%%%STEP 1 - We have to make sure that VOIPBACKUP will not start DRBD or HEARTBEAT on reboot.chkconfig heartbeat offchkconfig drbd off<Shut down the VOIP BACKUP server>-@@@@ ON VOIPSERVER (#1) @@@@Step 2 - We must turn all the services off, unlink the folders in replica, then copy the data back to the root partitionchkconfig asterisk offchkconfig mysqld offchkconfig httpd offservice mysqld restartservice mysqld stopservice asterisk stopservice httpd stopservice elastix-updaterd stopservice elastix-portknock stopservice fop_start stop-unlink /etc/asteriskunlink /var/lib/asteriskunlink /usr/lib/asteriskunlink /var/spool/asteriskunlink /var/lib/mysqlunlink /var/log/asteriskunlink /var/www/wwwunlink /var/wwwmkdir /etc/asteriskmkdir /var/lib/asteriskmkdir /usr/lib/asteriskmkdir /var/spool/asteriskmkdir /var/lib/mysqlmkdir /var/log/asteriskmkdir /var/www-cp -arf /replica/etc/asterisk/* /etc/asterisk/cp -arf /replica/var/lib/asterisk/* /var/lib/asterisk/cp -arf /replica/usr/lib/asterisk/* /usr/lib/asterisk/cp -arf /replica/var/spool/asterisk/* /var/spool/asterisk/cp -arf /replica/var/lib/mysql/* /var/lib/mysql/cp -arf /replica/var/log/asterisk/* /var/log/asterisk/yes | cp -arf /replica/var/www/* /var/www/amportal chown 18

Page 19: Elastix 2.4 Ha Cluster-updated

1

@@@@ Still on VOIPSERVER #1 @@@@Step 3- We must now turn the services off, temporarily prevent them from starting up, and remove & recreate theDRBD partition. When partitioning please type in the commands manually. It will not copy and paste in, and youneed to be sure you are using the right partitions. Make sureyou've read the information and are answering correctly.If you are changing the partition or drive, you will need to make modifications accordingly.-chkconfig heartbeat offchkconfig drbd offservice heartbeat stopservice drbd stop-fdisk /dev/sdapd3-pnp<press enter>+10000M (or whatever size you desire)-t383w-STEP 4- Reboot VOIPSERVER. Start up VOIPBACKUP.-%%%%% ON VOIPBACKUP (#2) %%%%%STEP 5- We must now force the backup server to a DRBD primary role, and remount the replica folderservice drbd startdrbdadm primary r0 ; mount /dev/drbd0 /replica-STEP 6 - We must perform the copy from the replica folder, back to the root partition, as we did on the other server.unlink /etc/asteriskunlink /var/lib/asteriskunlink /usr/lib/asteriskunlink /var/spool/asteriskunlink /var/lib/mysqlunlink /var/log/asteriskunlink /var/www/wwwunlink /var/www-(STEP 6 is continued on the next page)

Page 20: Elastix 2.4 Ha Cluster-updated

2

(STEP 6 is continued here)--mkdir /etc/asteriskmkdir /var/lib/asteriskmkdir /usr/lib/asteriskmkdir /var/spool/asteriskmkdir /var/lib/mysqlmkdir /var/log/asteriskmkdir /var/wwwcp -arf /replica/etc/asterisk/* /etc/asterisk/cp -arf /replica/var/lib/asterisk/* /var/lib/asterisk/cp -arf /replica/usr/lib/asterisk/* /usr/lib/asterisk/cp -arf /replica/var/spool/asterisk/* /var/spool/asterisk/cp -arf /replica/var/lib/mysql/* /var/lib/mysql/cp -arf /replica/var/log/asterisk/* /var/log/asterisk/yes | cp -arf /replica/var/www/* /var/www/amportal chown-STEP 7 - We must now end the DRBD service, and repartition VOIPBACKUP, as we did on the previous serverservice drbd stop-fdisk /dev/sdapd3-pnp<press enter>+10000M (or whatever size you desire - just make sure its the SAME SIZE ON BOTH SERVERS!)-t383w-STEP 8 - Reboot VOIPBACKUP. VOIPSERVER should also be on and running.-!!!!!!!! ON BOTH VOIPSERVER AND VOIPBACKUP !!!!!!STEP 9- Perform these commands on VOIPSERVER first, then perform them again on VOIPBACKUP. You mustdo this on BOTH machines. We run "drbdadm create-md r0" to ensure that the drbd data is a fresh rewrite and notthe old meta data. This ensures it uses the full size of the new partition. Also, do not cut and past "<enter yes>".-mke2fs –j /dev/sda3dd if=/dev/zero bs=1M count=500 of=/dev/sda3; syncdrbdadm create-md r0<enter yes>drbdadm create-md r0service drbd start

Page 21: Elastix 2.4 Ha Cluster-updated

@@@@ ON VOIPSERVER (#1) @@@@STEP 10 - On voipserver only, issue the following commands. Do not copy and paste the second line. It is aninstruction for you to wait. Use "service drbd status" to check if you are 100% synced.-drbdadm -- --overwrite-data-of-peer primary r0(wait for the systems to sync 100% - check with service drbd status)mkfs.ext3 /dev/drbd0mkdir /replicamount /dev/drbd0 /replica-STEP 11 - We are now going to take the data on VOIPSERVER, and use it to fill the /replica folder. We will thenunmount the replica, put VOIPSERVER in a secondary status, and then set the services to be active when we reboot.DO NOT REBOOT VOIPSERVER YET.-cd /replicatar -zcvf etc-asterisk.tgz /etc/asterisktar -zxvf etc-asterisk.tgztar -zcvf var-lib-asterisk.tgz /var/lib/asterisktar -zxvf var-lib-asterisk.tgztar -zcvf usr-lib-asterisk.tgz /usr/lib/asterisk/tar -zcvf var-www.tgz /var/www/tar -zxvf usr-lib-asterisk.tgztar -zcvf var-spool-asterisk.tgz /var/spool/asterisk/tar -zxvf var-spool-asterisk.tgztar -zcvf var-lib-mysql.tgz /var/lib/mysql/tar -zxvf var-lib-mysql.tgztar -zcvf var-log-asterisk.tgz /var/log/asterisk/tar -zxvf var-log-asterisk.tgztar -zxvf var-www.tgzrm -rf /etc/asteriskrm -rf /var/lib/asteriskrm -rf /usr/lib/asterisk/rm -rf /var/spool/asteriskrm -rf /var/lib/mysql/rm -rf /var/log/asterisk/rm -rf /var/wwwln -s /replica/etc/asterisk/ /etc/asteriskln -s /replica/var/lib/asterisk/ /var/lib/asteriskln -s /replica/usr/lib/asterisk/ /usr/lib/asteriskln -s /replica/var/spool/asterisk/ /var/spool/asteriskln -s /replica/var/lib/mysql/ /var/lib/mysqlln -s /replica/var/log/asterisk/ /var/log/asteriskln -s /replica/var/www /var/wwwcd /-umount /replica ; drbdadm secondary r0-chkconfig heartbeat onchkconfig drbd on

Page 22: Elastix 2.4 Ha Cluster-updated

2

%%%% ON VOIPBACKUP (#2) %%%%%STEP 12 - On VOIPBACKUP, we are now going to assume the primary role and mount the replica partition.Once this is done we remove the original folders, create the soft links, and then set the services to turn on after areboot.-mkdir /replica ; drbdadm primary r0 ; mount /dev/drbd0 /replica-rm -rf /etc/asteriskrm -rf /var/lib/asteriskrm -rf /usr/lib/asterisk/rm -rf /var/spool/asteriskrm -rf /var/lib/mysql/rm -rf /var/log/asterisk/rm -rf /var/wwwln -s /replica/etc/asterisk/ /etc/asteriskln -s /replica/var/lib/asterisk/ /var/lib/asteriskln -s /replica/usr/lib/asterisk/ /usr/lib/asteriskln -s /replica/var/spool/asterisk/ /var/spool/asteriskln -s /replica/var/lib/mysql/ /var/lib/mysqlln -s /replica/var/log/asterisk/ /var/log/asteriskln -s /replica/var/www /var/www-chkconfig heartbeat onchkconfig drbd on-STEP 13 (FINAL STEP) - Reboot VOIPBACKUP first, and then reboot VOIPSERVER. At this point, the servers willcome back up, and one should assume the PRIMARY role, and begin replicating correctly. Your new DRBD partitionshould be sized correctly ( run "df -h" to check this on the primary ).-If you have auto_failback set to off, and you do not like which server assumed the primary role, then switch towhichever server is secondary and type:-sh /usr/lib/heartbeat/hb_takeover--This will set that server as the DRBD primary node. Whew. That was a lot of work. I hope you never have to do this.It was a complete pain in the ass to both do and to document. The better strategy is to give yourself plenty of spacein the beginning.

Page 23: Elastix 2.4 Ha Cluster-updated

APPENDIX E

Using Two Network Interface Cards

The original guide that this was based on used a dual NIC configuration. Using a separate NIC for DRBD is usually agood idea. You want the replication to work through an independent channel, typically where bandwidth for yourservices will not be starved by DRBD replication traffic. There are many situations, however, where this is notnecessarily desirable. For instance, if you use virtual machines, there wouldn't be much point in adding a secondvirtual NIC. There is also the case of having an elastix based appliance, with only a single NIC built in, and a secondNIC is unavailable.-Asterisk servers are usually not very disk sensitive, so its unlikely DRBD will be sending much data. If you are worriedabout DRBD using too much bandwidth, then you can lower the replication rate limit below 5 MB/s (which is 40 Mbits).-If you wish to use two network cards, the changes to the original implementation are very simple. In this scenario, wewill assume VOIPSERVER has a secondary NIC eth1, with an IP address of 10.1.1.1 . We will assume VOIPBACKUPhas a secondary NIC eth1, with an IP address of 10.1.1.2 .-Note: the ip addresses do not need to be in a different. The can be whatever you assign them to. Just make sure theyare static addresses, or you have set a DHCP reservation for these NICs.-!!!!!!!!! MAKE THE CHANGES ON BOTH SERVERS!!!!!Step 1- Change the following section of the /etc/drbd.conf:

syncer { rate 5M; }on voipserver.drbd {device /dev/drbd0;disk /dev/sda3;

address 10.1.1.1:7788;meta-disk internal;}on voipbackup.drbd {device /dev/drbd0;disk /dev/sda3;address 10.1.1.2:7788;meta-disk internal;}}-Step 2- Change the following section of the /etc/ha.d/ha.cf file-warntime 10initdead 120udpport 694bcast eth1auto_failback offnode voipserver.drbdnode voipbackup.drbd-Step 3 - Reboot, both servers and you are done.

Page 24: Elastix 2.4 Ha Cluster-updated

APPENDIX F

Moving TFTPBOOTThe TFTP boot can be easily moved by copying it to the replica folder, deleting the original folder, and then creating a symboliclink to the replica folder. As you can see, it is very similar to the procedure we've used for all other folders.Step 1- Go on the active/primary computer and type the following commandscd /replicatar -zcvf tftpboot.tgz /tftpboot/tar -zxvf tftpboot.tgzrm -rf /tftpbootln -s /replica/tftpboot /tftpbootStep 2- Go on the secondary computer and type these commandsrm -rf /tftpbootln -s /replica/tftpboot /tftpbootThat is all there is to it!

Page 25: Elastix 2.4 Ha Cluster-updated

APPENDIX G

Inserting a passive third node (no HA)While DRBD and Heartbeat do not offer the ability to fail over to a third node, you can set up a third node for passive replication. It

will copy everything on the current DRBD cluster, and always be a secondary node. There are many complications with this process,

and the only real practical use is for disaster recovery. That being said, this is an order of magnitude more difficult than the two node

cluster, and this documentation is probably going to be partially incomplete. I had trouble getting this working, so my documentation

was not as effective. Hopefully, this will at least guide you.

-

Just a note - in order to do this, you basically have to recreate the DRBD setup on your clustered partitions. If you already

are using DRBD, follow the guide in Appendix D, to the point where you fdisk your partitions. This will copy your asterisk

configs back to both servers so you do not lose your copies.

-

We have 3 servers. VOIPSERVER, VOIPBACKUP and VOIPDR (our passive node for disaster recovery).

VOIPDR will be 192.168.1.244.

-

Step 1- Follow the main guide for steps 1-9 on all servers.

-

Step 2- For step 10 on the original guide, on all servers, you will need to add this line in addition to the others in step 10:

192.168.1.244 voipdr.drbd

-

Step 3- For Step 11 on the original guide, on all servers, add the following to drbd.conf AFTER the very last bracket:

resource r1 {

protocol A;

net

{

}

syncer { rate 5M; }

stacked-on-top-of r0 {

device /dev/drbd10;

address 192.168.1.245:7788;

}

on voipdr.drbd {

device /dev/drbd10;

disk /dev/sda3;

address 192.168.1.244:7788;

meta-disk internal;

}

}

Step 4- Continue with the original guide, we do not deviate until after step 17.

-

Step 5- Perform step 28 & 29 in the original guide, on ALL SERVERS.

-

Step 6 - We now must configure heartbeat temporarily. This is much earlier than in the original guide, but it is required

to even get the third node up. Perform steps 30,31 and 32 on VOIPSERVER and VOIPBACKUP. There will be

NO HEARTBEAT on VOIPDR... EVER. If you try to do this on VOIPDR you will break everything.

Page 26: Elastix 2.4 Ha Cluster-updated

Step 7- Instead of step 33, we must provide a very different /etc/ha.d/haresources file. The haresources file must

now have only the single following line and NO OTHERS:

-

voipserver.drbd IPaddr::192.168.1.245/24/eth0

-

Step 8 - You may now follow steps 34-36 on the following guide. Remember, VOIPDR should not have heartbeat running,

so you do not need to replicate these steps to VOIPDR.

-

Step 9 - Now is where we really deviate. We must make resource r1 on top of r0. Basically, we are putting a second

layer on ourexisting DRBD, which we will replicate with VOIPDR.

-

Type the following We will start on the primary, which should be VOIPSERVER.

#####VOIPSERVER#####

drbdadm --stacked create-md r1

drbdam --stacked up r1

drbdadm --stacked adjust r1

-

Step 10 - Now we prepare VOIPDR, but in a slightly different way

%%%%%% ON VOIPDR %%%%%%

drbdadm create-md r1

service drbd start

drbdadm adjust r1

-

Step 11- Now we must have VOIPSERVER assume the primary role for stacked resource r1, and force a replication

####### ON VOIPSERVER ########

drbdadm --stacked -- --overwrite-data-of-peer primary r1

-

You should wait until this has finished before continuing. Use "service drbd status" to check the replication progress.

Please note, we have not touched VOIPBACKUP at all in this process... and that is ok. It is not necessary. All of this

data will automatically replicate over to VOIPACKUP if VOIPSERVER fails (or it will after we finalize the heartbeat settings).

-

Step 12 - Once this is completed, we can mount the resource on VOIPSERVER. Run the following commands:

####### ON VOIPSERVER #########

mkfs.ext3 /dev/drbd10

mkdir /replica

mount /dev/drbd10 /replica

-

Notice that we are using /dev/drbd10 instead of drbd0. drbd10 is the device that represents stacked resource r1, which is on top

of resource r0 and /dev/drbd0. All three device will be using the stacked resource r1 & /dev/drbd10 to mount the data, this way

the data is accessible to all three nodes. If we used r0 & drbd0, VOIPDR would never receive any replicated data.

-

Step 13- We now must move the asterisk/elastix configuration onto our mounted volume. Still on ##VOIPSERVER##,

follow STEP 20 from the original guide. This will only work on VOIPSERVER.

Page 27: Elastix 2.4 Ha Cluster-updated

Step 14- Now we are going to out in the real heartbeat configuration. The configuration we had before was only temporary,and it was needed to simply bring VOIPDR into the replication. The following is a copy of the ha resources file you shouldhave. It needs to match EXACTLY. This must be done on BOTH, VOIPSERVER and VOIPBACKUP. Do NOT do this onVOIPDR.

voipserver.drbd IPaddr::192.168.1.245/24/eth0/192.168.1.255 drbdupper::r1 Filesystem::/dev/drbd10::/replica::ext3 mysqld asterisk httpd elastix-updaterd elastix-portknock

voipserver.drbd MailTo::[email protected],[email protected]::DRBD/HA_MAIN-NODE

Please note, if you have implemented the flash operator panel fix (Appendix B), please make the appropriate changes.-Step 15- At this point, we need to start preparing VOIPBACKUP, which is a little tricky. With a stacked resource its a littlemore complicated to unmount and switch primaries... but first things first. Lets put a /replica/ foldero n VOIPBACKUP.$$$$$$$$ ON VOIPBACKUP $$$$$$$$$$$mkdir /replica-Step 16- At this point, the easiest way to finish up the process is to reboot and have VOIPBACKUP take the primary rolevia heartbeat. Reboot both ##VOIPSERVER## and $$VOIPBACKUP$$, but give $$VOIPBACKUP$$ a 30 second headstart.If VOIPBACKUP does not come up in the primary role, you can run "sh /usr/lib/heartbeat/hb_takeover" on it.-Step 17- The /replica/ folder should be populated now on $$VOIPBACKUP$$. If so, you can complete the transfer byfollowing step 25 on the original guide. Once you are done, you can give ##VOIPSERVER## control by typing in"sh /usr/lib/heartbeat/hb_takeover" on that server.-THE GUIDE IS OVER. STOP HERE.THE INSTRUCTIONS BELOW ARE FOR DISASTER RECOVERY ONLY!!!!!DO NOT FOLLOW TRY TO FOLLOW THESE STEPS UNLESS YOU HAVE TO.- You may be wondering why you've done so little on the third node, VOIPDR. The short answer is that VOIPDR doesn't do much.Once you set up DRBD on VOIPDR, it just sits there and replicates. There is no heartbeat, no failover, etc. If something everwere to happen to the other two nodes, VOIPDR would have to be activated manually.-Assuming some disaster, where the other two nodes caught fire and burned, here is a quick rundown of what you'd do.%%%%% ON VOIPDR - BUT ONLY IN THE EVENT OF A DISASTER %%%%%mkdir /replicadrbdadm primary r1mount /dev/drbd10 /replica-(FOLLOW STEP 25 ON THE ORIGINAL GUIDE, THEN CONTINUE)-service mysqld startservice asterisk startservice httpd startservice elastix-updaterd startservice elastix-portknock startfxotune -samportal start_fop-Now, edit the phone configurations so that they connect to VOIPDR. Its not nearly as elegant, but it will work. Alternatively,instead of STEP 25, you can just delete the original folders, then copy them from the /replica/ folder back to their "real"locations. After you do this, you can stop drbd ("service drbd stop"), and change the ip address of the server to that of theold cluster (192.168.1.245). The phones will now automatically find the server and resume operations.

Page 28: Elastix 2.4 Ha Cluster-updated

I hope you found these documents useful.

Sincerely,

Nick Ross. of TheServerExpert.com and CTC.

Please contact me - nick (at) theserverexpert.com

if you have any comments or find any mistakes with the guide.