High availability with Heartbeat and DRBD on Debian

Configuration is based on Linux-HA software

More information: http://www.linux-ha.org/

Author: Philipp Noack

Note: If you clone the first server after setup, the clone will NOT work as the second node. The second server has to be installed exactly like the first one!


1. Debian install. My partition setup was:

	/boot ext3 with 100 MB and boot-flag
	/ ext3 with 5 GB
	swap with 4 GB (depending on memory)
	/var ext3 with 5 GB
	/var2 with the rest of the space; this partition was created but not formatted yet!

2. Install the bigmem kernel (only needed with more than 4 GB of memory)

Run "apt-cache search linux-image bigmem" and install the kernel that matches your system.

3. Network config: /etc/network/interfaces (eth1 will be for DRBD (RAID over TCP/IP) / heartbeat)

auto lo eth0 eth1

iface lo inet loopback

iface eth0 inet static
address 172.30.86.???
netmask 255.255.255.0
network 172.30.86.0
broadcast 172.30.86.255
gateway 172.30.86.1

iface eth1 inet static
address 192.168.1.???
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255

4. Install heartbeat:

	aptitude install heartbeat

5. Install DRBD: On etch, add Debian Backports to the sources.list to get DRBD8 (DRBD8 is included in Debian since lenny):

deb http://www.backports.org/debian etch-backports main contrib non-free

Install the packages drbd8-source and drbd8-utils:

	aptitude -t etch-backports install drbd8-source
	aptitude -t etch-backports install drbd8-utils

Build the kernel module (this has to be repeated after every kernel update):

	module-assistant auto-install drbd8

Reboot into the new kernel.
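
After the reboot it can be useful to confirm that the freshly built module is the one that got loaded. The following is a minimal sketch, not part of the original setup: the function name drbd_version is made up for illustration, and it assumes the usual first line of /proc/drbd, which looks like "version: 8.0.14 (api:86/proto:86)".

```shell
#!/bin/sh
# Print the DRBD version found in a /proc/drbd-style status file.
# Usage: drbd_version [status-file]   (defaults to /proc/drbd)
# NOTE: drbd_version is a hypothetical helper, not a DRBD tool.
drbd_version() {
    status_file=${1:-/proc/drbd}
    if [ ! -r "$status_file" ]; then
        echo "drbd_version: $status_file not readable (module not loaded?)" >&2
        return 1
    fi
    # Keep only the version number from the "version:" line.
    sed -n 's/^version: *\([^ ]*\).*/\1/p' "$status_file" | head -n 1
}
```

A typical call would be "modprobe drbd && drbd_version". If it prints nothing or fails, the module was probably built for a different kernel than the one that is running.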

6. Edit the DRBD config, drbd.conf (official documentation: http://www.drbd.org/docs/install/). Here is my config as an example.

Important: You will find the name of the var2 partition in /etc/fstab. Mine was /dev/cciss/c0d0p7. Use the name of your own partition in the "disk" lines.

global {
    usage-count yes;
}

common {
  syncer { rate 700000K; }
}

resource r0 {

  protocol C;

  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";

    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";

    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";

    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
    no-disk-flushes;
  }

  net {
    after-sb-0pri disconnect;

    after-sb-1pri disconnect;

    after-sb-2pri disconnect;

    rr-conflict disconnect;
  }

  syncer {
    al-extents 257;
  }
  on mbops01 {
    device    /dev/drbd0;
    disk      /dev/cciss/c0d0p6;
    address   192.168.1.1:7788;
    meta-disk internal;
  }
  on mbops02 {
    device    /dev/drbd0;
    disk      /dev/cciss/c0d0p6;
    address   192.168.1.2:7788;
    meta-disk internal;
  }
}
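
DRBD matches the names in the "on" sections against the node name of the machine (the output of "uname -n"), so a mismatch there is a common reason for the resource not coming up. The sketch below checks this for a given config file. The helper name conf_has_host is hypothetical, and the parsing is deliberately simple: it only looks for lines that start with "on <name>".

```shell
#!/bin/sh
# Succeed if the config file has an "on <host> {" section for the given host.
# Usage: conf_has_host <drbd.conf> [host]   (host defaults to `uname -n`)
# NOTE: conf_has_host is a hypothetical helper, not a DRBD tool.
conf_has_host() {
    conf=$1
    host=${2:-$(uname -n)}
    awk -v h="$host" '
        $1 == "on" {
            name = $2
            sub(/[{]$/, "", name)   # tolerate "on name{" without a space
            if (name == h) found = 1
        }
        END { exit !found }
    ' "$conf"
}
```

Run "conf_has_host /etc/drbd.conf || echo 'no section for this node'" on both machines. In addition, "drbdadm dump r0" parses the config and prints it back, which catches syntax errors before the resource is started.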

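Adjust the permissions of the DRBD tools so that members of the haclient group can run them as root (needed by the drbd-peer-outdater handler):

	chgrp haclient /sbin/drbdsetup
	chmod o-x /sbin/drbdsetup
	chmod u+s /sbin/drbdsetup

	chgrp haclient /sbin/drbdmeta
	chmod o-x /sbin/drbdmeta
	chmod u+s /sbin/drbdmeta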

7. Initialize DRBD (on both machines):

	drbdadm create-md r0

Check the result with “cat /proc/drbd”.

Warning: Do the next step on the master server ONLY!

	drbdadm -- --overwrite-data-of-peer primary r0
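
Once the master has been made primary, the initial synchronisation starts and its progress shows up in /proc/drbd. The following sketch extracts the connection state (cs:) and disk states (ds:) for one device so that a script can wait for the sync to finish. It assumes the resource has already been brought up on both nodes (for example by the drbd init script or "drbdadm up r0") and that the status line has the usual form " 0: cs:... st:... ds:...". The name drbd_state is made up for this example.

```shell
#!/bin/sh
# Print "<connection-state> <disk-states>" for one DRBD minor number.
# Usage: drbd_state <minor> [status-file]   (defaults to /proc/drbd)
# NOTE: drbd_state is a hypothetical helper, not a DRBD tool.
drbd_state() {
    minor=$1
    status_file=${2:-/proc/drbd}
    awk -v m="$minor:" '
        $1 == m {
            cs = ""; ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)   # text after "cs:"
                if ($i ~ /^ds:/) ds = substr($i, 4)   # text after "ds:"
            }
            print cs, ds
        }
    ' "$status_file"
}
```

A simple wait loop could then be: until [ "$(drbd_state 0)" = "Connected UpToDate/UpToDate" ]; do sleep 10; done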

8. Create the filesystem on /dev/drbd0 (again, only on the master server):

	mkfs -t ext3 /dev/drbd0

9. Install Opsview incl. apache2 (or see the official Debian documentation at http://docs.opsview.org/doku.php?id=opsview2.14:debian-installation):

Add the following lines to the sources.list:

	deb http://apt.opsview.org/debian etch main
	deb http://ftp.debian.org/debian etch non-free

Then run “apt-get update” and “apt-get install opsview”.

To quote the original documentation: “Once Opsview has been installed, a Catalyst web server should be listening on port 3000. The Apache web server can then be used as a proxy to make Opsview available on port 80 (http) - this also provides a significant improvement in performance as static content is then served directly by apache rather than via the perl Catalyst web server.”

	apt-get install libapache2-mod-proxy-html
	a2enmod proxy 
	a2enmod proxy_http
	a2enmod proxy_html
	/etc/init.d/apache2 force-reload

10. Remove Opsview, MySQL and Apache from the runlevels so that they no longer start automatically at boot (heartbeat starts them for us now):

	update-rc.d -f opsview remove
	update-rc.d -f opsview-web remove
	update-rc.d -f mysql remove
	update-rc.d -f apache2 remove
	update-rc.d -f opsview-agent remove
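
To double-check that none of these services will still be started at boot, one can look for leftover start links in the rc directories. This is only a sketch under the assumption of the classic SysV layout (/etc/rc2.d/S20mysql and so on); the function name leftover_start_links is invented for this example.

```shell
#!/bin/sh
# List remaining SysV start links (S??<service>) for the given services.
# Usage: leftover_start_links <etc-dir> <service>...
# NOTE: leftover_start_links is a hypothetical helper.
leftover_start_links() {
    etc_dir=$1
    shift
    for svc in "$@"; do
        for link in "$etc_dir"/rc[0-9S].d/S[0-9][0-9]"$svc"; do
            # An unmatched glob stays literal, so test that the entry exists.
            if [ -e "$link" ] || [ -L "$link" ]; then
                echo "$link"
            fi
        done
    done
}
```

Calling "leftover_start_links /etc opsview opsview-web mysql apache2 opsview-agent" should print nothing after the update-rc.d commands above have been run.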

11. Configure heartbeat: /etc/ha.d/ha.cf

debugfile /var/log/ha-debug
logfile /var/log/ha-log

keepalive 2
deadtime 30
warntime 10
initdead 120
	
auto_failback off

bcast eth1

# Ping target in our network; used to check which server still has connectivity
ping 172.30.86.4

node mbops01
node mbops02

respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

The file /etc/ha.d/haresources:

mbops01 drbddisk::r0 Filesystem::/dev/drbd0::/var2::ext3 172.30.86.170 mysql opsview opsview-web apache2

The file /etc/ha.d/authkeys:

auth 3
3 md5 anypassword

Then set the file permissions:

	chmod 600 /etc/ha.d/authkeys
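
Instead of typing a password by hand, the key can be generated randomly. The sketch below writes an authkeys file in the same format as above ("auth 3" with an md5 key) and sets its mode to 600. The function name write_authkeys is hypothetical. The file has to be identical on both nodes, so generate it once and copy it to the second server.

```shell
#!/bin/sh
# Write a Heartbeat authkeys file with a random md5 key, readable by owner only.
# Usage: write_authkeys <target-file>
# NOTE: write_authkeys is a hypothetical helper, not part of heartbeat.
write_authkeys() {
    target=$1
    # 32 hex characters derived from random data.
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -c1-32)
    [ -n "$key" ] || return 1
    (
        umask 077   # a newly created file starts out private
        printf 'auth 3\n3 md5 %s\n' "$key" > "$target"
    ) || return 1
    chmod 600 "$target"   # also covers a file that already existed
}
```

On the first node this would be run as "write_authkeys /etc/ha.d/authkeys".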

12. Moving data (on the master server, with /dev/drbd0 mounted on /var2):

	cd /usr/local/
	tar cvzf nagios.tar.gz nagios
	mv nagios.tar.gz /var2
	rm -r nagios
	cd /var2
	tar xvzf nagios.tar.gz
	ln -s /var2/nagios /usr/local/nagios

Do the same with /usr/local/opsview-web and /var/lib/mysql.
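
Since the same move has to be done for three directories, it can be wrapped in a small function. This is a sketch and not the author's method: it uses "cp -a" instead of a tar archive, the name relocate_dir is made up, and it assumes that the services using these directories are stopped and that /var2 is mounted.

```shell
#!/bin/sh
# Move a directory below <dest-root> and leave a symlink at the old location.
# Usage: relocate_dir <source-dir> <dest-root>
# NOTE: relocate_dir is a hypothetical helper.
relocate_dir() {
    src=$1
    dest_root=$2
    name=$(basename "$src")
    # Refuse to run twice or on something that is not a real directory.
    if [ -L "$src" ] || [ ! -d "$src" ]; then
        echo "relocate_dir: $src is not a real directory" >&2
        return 1
    fi
    # Never overwrite an existing copy on the shared partition.
    if [ -e "$dest_root/$name" ]; then
        echo "relocate_dir: $dest_root/$name already exists" >&2
        return 1
    fi
    cp -a "$src" "$dest_root/$name" || return 1   # keeps owners and modes
    rm -r "$src"
    ln -s "$dest_root/$name" "$src"
}
```

Usage on the master server: "relocate_dir /usr/local/nagios /var2", then the same for /usr/local/opsview-web and /var/lib/mysql. On the second server, where the data is not needed locally, only the symlinks have to exist.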

13. Replace NRPE agents (to be done on both machines in primary mode)

The opsview-agent needs the var2 partition to run, so you need to use another NRPE agent. Install the NRPE server and plugins:

	apt-get install nagios-nrpe-server nagios-plugins-basic
	rm /etc/nagios/nrpe.cfg
	cp /var2/nagios/etc/nrpe.cfg /etc/nagios/nrpe.cfg

Now edit the paths in nrpe.cfg: replace /usr/local/nagios/libexec with /usr/lib/nagios/plugins.

	vim /etc/nagios/nrpe.cfg
	/etc/init.d/nagios-nrpe-server restart
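
The path replacement described above can also be done non-interactively with sed instead of vim. A minimal sketch, with a made-up function name (fix_plugin_paths); it keeps a backup of the original file with the suffix .bak.

```shell
#!/bin/sh
# Point all plugin paths in an nrpe.cfg at the Debian plugin directory.
# Usage: fix_plugin_paths <nrpe.cfg>
# NOTE: fix_plugin_paths is a hypothetical helper.
fix_plugin_paths() {
    cfg=$1
    # "|" is used as the sed delimiter because the paths contain slashes.
    sed -i.bak 's|/usr/local/nagios/libexec|/usr/lib/nagios/plugins|g' "$cfg"
}
```

For example: "fix_plugin_paths /etc/nagios/nrpe.cfg", followed by the restart of nagios-nrpe-server. Plugins that only exist in the Opsview package are not available under the new path, so the command definitions should still be checked by hand.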

14. Solving problems

- Disk-flush errors on RAID systems: add the line “no-disk-flushes;” to the disk section of the drbd.conf:

resource r0 {
	disk {
		no-disk-flushes;
		...
	}
	...
}

- Apache2 proxy doesn't work:

	cp /usr/share/doc/opsview/apache2-proxy.conf /etc/apache2/sites-available/opsview
	ln -s /etc/apache2/sites-available/opsview /etc/apache2/sites-enabled/opsview

Customize the config file (remove comments and adjust the IPs). Do this on both machines, then do a takeover so that heartbeat restarts apache2.

- MySQL doesn't start/stop on a node: the passwords in /etc/mysql/debian.cnf have to match on both nodes.

- Delete the filesystem if there are problems with it:

	dd if=/dev/zero bs=1M count=1 of=/dev/cciss/????; sync

- Problems with resources r1, r2 …: delete the line 'after “r2”;' in the drbd.conf.
