Wednesday, March 18, 2009

Simple Cluster with Heartbeat

Following on from my previous post about setting up a reverse proxy in Fedora 10, I now delve into high availability. The plan here is to create two reverse proxy servers and cluster them together in an Active / Passive configuration with automatic fail-over.

So here is our trusty network diagram:


You will notice the shared IP, and Proxy02 connected to Proxy01 with a cross-over cable.

You need the cross-over cable for the heartbeat keep-alive messages. As I am using VMware, I have used the host-only network configuration for the second NICs on the servers, thus simulating a physical cross-over cable.


Here is the network configuration on each proxy server:

Proxy01:
eth0 ( same network and subnet as the client )
IP = 192.168.0.11
SUBNET MASK = 255.255.255.0
DEFAULT GW = 192.168.0.1
HOSTNAME = proxy01.latham.internal

eth1 ( different network and subnet from the client )
IP = 192.168.169.11
SUBNET MASK = 255.255.255.0
DEFAULT GW = none

Proxy02:
eth0 ( same network and subnet as the client )
IP = 192.168.0.12
SUBNET MASK = 255.255.255.0
DEFAULT GW = 192.168.0.1
HOSTNAME = proxy02.latham.internal

eth1 ( different network and subnet from the client )
IP = 192.168.169.12
SUBNET MASK = 255.255.255.0
DEFAULT GW = none

In my environment I have actually used DHCP with statically assigned leases for the eth0 NICs on my proxy servers.
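For reference, here is what a static configuration for the cross-over interface might look like on Proxy01, assuming Fedora's standard /etc/sysconfig/network-scripts layout ( a sketch, adjust the device name to match your hardware ):

```
# /etc/sysconfig/network-scripts/ifcfg-eth1 on proxy01
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.169.11
NETMASK=255.255.255.0
ONBOOT=yes
```

Proxy02 gets the same file with IPADDR=192.168.169.12.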

You will need to ensure the availability of a shared IP address for the eth0 NICs. In my case I chose 192.168.0.10. No other server should own this IP address.

Next up we will install heartbeat on the proxy servers.
As usual in Fedora we leverage the excellent package manager ( YUM ) and install it:
# yum install heartbeat
Once heartbeat and all required dependencies are installed you will need to edit some configuration items. First let's start with the basics in preparation for Heartbeat managing the resources.

NOTE: At the time of writing this tutorial I have not worked out the required SELinux directives, so I have turned SELinux off. I recommend getting Heartbeat working with SELinux turned on.
# setenforce 0

1. Ensure that the Apache web-service does not auto-start:
# chkconfig httpd off

2. Ensure that httpd will listen on the correct IP address. This will be the shared IP address. In my lab this is 192.168.0.10
# vim /etc/httpd/conf/httpd.conf
[ update the listen directive to read: ]
...
Listen 192.168.0.10:80
...

Heartbeat arrives totally unconfigured. You have to create 3 files in order to make it work. These files live in /etc/ha.d and are called:
  • ha.cf
  • haresources
  • authkeys
Here follow my examples. IMPORTANT: these files are identical on both servers.

ha.cf
bcast eth1
keepalive 2
warntime 10
deadtime 30
initdead 120
udpport 694
crm no
auto_failback no
node proxy01.latham.internal
node proxy02.latham.internal

haresources
proxy01.latham.internal 192.168.0.10 apache::/etc/httpd/conf/httpd.conf
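Each line of haresources names the preferred node followed by the resources it should manage, started left to right. A quick shell sketch that splits the example line into its fields:

```shell
# Break the haresources example line into its fields
line="proxy01.latham.internal 192.168.0.10 apache::/etc/httpd/conf/httpd.conf"
set -- $line
echo "preferred node: $1"   # node that owns the resources by default
echo "shared IP:      $2"   # IP alias Heartbeat brings up on the owner
echo "service:        $3"   # resource script, with its argument after ::
```

The apache resource script receives the path after the :: as its argument, which is how Heartbeat knows which httpd.conf to start Apache with.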

authkeys
auth 1
1 crc

The authkeys file must be made secure with:
# chmod 600 authkeys

The crc method of authentication is fast but insecure. Its insecurity is offset by the physical security of the cross-over cable. In a more paranoid or less secure environment you might consider either md5 or sha1, both of which require a shared secret in the authkeys file.
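If you do switch to sha1, the key line carries the shared secret; a sketch ( the secret string here is just a placeholder, pick your own and keep it identical on both nodes ):

```
auth 1
1 sha1 MySecretPassphrase
```

The chmod 600 step above matters even more here, since the secret now lives in the file.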

Testing:
Make sure httpd is stopped on both nodes:
# service httpd stop
Start heartbeat on both nodes:
# service heartbeat start
Note that there might be a message stating that a resource is stopped. This will be because at the time of starting heartbeat the httpd resource was stopped. In a short while heartbeat will start httpd for you.
Now try to browse the example HTML page that you configured in your reverse proxy, but using the shared IP address this time, and watch the traffic appear in the access logs on proxy01.
Try removing proxy01 from the cluster with
# service heartbeat stop
and note that in a short while proxy02 will take over the shared IP address and start httpd.
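To see which node currently owns the shared IP, check the addresses on eth0. A sketch using canned output so the logic is visible ( on a live node you would pipe ip addr show eth0 into the grep instead ):

```shell
# Does this node hold the shared IP? Sketch with hard-coded sample output;
# on a real node substitute: ip addr show eth0
shared_ip="192.168.0.10"
sample_output="inet 192.168.0.12/24 brd 192.168.0.255 scope global eth0
inet 192.168.0.10/24 brd 192.168.0.255 scope global secondary eth0"
if echo "$sample_output" | grep -q "inet $shared_ip/"; then
  echo "this node owns $shared_ip"
else
  echo "this node does not own $shared_ip"
fi
```

With the sample output above this prints "this node owns 192.168.0.10"; run it on both nodes before and after stopping heartbeat to watch the address move.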

Other tests I have performed:
1. stop httpd on the primary node and watch it start up again after 30 seconds.
2. shut the primary node down altogether and watch the secondary node take over.
3. Bring the primary node back into the cluster and watch it take over the shared IP and start httpd while stopping httpd on the secondary node ( auto_failback yes ). This feature seemed not to work: all my failback testing resulted in a failback regardless of this setting.

Other considerations:
1. Sync your webserver files between the nodes.
2. Test SSL connections.