Tuesday, March 9, 2010

HOW-TO: Highly Available NFS Server using drbd and heartbeat on Debian 5.0 (Lenny)

I did this setup a few days ago, just for fun, and thought it would be a good idea to document it.

Author - Vishal Sharma
Created - March 9, 2010
Version - 1.0
Disclaimer - This HOW-TO is provided as is and comes with absolutely no warranty of any kind. Use it at your own risk. These steps worked for me, so they will likely work for you too. If you run into any problems, please feel free to leave your comments below and I will try to address them as soon as I can.
Copyright (c) 2010 Vishal Sharma
Permission is granted to copy, distribute and/or modify the content of this page under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is available at http://www.gnu.org/licenses/fdl.html


HOW-TO: Set up a highly available NFS server using drbd and heartbeat

Here's a summary of what you need to do:

Step 1. Prepare your system. This involves setting up your basic hardware and software. (This is not covered as part of this HOW-TO.)

Step 2. Set up the necessary IP addresses.

Step 3. Install the necessary packages.

Step 4. Create / edit the relevant configuration files.

Step 5. Start your system.

Step 6. Test it to make sure that it actually works.

STEP 1. Doing your System Preparation.

I am using 3 Debian systems for this. All of them are standard default installs. However, I did change them to boot to text mode. Since the hardware on my machines isn't all that great, I prefer to play around on the command line (see the section at the end of this HOW-TO on booting Debian 5.0 to text mode).
Each system has a single onboard network interface. In production systems this is usually not the case, but since mine is just a play-around install, I didn't bother adding hardware that would give better performance.
If you don't want to dedicate a box, you can always use Sun VirtualBox for this setup. It is much easier that way, as it offers the benefit of taking system snapshots before you make any significant config changes.
My test setup is based on 2 servers and one client. The client system is just a normal NFS client mounting shares from the NFS server. The setup is Active/Passive, NOT Active/Active, so whenever the active system fails, the passive system will take over.
On server01 and server02, I have 2 partitions:
/dev/hdb1 - 500GB - the drbd backing device, which will end up mounted as /data and exported over NFS
/dev/hdc1 - 500MB - this will store the drbd meta-data. It needs a minimum of 128MB.
Make sure you don't mount either of these partitions; that will be handled by drbd and heartbeat later. Just create the partitions and leave them as is. No formatting, no mounting for now.
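If you want to double-check that the partitions exist and are not mounted anywhere, something along these lines should do (assuming your disks really are /dev/hdb and /dev/hdc as above):

server01# fdisk -l /dev/hdb /dev/hdc
server01# grep -E 'hdb1|hdc1' /proc/mounts

(fdisk should list both partitions, and the grep should return nothing, i.e. neither partition is mounted)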


STEP 2. Setting Up the necessary IP Addresses.

Here's how i have named my systems:

server01 - 192.168.1.2/24 (eth0)
server02 - 192.168.1.3/24 (eth0)
client01 - 192.168.1.4/24 (eth0)

Make sure you have the above entries in your /etc/hosts file on all three systems so that name resolution is not a problem. This is probably the easiest option; you wouldn't want the hassle of setting up a DNS server just for this.
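For reference, here is roughly what the relevant /etc/hosts entries look like (using the hostnames and addresses defined above):

127.0.0.1    localhost
192.168.1.2  server01
192.168.1.3  server02
192.168.1.4  client01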

192.168.1.100/24 will be my virtual floating IP address (this is the address used in the heartbeat configuration later on). This is the address that will be seen by the outside world, and it will float around from system to system as their state changes.
My default gateway is 192.168.1.1, which is my router. I need internet access for the package installs.

Step 3. Install the necessary packages

The most important thing for the HA cluster to function well is TIME: the clocks on the systems should be in sync. To ensure that, make sure you have the NTP packages in place.

server01# apt-get install ntp ntpdate

Perform the above on both server01 and server02.
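Once ntp is running, a quick sanity check (not part of the original steps, just a rough verification) is to look at the peers ntpd is syncing against and compare the clocks:

server01# ntpq -p
server01# date

(run date on server01 and server02 at roughly the same moment; the times should match)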

On the servers, i.e. server01 and server02, you need the following packages installed to get up and running:
drbd8, the kernel header files, the NFS kernel server and heartbeat. So here's what I have done.

server01# apt-get update
server01# apt-get install linux-headers-`uname -r` drbd8-utils drbd8-source heartbeat nfs-kernel-server

(the above should take a while to complete)

After the install completes, build the drbd kernel module using the following command:

server01# m-a a-i drbd8-source

This will compile and install the drbd kernel module.

server01# modprobe drbd
server01# lsmod | grep drbd
(this should show the drbd module loaded; if it doesn't give you anything, then there is some problem somewhere that needs fixing)


Disable NFS from starting at boot time. This is done because the NFS startup and shutdown will be handled by heartbeat, and we don't want the init system to interfere.

server01# update-rc.d -f nfs-kernel-server remove
server01# update-rc.d -f nfs-common remove

Perform the same steps on server02 as well.
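If you want to confirm that NFS really is out of the boot sequence, a quick look at the default runlevel directory should show no nfs start links (assuming the default runlevel 2):

server01# ls /etc/rc2.d/ | grep -i nfs

(this should return nothing once the links have been removed)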

Step 4. Create / Edit the relevant configuration files.

These are the files you need to create / edit on server01 and server02.

To handle NFS exports - /etc/exports (on server01 and server02):

/data/export 192.168.1.0/255.255.255.0(rw)

For the drbd configuration - /etc/drbd.conf (on server01 and server02):

drbd.conf

global {
    usage-count yes;
}

common {
    syncer { rate 10M; }
}

resource r0 {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    }

    startup {
        degr-wfc-timeout 120;    # 2 minutes.
    }

    disk {
        on-io-error detach;
    }

    net {
    }

    syncer {
        rate 10M;
        al-extents 257;
    }

    on server01 {
        device /dev/drbd0;
        disk /dev/hdb1;
        address 192.168.1.2:7788;
        meta-disk /dev/hdc1[0];
    }

    on server02 {
        device /dev/drbd0;
        disk /dev/hdb1;
        address 192.168.1.3:7788;
        meta-disk /dev/hdc1[0];
    }
}
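Once /etc/drbd.conf is in place on both servers, you can ask drbdadm to parse the file and dump it back, which makes for a quick syntax check:

server01# drbdadm dump all

(if there is a syntax error in the configuration, drbdadm will complain here instead of failing later when you bring the resource up)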

For heartbeat - /etc/ha.d/ha.cf (on server01 and server02):

ha.cf

logfacility local0
keepalive 1
deadtime 10
bcast eth0
auto_failback on
node server01 server02
(the node names here must match the output of uname -n on each server)

Set up heartbeat authentication - /etc/heartbeat/authkeys (on server01 and server02):

auth 3
3 md5 your_password

Instead of md5, you can also use sha1. Don't forget to make this file readable and writable by root only:

chmod 600 /etc/heartbeat/authkeys - (on server01 and server02)

For making the HA resources available - /etc/ha.d/haresources (on server01 and server02):

server01 IPaddr::192.168.1.100
server01 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 nfs-kernel-server

(NOTE: You must have noticed that in the haresources file on server02 I have used the same hostname, i.e. server01. This is to make sure that when server01 comes back online, server02 hands control back to it and returns to being the secondary. Change the hostname above to server02 if you want server02 to remain the primary instead.)

After you have done all of the above, run these commands to initialize the drbd meta-data:

(on server01 and server02)

server01# drbdadm create-md r0 (this is the resource name)
server01# drbdadm up all

Do a cat /proc/drbd on server01 and server02 and you will see that both servers are in Secondary mode and their data is Inconsistent. That's expected, because no initial sync has taken place yet and we have not defined which system is going to be the primary. In my case it's server01, so I will give the following commands on server01:

server01# drbdsetup /dev/drbd0 primary -o
server01# mkfs.ext3 /dev/drbd0
server01# mkdir /data
server01# mount -t ext3 /dev/drbd0 /data

By default, the NFS server stores its mount state and locks under /var/lib/nfs, and we want this to be retained in the event of a failover, so we move it onto the replicated device. Do this on server01:

server01# mv /var/lib/nfs/ /data/
server01# ln -s /data/nfs/ /var/lib/nfs
server01# mkdir /data/export
server01# umount /data
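To make sure the NFS state directory really ended up on the replicated device, you can check the symlink. It will look like it points to a missing target right now, because /data has just been unmounted again; that is fine, heartbeat will mount /data before starting the NFS server:

server01# ls -ld /var/lib/nfs

(this should show something like /var/lib/nfs -> /data/nfs)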

server01# cat /proc/drbd <--- Look at it carefully and you will notice that server01 is the Primary now.

Issue these commands on server02 to prepare it for taking over in the event of a failure:

mkdir /data
rm -fr /var/lib/nfs
ln -s /data/nfs /var/lib/nfs

Hmm… that's pretty much it. You are done from the configuration perspective. Now go ahead and fire up your systems.

Step 5. Starting your system.

Start the necessary services on server01 and server02

/etc/init.d/drbd start
/etc/init.d/heartbeat start

After the above has completed successfully, you will notice that server01 has an additional IP - 192.168.1.100 - and /data is mounted. You can also check the /proc/drbd file for real-time status.

On server02 you shouldn't see 192.168.1.100 or a mounted /data.
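A few quick commands to confirm the above (run them on both servers; the virtual IP and the /data mount should show up only on server01):

server01# ifconfig
server01# mount | grep drbd
server01# cat /proc/drbd

(ifconfig should show the 192.168.1.100 address, usually on an alias interface such as eth0:0, and mount should show /dev/drbd0 on /data)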

tail -f /var/log/messages file and watch the fun.


Step 6. Testing it to make sure that it actually works.

The best way to test it is to do it the real and hard way: just power off server01 and see how server02 takes over. Watch the takeover logs on server02 :)
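A slightly more end-to-end test, from the client's point of view, is to mount the export via the virtual IP, keep writing to it, and then pull the plug on server01. This is just a rough sketch; the mount point /mnt/nfstest and the file failover-test.txt are example names:

client01# apt-get install nfs-common
client01# mkdir -p /mnt/nfstest
client01# mount -t nfs 192.168.1.100:/data/export /mnt/nfstest
client01# while true; do date >> /mnt/nfstest/failover-test.txt; sleep 1; done

(now power off server01; the loop will stall for a few seconds while server02 takes over and should then carry on writing as if nothing happened)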

How to Boot Debian 5.0 (Lenny) in text / command-prompt mode

By default, after the install of Debian 5, the system automatically boots up to graphical mode. That's probably not the choice of a command-line fan, so here's an easy way to make sure that your system boots to a command prompt the next time you start it.

1. Make sure you have the root password. If this is the first boot since your system install and the install process didn't ask you for a root password, then your password is probably the same as the password of the user that you created at install time. Unlike on Ubuntu, this user isn't added to sudoers, so you have to do a direct su to switch to root.

debian5> su -
Password:
debian5#


2. By default the system boots at runlevel 2, so go into that runlevel's directory.

debian5# cd /etc/rc2.d

3. Tell the startup script that starts your GNOME display manager not to start it, by simply renaming it. It's that simple:

debian5# mv S30gdm K30gdm
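If you prefer not to rename the link by hand, you should be able to get the same effect with update-rc.d (the same tool used earlier to take NFS out of the boot sequence):

debian5# update-rc.d -f gdm remove

(either way gdm will no longer start at boot; you can still start it manually with /etc/init.d/gdm start when you want a desktop)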

voila.. you are all done..