GlusterFS on the cheap with Rackspace’s Cloud Servers or Slicehost
NOTE: This post is out of date and is relevant only for GlusterFS 2.x.
High availability is certainly not a new concept, but if there’s one thing that frustrates me about high availability VM setups, it’s storage. If you don’t mind going active-passive, you can set up DRBD, toss your favorite filesystem on it, and you’re all set.
If you want to go active-active, with multiple nodes active at the same time, you need a clustered filesystem like GFS2, OCFS2 or Lustre. These are certainly good options to consider, but they’re not trivial to implement. They usually rely on additional systems and scripts to provide reliable fencing and STONITH capabilities.
What about the rest of us who want multiple active VMs with simple replicated storage that doesn’t require any additional elaborate systems? This is where GlusterFS really shines. GlusterFS can ride on top of whichever filesystem you prefer, and that’s a huge win for those who want a simple solution. However, it runs in userspace through FUSE, and that will limit your performance.
Let’s get this thing started!
Consider a situation where you want to run a WordPress blog on two VMs with load balancers out front. You’ll probably want to use GlusterFS’s replicated volume mode (RAID 1-ish) so that the same files are on both nodes all of the time. To get started, build two small Slicehost slices or Rackspace Cloud Servers. I’ll be using Fedora 13 in this example, but the instructions for other distributions should be very similar.
First things first: set a new root password and update all of the packages on the system. This should go without saying, but it’s easy to forget. We can also clear out the default iptables ruleset, since we will build a customized set later:
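On Fedora 13 those preparation steps might look roughly like this, run as root on both instances. Note that `iptables -F` only flushes the rules and leaves each chain’s default policy in place; on a stock Fedora install those policies are ACCEPT, so the instance stays wide open until the tighter ruleset goes in.

```shell
# Run as root on BOTH instances (Fedora 13 assumed).
passwd                  # set a new root password
yum -y update           # bring all installed packages up to date
iptables -F             # flush the default rules; chain policies are left unchanged
service iptables save   # persist the now-empty ruleset to /etc/sysconfig/iptables
```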
GlusterFS communicates over the network, so we want to ensure that its traffic only moves over the private network between the instances. We will need to add the private IPs and a special hostname for each instance to /etc/hosts on both instances. I’ll call mine
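A minimal sketch of the /etc/hosts entries. The private IPs below are made up, and the hostnames gluster1 and gluster2 are assumptions chosen to match the gluster1-/gluster2- prefixes of the volume files generated later; substitute your own values on both instances.

```shell
# Hypothetical private IPs: replace them with the private addresses
# your provider assigned to each instance.
cat >> /etc/hosts <<'EOF'
10.0.0.1   gluster1
10.0.0.2   gluster2
EOF
```

Afterwards, `ping -c 3 gluster2` from the first instance should reach the private address rather than the public one.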
You’re now ready to install the required packages on both instances:
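On Fedora this is a single yum command. The package names are an assumption about how Fedora split the glusterfs package at the time; `yum search glusterfs` will show what your mirror actually offers.

```shell
# Run on BOTH instances. Package names are assumptions; verify with `yum search glusterfs`.
yum -y install glusterfs glusterfs-server glusterfs-fuse fuse
```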
Make the directories for the GlusterFS volumes on each instance:
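Creating the brick directory is a one-liner. /export/store1 is a hypothetical path; any empty directory on the local filesystem will do, as long as you use the same path when generating the volume files.

```shell
# Run on BOTH instances. /export/store1 is a hypothetical brick directory.
mkdir -p /export/store1
```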
We’re ready to make the configuration files for our storage volume. Since we want the same files on each instance, we will use the --raid 1 option. This only needs to be run on the first node:
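A sketch of the generation step. The text doesn’t name the tool; glusterfs-volgen is assumed here because it is the generator from that era that accepted `--name` and `--raid 1` and wrote the four files described next. The hostnames gluster1 and gluster2 and the brick path /export/store1 are hypothetical; use the names from your /etc/hosts and the directories you created.

```shell
# Run on the first instance only. Tool name, hostnames and paths are assumptions.
# --name store1 yields the store1-* file names; --raid 1 requests replication.
glusterfs-volgen --name store1 --raid 1 gluster1:/export/store1 gluster2:/export/store1
```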
Once that’s done, you’ll have four new files:
booster.fstab: you won’t need this file
gluster1-store1-export.vol: server-side configuration file for the first instance
gluster2-store1-export.vol: server-side configuration file for the second instance
store1-tcp.vol: client-side configuration file for GlusterFS clients
Copy the gluster1-store1-export.vol file to /etc/glusterfs/glusterfsd.vol on your first instance. Then, copy the server-side configuration file for the second instance to /etc/glusterfs/glusterfsd.vol on your second instance. The store1-tcp.vol file should be copied to /etc/glusterfs/glusterfs.vol on both instances.
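One way to put the files in place, run on the first instance from the directory where the generator wrote them. It assumes root SSH access from the first instance to the second, and gluster2 is a hypothetical hostname for the second instance.

```shell
# Server-side volume files: one per instance.
cp gluster1-store1-export.vol /etc/glusterfs/glusterfsd.vol
scp gluster2-store1-export.vol gluster2:/etc/glusterfs/glusterfsd.vol

# Client-side volume file: the same copy on both instances.
cp store1-tcp.vol /etc/glusterfs/glusterfs.vol
scp store1-tcp.vol gluster2:/etc/glusterfs/glusterfs.vol
```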
At this point, you’re ready to start the GlusterFS servers on each instance:
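On Fedora, starting the servers might look like this. The init script name glusterfsd is an assumption based on Fedora’s glusterfs-server package; check /etc/init.d if yours differs.

```shell
# Run on BOTH instances. The server reads /etc/glusterfs/glusterfsd.vol.
service glusterfsd start
chkconfig glusterfsd on   # also start the server at boot
```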
You can now mount the GlusterFS volume on both instances:
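A sketch of the mount, assuming a hypothetical mount point of /mnt/glusterfs. The glusterfs client’s `-f` (`--volfile`) option points it at the client volume file in /etc/glusterfs, and the volume is then mounted through FUSE.

```shell
# Run on BOTH instances. /mnt/glusterfs is a hypothetical mount point.
mkdir -p /mnt/glusterfs
glusterfs -f /etc/glusterfs/glusterfs.vol /mnt/glusterfs
```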
You should now be able to see the new GlusterFS volume in both instances:
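`df` is enough to confirm the mount, assuming the volume was mounted at a hypothetical /mnt/glusterfs:

```shell
# Run on BOTH instances; the GlusterFS volume should be listed for the mount point.
df -h /mnt/glusterfs
```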
As a test, you can create a file on your first instance and verify that your second instance can read the data:
If you remove that file on your second instance, it should disappear from your first instance as well.
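The round trip can be checked with a few commands, each run on the instance named in its comment. /mnt/glusterfs (mount point) and /export/store1 (brick directory) are hypothetical paths.

```shell
# On gluster1: write a test file through the GlusterFS mount.
echo "hello from gluster1" > /mnt/glusterfs/test.txt

# On gluster2: the same file should be readable through its own mount.
cat /mnt/glusterfs/test.txt

# On either node: replication keeps a plain copy in the local brick directory.
# Look, but never write to the brick directly; always go through the mount.
ls -l /export/store1/

# On gluster2: delete the file through the mount...
rm /mnt/glusterfs/test.txt

# ...then on gluster1 it should be gone as well.
ls -l /mnt/glusterfs/
```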
Obviously, this is a very simple and basic implementation of GlusterFS. You can increase performance by making dedicated VMs just for serving data, and you can adjust the default performance options when you mount a GlusterFS volume. Limiting access to the GlusterFS servers is also a good idea.
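One way to limit access is an iptables ruleset that only trusts the other instance’s private IP. This is a sketch, not a hardened policy: 10.0.0.2 is a hypothetical peer address (use the other node’s private IP on each instance), and opening port 80 assumes the load balancers reach WordPress over plain HTTP. The ACCEPT rules are appended before the DROP policy is set, so an existing SSH session survives.

```shell
# Run as root on each instance, with the OTHER instance's private IP.
iptables -A INPUT -i lo -j ACCEPT                                 # local traffic
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT  # replies to our own connections
iptables -A INPUT -p tcp --dport 22 -j ACCEPT                     # keep SSH reachable
iptables -A INPUT -s 10.0.0.2 -j ACCEPT                           # trust the GlusterFS peer
iptables -A INPUT -p tcp --dport 80 -j ACCEPT                     # WordPress behind the load balancers
iptables -P INPUT DROP                                            # drop everything else
service iptables save                                             # persist across reboots
```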
If you want to read more, I’d recommend reading the GlusterFS Technical FAQ and the GlusterFS User Guide.
Thank you for your e-mails! I’ll be expanding on this post later with some sample benchmarks and additional tips/tricks, so please stay tuned.