Tag Archives: wordpress

Five years of rackerhacker.com

Today marks the fifth year that this blog has existed on the internet. I bought the domain on February 14th, 2007 and tossed together a quick WordPress installation (I can’t even remember the version now!) to hold my notes that I was gathering at work.

[Image: birthday cake. Photo credit: Will Clayton]

At the time, I had recently parted ways with a very small internet startup and joined the ranks at Rackspace as an entry-level Linux system administrator. The abrupt change from “top dog at the startup” to “wow, I don’t know anything about Linux” caught me by surprise and I was trying to stuff as much knowledge into my brain as quickly as I could. My teammates at Rackspace were eager to show me the ropes of wrangling servers and supporting customers.

As I mentioned already, the blog started out as a place to stuff my notes from the things I learned at work. I figured that it would be nice to store them in a searchable format, and it would also be great if I could link other people to certain posts when they needed more information to fix a problem. It was a way to retain knowledge and still give it back to the people around me who needed it.

The blog has hit 456 posts (this one is #457) and it’s gone from a few page views per day to just over 20,000 per day. Here are the top five most accessed posts (since I’ve been keeping stats):

  1. Syncing an iPhone with a new Mac without hassles
  2. ip_conntrack: table full, dropping packet
  3. Delete a single iptables rule
  4. Increase MySQL connection limit
  5. MySQL Error 1040: too many connections

I’d like to send out a big thanks to the people who read this blog, add comments (or complaints!), and suggest new topics. You are the reason why I take the time to keep this blog going.


One month with GlusterFS in production

As many of you might have noticed from my previous GlusterFS blog post and my various tweets, I’ve been working with GlusterFS in production for my personal hosting needs for just over a month. I’ve also been learning quite a bit from some of the folks in the #gluster channel on Freenode. On a few occasions I’ve even been able to help out with some configuration problems from other users.

There has been quite a bit of interest in GlusterFS as of late and I’ve been inundated with questions from coworkers, other system administrators and developers. Most folks want to know about its reliability and performance in demanding production environments. I’ll try to do my best to cover the big points in this post.

First off, here’s how I’m using it in production: I have two web nodes that keep content in sync for various web sites. They each run a GlusterFS server instance and they also mount their GlusterFS share. I’m using the replicate translator to keep both web nodes in sync with client side replication.

Here are my impressions after a month:

I/O speed is often tied heavily to network throughput
This one may seem obvious, but it doesn’t hold in every environment. If you deal with a lot of small files like I do, a 40 Mbit/sec link between the Xen guests is plenty. Adding extra throughput didn’t add any performance to my servers. However, if you wrangle large files on your servers regularly, you may want to consider higher throughput links between your servers. I was able to push just under 900 Mbit/sec by using dd to create a large file within a GlusterFS mount.
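
A rough Python equivalent of that dd test is sketched below. It is an illustration rather than the command used for the figure above; the function name, the default sizes, and the /mnt/gluster path in the usage note are all assumptions.

```python
import os
import tempfile
import time


def measure_write_mbit(directory, size_mb=64, block_kb=1024):
    """Write size_mb of zeros into `directory` and return the rate in Mbit/sec."""
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            # Force the data out of the local page cache so the timing
            # includes the trip through the mount, not just a memory copy.
            os.fsync(handle.fileno())
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.remove(path)  # always clean up the scratch file
    # Approximate conversion: 1 MB written counts as 8 Mbit.
    return (size_mb * 8) / elapsed
```

Calling measure_write_mbit("/mnt/gluster", size_mb=1024) against a mounted volume should give a number comparable to a dd run of the same size, though small test sizes will mostly measure overhead.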

Network and I/O latency are big factors for small file performance
If you have a busy network and the latency creeps up from time to time, you’ll find that your small file performance will drop significantly (especially with the replicate translator). Without getting too nerdy (you’re welcome to read the technical document on replication), replication is an intensive process. When a file is accessed, the client goes around to each server node to ensure that it not only has a copy of the file being read, but that it has the correct copy. If a server didn’t save a copy of a file (due to disk failure or the server being offline when the file was written), it has to be synced across the network from one of the good nodes.

When you write files on replicated servers, the client has to roll through the same process first. Once that’s done, it has to lock the file, write to the change log, then do the write operation, drop the change log entries, and then unlock the file. All of those operations must be done on all of the servers. High latency networks will wreak havoc on this process and cause it to take longer than it should.
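
To get a feel for why latency hurts so much, here is a back-of-the-envelope model of the write sequence described above. It assumes each of the five steps costs one network round trip and that the servers are contacted in parallel, so the slowest server sets the pace. Both are simplifying assumptions for illustration, not a description of GlusterFS internals.

```python
# The five steps of a replicated write, as listed in the paragraph above.
WRITE_STEPS = ("lock", "write change log", "write data", "drop change log", "unlock")


def estimated_write_ms(server_rtts_ms, steps=WRITE_STEPS):
    """Estimate the network time for one replicated write, in milliseconds.

    Assumption: one round trip per step, all servers contacted in parallel,
    so every step takes as long as the slowest server's round trip.
    """
    slowest = max(server_rtts_ms)
    return len(steps) * slowest


# Two servers on a quiet LAN (0.5 ms round trip each).
quiet = estimated_write_ms([0.5, 0.5])
# Same pair, but one server's latency has crept up to 20 ms.
busy = estimated_write_ms([0.5, 20])
print(quiet, busy)
```

Under this model a single slow server multiplies the cost of every small write, which matches the drop in small file performance when latency creeps up.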

Even if you have a fast, low-latency network between your servers, slow disks can still be a problem. If the client is waiting on the server nodes’ disks to write data, the read and write performance will suffer. I’ve tested this in environments with fast networks and very busy RAID arrays. Even when the network was very underutilized, slow disks could cut performance drastically.

Monitoring GlusterFS isn’t easy
When the client has communication problems with the server nodes, some weird things can happen. I’ve seen situations where the client loses connections to the servers (see the next section on reliability) and the client mount simply hangs. In other situations, the client has been knocked offline entirely and the process was missing from the process tree by the time I logged in. Your monitoring will need to ensure that the mount is active and is responding in a timely fashion.
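
A minimal sketch of that kind of check is below. It runs the probe in a child process so that a hung mount blocks the child rather than the monitoring script itself. The function name and the five second timeout are assumptions, and this is not the nagios script mentioned in this post.

```python
import subprocess
import sys


def mount_responds(path, timeout=5.0):
    """Return True if `path` is a mount point and answers within `timeout` seconds."""
    # The probe touches the filesystem via os.path.ismount(); on a hung
    # mount that call blocks, which is exactly what we want to detect.
    probe = (
        "import os, sys\n"
        "sys.exit(0 if os.path.ismount(sys.argv[1]) else 1)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", probe, path])
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Kill the probe but do not wait on it again: a child stuck on a
        # hung mount may not exit promptly.
        child.kill()
        return False
```

A False result covers both failure modes described above: the mount is gone entirely, or it is still there but not answering.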

There’s a handy script which allows you to monitor GlusterFS mounts via nagios that Ian Rogers put together. Also, you can get some historical data with acrollet’s munin-glusterfs plugin.

GlusterFS 3.x is pretty reliable
When I first started working with GlusterFS, I was using a version from the 2.x tree. The Fedora package maintainer hadn’t updated the package in quite some time, but I figured it should work well enough for my needs. I found that the small file performance was lacking and the nodes often had communication issues when many files were being accessed or written simultaneously. This improved when I built my own RPMs of 3.0.4 (and later 3.0.5) and began using those instead.

I did some failure testing by hard cycling the server and client nodes and found some interesting results. First off, abruptly pulling clients had no effects on the other clients or the server nodes. The connection eventually timed out and the servers logged the timeout as expected.

Abruptly pulling servers led to some mixed results. In the 2.x branch, I saw client hangs and timeouts when I abruptly removed a server. This appears to be mostly corrected in the 3.x branch. If you’re using replicate, it’s important to keep in mind that the first server volume listed in your client’s volume file is the one that will be coordinating the file and directory locking. Should that one drop offline abruptly, you’ll see a hiccup in performance for a brief moment and the next server will be used for coordinating the locking. When your original server comes back up, the locking coordination will shift back.
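
The failover behavior described above boils down to "the first listed server that is reachable coordinates the locking." Here is a toy model of that rule. It is not GlusterFS code, and the host names web1 and web2 are made up for the example.

```python
def lock_coordinator(servers, online):
    """Return the first server, in volume-file order, that is currently online."""
    for server in servers:
        if server in online:
            return server
    return None  # no server reachable at all


servers = ["web1", "web2"]  # order as listed in the client's volume file

print(lock_coordinator(servers, {"web1", "web2"}))  # both up
print(lock_coordinator(servers, {"web2"}))          # first server pulled
print(lock_coordinator(servers, {"web1", "web2"}))  # first server back
```

Because the choice depends only on list order and reachability, coordination shifts back to the first server as soon as it returns.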

Conclusion
I’m really impressed with how much GlusterFS can do with the simplicity of how it operates. Sure, you can get better performance and more features (sometimes) from something like Lustre or GFS2, but the amount of work required to stand up that kind of cluster isn’t trivial. GlusterFS really only requires that your kernel have FUSE support (it’s been in mainline kernels since 2.6.14).

There are some things that GlusterFS really needs in order to succeed:

  • Documentation – The current documentation is often out of date and confusing. I’ve even found instances where the documentation contradicts itself. While there are some good technical documents about the design of some translators, the Gluster team really ought to do some more work there.
  • Statistics gathering – It’s very difficult to find out what GlusterFS is doing and where it can be optimized. Profiling your environment to find your bottlenecks is nearly impossible with the 2.x and 3.x branches. It doesn’t make it easier when some of the performance translators actually decrease performance.
  • Community involvement – This ties back into the documentation part a little, but it would be nice to see more participation from Gluster employees on IRC and via the mailing lists. They’re a little better with mailing list responses than other companies I’ve seen, but there is still room for improvement.

If you’re considering GlusterFS for your servers but you still have more questions, feel free to leave a comment or find me on Freenode (I’m ‘rackerhacker’).


WordPress + W3 Total Cache + MaxCDN How-To

It’s no secret that I’m a big fan of WordPress as a blog and CMS platform. While it does have its problems, it’s relatively simple to set up, it’s extensible, and — when properly configured — it has great performance. The WP Super Cache plugin has been a staple on my WordPress blogs for quite some time and it has solved almost all of my performance problems.

However, when you load up quite a few plugins or a heavy theme, the performance will dip due to the increased number of stylesheets, javascript files, and images. You can compress and combine the stylesheets and javascript to decrease load times, but this may not get the performance to a level you like.

I was in this situation and I found a great solution: the W3 Total Cache plugin and the MaxCDN service.

To get started, visit MaxCDN’s site and set up an account. Their current promotion gives you 1TB of CDN bandwidth for one year for $10 (regularly $99). Once you sign up, do the following:

  • Click Manage Zones
  • Click Create pull zone

At this point, you’ll see a list of form fields to complete:

  • Enter an alias for the pull zone name
  • The origin server URL is the URL that’s normally used to access your site (e.g. rackerhacker.com)
  • The custom CDN domain is the URL you want to use for your CDN (e.g. cdn.rackerhacker.com)
  • The label can be anything you’d like to use to remember which zone is which
  • Enabling compression is generally a good idea

Once you save the zone, MaxCDN will give you a new domain name. You’ll want to create a CNAME record that points from your CDN URL (for me, that’s cdn.rackerhacker.com) to the really long URL that MaxCDN provides.

STOP HERE: Ensure that all of your DNS servers are replying with the CNAME record before you continue with the W3 Total Cache installation and CDN setup. If you proceed without waiting for that, some of your blog’s visitors will get errors when they try to load content via your CDN domain.
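
One way to check every DNS server is to ask each one directly with dig and compare the answers. The sketch below assumes dig is installed. The function names are my own, and the nameserver and CDN host names in the usage note are placeholders, not real MaxCDN values.

```python
import subprocess


def normalize(hostname):
    """Lowercase a host name and drop the trailing dot that dig prints."""
    return hostname.strip().rstrip(".").lower()


def cname_matches(dig_output, expected_target):
    """True if any line of `dig +short` output equals the expected CNAME target."""
    answers = [normalize(line) for line in dig_output.splitlines() if line.strip()]
    return normalize(expected_target) in answers


def check_nameservers(name, expected_target, nameservers):
    """Query each nameserver directly and map it to True (ready) or False."""
    results = {}
    for server in nameservers:
        completed = subprocess.run(
            ["dig", "+short", "@" + server, name, "CNAME"],
            capture_output=True, text=True, timeout=10,
        )
        results[server] = cname_matches(completed.stdout, expected_target)
    return results
```

For example, check_nameservers("cdn.rackerhacker.com", "zone.cdn-provider.example", ["ns1.dns-host.example", "ns2.dns-host.example"]) should return True for every server before you move on.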

You’re ready for W3 Total Cache now. Install the plugin within your WordPress installation and activate it. Hop into the settings for the plugin and make these adjustments:

  • Enable Page Caching and set it to Disk (enhanced)
  • Enable Minify and set it to Disk
  • Enable Database Caching and set it to Disk
  • Leave the CDN disabled for now, but flip the CDN Type to Origin Pull (Mirror)
  • Press Save changes

Click CDN Settings at the top of the page and configure the CDN:

  • Enter your CDN domain (for me, it’s cdn.rackerhacker.com) in the top form field
  • Leave the other options as they are by default and click Save changes

W3 Total Cache should prompt you to clear out your page cache, and I’d recommend doing so at this step. If you fully reload your blog’s main page in your browser (you may need to hold SHIFT while you click reload/refresh) and check the page source, you should see your CDN URL appear for some of the javascript or CSS files.
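
If you’d rather not read the page source by eye, the check can be scripted. The sketch below sorts the scripts, stylesheets, and images on a page by whether they load from the CDN host. The class and function names are assumptions, and the sample HTML in the usage note is invented.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AssetCollector(HTMLParser):
    """Collect the URLs of scripts, stylesheets, and images on a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "img") and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            # Only stylesheet links count; skip feeds, icons, and so on.
            if "stylesheet" in (attributes.get("rel") or "").lower():
                self.urls.append(attributes["href"])


def split_by_cdn(html, cdn_host):
    """Return (on_cdn, off_cdn) lists of asset URLs found in `html`."""
    collector = AssetCollector()
    collector.feed(html)
    on_cdn, off_cdn = [], []
    for url in collector.urls:
        # Relative URLs have no host, so they land in off_cdn.
        host = urlparse(url).hostname
        (on_cdn if host == cdn_host else off_cdn).append(url)
    return on_cdn, off_cdn
```

Feed it the saved page source, for example split_by_cdn(page_html, "cdn.rackerhacker.com"). Anything in the second list is still being served from the origin server.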

You may discover that some javascript files, stylesheets, or images aren’t being loaded via the CDN automatically. Luckily, that’s an easy fix. Under the Minify Settings section of the W3 Total Cache plugin settings, scroll to the very bottom. Add your javascript or CSS files via the form fields there and the plugin should handle the minifying (is that even a word?) and the CDN URL rewriting for you.


WordPress and PHP 5.3.x: update_comment_type_cache() expected to be a reference

I upgraded a Fedora 11 instance to Fedora 12 and found the following error at the top of one of my WordPress blogs:

Parameter 1 to update_comment_type_cache() expected to be a reference, 
value given in wp-includes/plugin.php on line 166

The problem wasn’t in a plugin, actually. It was within my theme’s (R755-light) functions.php:

function update_comment_type_cache(&$queried_posts) {

The temporary fix is to remove the & from that line so it looks like this:

function update_comment_type_cache($queried_posts) {

After clearing out the WP Super Cache, the page was loading properly again. It turns out that the function actually calculates how many comments are available for a given post, so that functionality is working properly right now. A few theme authors are already releasing new versions to fix this bug, but my theme’s author has not.

The credit for the fix goes to someone in the WordPress forums.


Upgraded to WordPress 2.9

If you haven’t upgraded your WordPress installation to version 2.9 yet, you might want to consider doing that soon. There are quite a few improvements, bug fixes and security features available in the new version.

The automatic upgrade via the admin interface actually worked just fine for me. Of course, I backed up my database and files first, just to be sure.
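
For anyone who wants to script the backup step, here is one possible sketch. It is not the exact procedure used for this upgrade. The function names are assumptions, the database name and paths in the usage note are placeholders, and the mysqldump call assumes your credentials live in ~/.my.cnf.

```python
import os
import tarfile
import time


def archive_directory(source_dir, dest_dir):
    """Create a timestamped .tar.gz of source_dir inside dest_dir; return its path.

    Keep dest_dir outside source_dir so the archive does not include itself.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = os.path.basename(os.path.normpath(source_dir))
    archive_path = os.path.join(dest_dir, "%s-%s.tar.gz" % (base, stamp))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=base)
    return archive_path


def mysqldump_command(database, output_file):
    """Build (but do not run) a mysqldump command line for the database."""
    # Assumption: the user and password come from ~/.my.cnf, so none appear here.
    return ["mysqldump", "--result-file=" + output_file, database]
```

Run archive_directory("/var/www/wordpress", "/root/backups") for the files, then pass the list from mysqldump_command("wordpress", "/root/backups/wordpress.sql") to subprocess.run() for the database.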
