Automated testing for Ansible CIS playbook on RHEL/CentOS 6

I started working on the Ansible CIS playbook for CentOS and RHEL 6 back in 2014, and I’ve made a few changes to improve its quality and make it easier to use.

First off, the role itself is no longer a submodule. You can now just clone the repository and get rolling. This should reduce the time it takes to get started.

In addition, every pull request to the repository now goes through integration testing at Rackspace. Each one runs the gauntlet:

  • Syntax check on Travis-CI
  • Travis-CI builds a server at Rackspace
  • The entire Ansible playbook runs on the Rackspace Cloud Server
  • Results are sent back to GitHub

The testing process usually takes under five minutes.

Stay tuned: Updates are coming for RHEL and CentOS 7. ;)

Live migration failures with KVM and libvirt

I decided to move some of my infrastructure back to KVM, and the overall experience on Fedora 22 has been quite good. Using libvirt with KVM is a breeze, and the virt-manager tools make it even easier. However, I ran into some problems while trying to migrate virtual machines from one server to another.

The error

# virsh migrate --live --copy-storage-all bastion qemu+ssh://root@192.168.250.33/system
error: internal error: unable to execute QEMU command 'drive-mirror': Failed to connect socket: Connection timed out

That error message wasn’t terribly helpful. I started running through my usual list of checks:

  • Can the hypervisors talk to each other? Yes, iptables is disabled.
  • Are ssh keys configured? Yes, verified.
  • What about ssh host keys being accepted on each side? Both sides can ssh without interaction.
  • SELinux? No AVCs logged.
  • Libvirt logs? Nothing relevant in libvirt’s qemu logs.
  • Filesystem permissions for libvirt’s directories? Identical on both sides.
  • Libvirt daemon running on both sides? Yes.

I was pretty confused at this point. A quick Google search didn’t turn up many relevant issues, but I did find a Red Hat bug from 2013 that affected RHEL 7. In that bug, libvirt wasn’t using the right ports to talk between servers, and those packets were being dropped by iptables. My iptables rules were empty, though.

Debug time

I ran the same command with LIBVIRT_DEBUG=1 at the front:

# LIBVIRT_DEBUG=1 virsh migrate --live --copy-storage-all bastion qemu+ssh://root@192.168.250.33/system > debug.log 2>&1

After scouring page after page of output, I couldn’t find anything useful.

Eureka!

I briefly spotted an error message in virt-manager (or in the debug logs) that got me thinking about a potential problem: hostnames. Both hosts had a fairly bare /etc/hosts file without IP/hostname pairs for each hypervisor. When no migration URI is given, libvirt builds one from the destination host’s hostname, so the source hypervisor has to be able to resolve that name. After editing both servers’ /etc/hosts files to include the short and full hostnames for each hypervisor, I tested the live migration one more time.

Success!

The migration went off without a hitch, both in virt-manager and via the virsh client. I migrated several VMs, including the one running this site, with no noticeable interruption.
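The /etc/hosts fix can be scripted so both hypervisors end up with identical entries. The helpers below are a hypothetical sketch, not part of libvirt or of the original troubleshooting: one builds a hosts line with the full and short hostname, and the other appends it only if the full name isn’t already listed.

```shell
#!/bin/bash
# Hypothetical helpers (not part of libvirt) for adding hypervisor names
# to /etc/hosts.

# Print one hosts line: IP, full hostname, short hostname.
# Usage: hosts_entry IP FQDN
hosts_entry() {
    local ip="$1" fqdn="$2"
    local short="${fqdn%%.*}"    # drop everything from the first dot onward
    printf '%s %s %s\n' "$ip" "$fqdn" "$short"
}

# Append the entry unless the full hostname already appears as a whole word.
# -F treats the dots in the hostname literally instead of as regex wildcards.
# Usage: ensure_hosts_entry IP FQDN [HOSTS_FILE]
ensure_hosts_entry() {
    local ip="$1" fqdn="$2" file="${3:-/etc/hosts}"
    if ! grep -qwF -- "$fqdn" "$file"; then
        hosts_entry "$ip" "$fqdn" >> "$file"
    fi
}
```

For example, `ensure_hosts_entry 192.168.250.33 kvm2.example.com` (the hostname is a placeholder) run on each hypervisor, for every hypervisor including itself, followed by `getent hosts kvm2.example.com` to confirm the name resolves.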

Very slow ssh logins on Fedora 22

I’ve recently set up a Fedora 22 firewall/router at home (more on that later) and I noticed that remote ssh logins were extremely slow. In addition, sudo commands seemed to stall out for the same amount of time (about 25-30 seconds).

I’d already done all the basic troubleshooting:

  • Set UseDNS no in /etc/ssh/sshd_config
  • Set GSSAPIAuthentication no in /etc/ssh/sshd_config
  • Tested DNS resolution
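It’s worth confirming those settings are actually in effect. Running `sshd -T` as root prints the daemon’s effective configuration, so `sshd -T | grep -i usedns` is the authoritative check. The hypothetical helper below is a lighter sketch that reads the config file directly; it assumes there are no `Match` blocks and relies on sshd using the first value it reads for most keywords.

```shell
#!/bin/bash
# Hypothetical helper: print the first value set for an sshd_config keyword.
# Keywords are matched case-insensitively; commented lines are skipped.
# Sketch only: Match blocks and "Keyword=value" syntax are not handled.
# Exits 1 if the keyword is not set at all.
# Usage: sshd_option KEYWORD [CONFIG_FILE]
sshd_option() {
    local key="$1" file="${2:-/etc/ssh/sshd_config}"
    awk -v key="$key" '
        $1 ~ /^#/ { next }                                  # skip comments
        tolower($1) == tolower(key) { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$file"
}
```

For example, `sshd_option UseDNS` should print `no` after the change above.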

These lines kept cropping up in my system journal when I tried to access the server using ssh:

dbus[4865]: [system] Failed to activate service 'org.freedesktop.login1': timed out
sshd[7391]: pam_systemd(sshd:session): Failed to create session: Activation of org.freedesktop.login1 timed out
sshd[7388]: pam_systemd(sshd:session): Failed to create session: Activation of org.freedesktop.login1 timed out

The process list on the server looked fine. I could see dbus-daemon and systemd-logind processes and they were in good states. However, it looked like dbus-daemon had restarted at some point and systemd-logind had not been restarted since then. I crossed my fingers and bounced systemd-logind:

systemctl restart systemd-logind

Success! Logins via ssh and escalations with sudo worked instantly.
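The clue here was that dbus-daemon had restarted after systemd-logind last started. A rough sketch of that comparison follows, assuming systemd’s `ActiveEnterTimestampMonotonic` unit property (microseconds since boot) reflects each service’s most recent start. The function names are hypothetical, and this only mirrors the heuristic above rather than diagnosing the root cause.

```shell
#!/bin/bash
# Hypothetical helpers for spotting a systemd-logind that predates dbus.

# Print a unit's ActiveEnterTimestampMonotonic value (microseconds since boot).
# `systemctl show -p NAME UNIT` prints "NAME=value", so strip the "NAME=" prefix.
unit_start_usec() {
    local line
    line=$(systemctl show -p ActiveEnterTimestampMonotonic "$1")
    printf '%s\n' "${line#*=}"
}

# Succeed when logind started earlier than dbus, i.e. dbus has restarted
# since logind last started.
# Usage: logind_predates_dbus DBUS_USEC LOGIND_USEC
logind_predates_dbus() {
    [ "$2" -lt "$1" ]
}

# Compare the two services and suggest the restart that fixed the stalls.
check_logind() {
    local dbus_usec logind_usec
    dbus_usec=$(unit_start_usec dbus.service)
    logind_usec=$(unit_start_usec systemd-logind.service)
    if logind_predates_dbus "$dbus_usec" "$logind_usec"; then
        echo "dbus restarted after systemd-logind; try: systemctl restart systemd-logind"
    else
        echo "systemd-logind started after dbus"
    fi
}
```

After sourcing the file, run `check_logind` on the affected server.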

Restoring wireless and Bluetooth state after reboot in Fedora 22

My upgrade to Fedora 22 on my third-generation ThinkPad X1 Carbon was fairly uneventful, and the hiccups were minor. One of the more annoying issues I’ve been struggling with for quite some time is how to boot up with the wireless LAN and Bluetooth disabled by default. Fedora normally handles restoring wireless and Bluetooth state between reboots quite well.

In Fedora 21, NetworkManager saved my settings between reboots. For example, if I shut down with wifi off and Bluetooth on, the laptop would boot up later with wifi off and Bluetooth on. This wasn’t working well in Fedora 22: both the wifi and Bluetooth were always enabled by default.

Digging into rfkill

I remembered rfkill and began testing some commands. It showed that both devices were soft-blocked, since I had disabled them in NetworkManager:

$ rfkill list
0: tpacpi_bluetooth_sw: Bluetooth
    Soft blocked: yes
    Hard blocked: no
2: phy0: Wireless LAN
    Soft blocked: yes
    Hard blocked: no
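To capture that state in a script, the `rfkill list` output above is easy to parse. This hypothetical helper assumes the format shown here: an `ID: name: type` header line followed by indented block-status lines.

```shell
#!/bin/bash
# Hypothetical helper: read `rfkill list` output on stdin and print the
# name of every soft-blocked device. Header lines look like
# "2: phy0: Wireless LAN", so the device name is the second ": " field.
soft_blocked_devices() {
    awk -F': ' '
        /^[0-9]+: / { name = $2; next }       # remember the current device
        /Soft blocked: yes/ { print name }    # report it if soft-blocked
    '
}
```

Usage: `rfkill list | soft_blocked_devices`, which for the output above would list `tpacpi_bluetooth_sw` and `phy0`.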

It looked like systemd already has hooks for managing rfkill via the systemd-rfkill service. However, something strange happened when I tried to start the service:

# systemctl start systemd-rfkill@0
Failed to start systemd-rfkill@0.service: Unit systemd-rfkill@0.service is masked.

Well, that’s certainly weird. While looking into why it’s masked, I found an empty file in /etc/systemd:

# ls -al /etc/systemd/system/systemd-rfkill@.service 
-rwxr-xr-x. 1 root root 0 May 11 16:36 /etc/systemd/system/systemd-rfkill@.service

I don’t remember making that file. Did something else put it there?

# rpm -qf /etc/systemd/system/systemd-rfkill@.service
tlp-0.7-4.fc22.noarch

Ah, tlp!
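systemd treats a zero-length unit file (or a symlink to /dev/null) as masked, so one way to hunt for surprises like this is to look for empty unit files under /etc/systemd/system and ask rpm who owns them. A rough sketch follows; the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for finding unit files that mask a service by being empty.

# List zero-length .service files directly inside a unit directory.
# Usage: empty_unit_files [DIR]
empty_unit_files() {
    local dir="${1:-/etc/systemd/system}"
    find "$dir" -maxdepth 1 -type f -empty -name '*.service'
}

# Print each empty unit file next to the package that owns it, if any.
# Usage: report_empty_units [DIR]
report_empty_units() {
    local unit owner
    empty_unit_files "$@" | while IFS= read -r unit; do
        owner=$(rpm -qf "$unit" 2>/dev/null) || owner="not owned by any package"
        printf '%s: %s\n' "$unit" "$owner"
    done
}
```

On the laptop above, `report_empty_units` would be expected to point at the tlp package for `systemd-rfkill@.service`.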

Configuring tlp

I looked at tlp’s configuration file, /etc/default/tlp, and found a few helpful settings:

# Restore radio device state (Bluetooth, WiFi, WWAN) from previous shutdown
# on system startup: 0=disable, 1=enable.
# Hint: the parameters DEVICES_TO_DISABLE/ENABLE_ON_STARTUP/SHUTDOWN below
#   are ignored when this is enabled!
RESTORE_DEVICE_STATE_ON_STARTUP=0
 
# Radio devices to disable on startup: bluetooth, wifi, wwan.
# Separate multiple devices with spaces.
#DEVICES_TO_DISABLE_ON_STARTUP="bluetooth wifi wwan"
 
# Radio devices to enable on startup: bluetooth, wifi, wwan.
# Separate multiple devices with spaces.
#DEVICES_TO_ENABLE_ON_STARTUP="wifi"
 
# Radio devices to disable on shutdown: bluetooth, wifi, wwan
# (workaround for devices that are blocking shutdown).
#DEVICES_TO_DISABLE_ON_SHUTDOWN="bluetooth wifi wwan"
 
# Radio devices to enable on shutdown: bluetooth, wifi, wwan
# (to prevent other operating systems from missing radios).
#DEVICES_TO_ENABLE_ON_SHUTDOWN="wwan"
 
# Radio devices to enable on AC: bluetooth, wifi, wwan
#DEVICES_TO_ENABLE_ON_AC="bluetooth wifi wwan"
 
# Radio devices to disable on battery: bluetooth, wifi, wwan
#DEVICES_TO_DISABLE_ON_BAT="bluetooth wifi wwan"
 
# Radio devices to disable on battery when not in use (not connected):
# bluetooth, wifi, wwan
#DEVICES_TO_DISABLE_ON_BAT_NOT_IN_USE="bluetooth wifi wwan"

So tlp’s default configuration doesn’t restore device state, and tlp had masked systemd’s rfkill service. I uncommented one line in tlp’s configuration and rebooted:

DEVICES_TO_DISABLE_ON_STARTUP="bluetooth wifi wwan"

After the reboot, both the wifi and Bluetooth functionality were shut off! That’s exactly what I needed.
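To make that change non-interactively (from a kickstart or a configuration management run, say), a hypothetical helper like this one rewrites a commented or uncommented assignment in /etc/default/tlp. It assumes GNU sed’s `-i` and a value without `|`, `&`, or backslashes, any of which would confuse the sed replacement.

```shell
#!/bin/bash
# Hypothetical helper: set KEY="VALUE" in a tlp-style config file,
# uncommenting an existing "#KEY=..." line if there is one.
# Assumes GNU sed (-i) and a VALUE free of "|", "&" and backslashes.
# Usage: set_tlp_option KEY VALUE [FILE]
set_tlp_option() {
    local key="$1" value="$2" file="${3:-/etc/default/tlp}"
    if grep -qE "^#?${key}=" "$file"; then
        # Replace the existing (possibly commented-out) assignment in place.
        sed -i -E "s|^#?${key}=.*|${key}=\"${value}\"|" "$file"
    else
        # No assignment yet: append one.
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}
```

For example, `set_tlp_option DEVICES_TO_DISABLE_ON_STARTUP "bluetooth wifi wwan"`, followed by a reboot as above.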

Extra credit

Thanks to a coworker, I was able to write a NetworkManager script that automatically shuts off the wireless LAN whenever I connect to a network via ethernet. That’s typically what happens when I come back to my desk (where I have ethernet connectivity) from an in-person meeting.

If you want the same automation, save this script as /etc/NetworkManager/dispatcher.d/70-wifi-wired-exclusive.sh and make it executable (NetworkManager only runs dispatcher scripts that are owned by root and not writable by group or others):

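#!/bin/bash
# NetworkManager dispatcher script: $1 is the interface, $2 is the action.
# Force untranslated nmcli output so the grep patterns below always match.
export LC_ALL=C

# Turn the wifi radio off if any ethernet device is connected.
disable_wifi_if_wired ()
{
        # grep -w keeps "disconnected" from matching "connected"
        result=$(nmcli dev | grep "ethernet" | grep -w "connected")
        if [ -n "$result" ]; then
                nmcli radio wifi off
        fi
}

# Only act when an interface comes up.
if [ "$2" = "up" ]; then
        disable_wifi_if_wired
fi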

To test it, unplug the ethernet cable, turn wifi on, and then plug the ethernet cable back in. Once NetworkManager finishes bringing up the wired connection (for example, after obtaining a DHCP lease), the wireless LAN should shut off automatically.
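A variant of the same dispatcher script can ask nmcli for terse, colon-separated output instead of grepping the padded table. This is a sketch under the assumption that your nmcli supports field selection with `-t -f TYPE,STATE device`; the helper name is hypothetical.

```shell
#!/bin/bash
# Sketch of a dispatcher variant: $1 is the interface, $2 is the action.
# Assumes nmcli supports terse (-t) output with -f field selection.
export LC_ALL=C

# Hypothetical helper: succeed if stdin (output of
# `nmcli -t -f TYPE,STATE device`) lists a connected ethernet device.
# -x requires the whole line to match, so "ethernet:disconnected" and
# "ethernet:unavailable" do not count.
wired_is_connected() {
    grep -qx 'ethernet:connected'
}

# Only act when an interface comes up and a wired link is connected.
if [ "${2:-}" = "up" ] && nmcli -t -f TYPE,STATE device | wired_is_connected; then
    nmcli radio wifi off
fi
```

The design choice is mostly about robustness: terse output has no column padding, so matching a whole line is simpler than chaining two greps.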

Making things more super with supernova 2.0

I started supernova a little over three years ago with the idea of making novaclient easier to use. Three years and a few downloads later, it manages several OpenStack clients, such as nova, glance, and trove, and offers some handy features for users who manage a large number of environments.

What’s new?

With help from some friends who are much better at writing Python than I am (thanks Paul, Matt, and Jason), I restructured supernova to make it more testable. The big, awkward SuperNova class is gone, and there are fewer circular imports. I also migrated the CLI management components to the click module. It’s now compatible with Python 2.6, 2.7, 3.3, and 3.4.

The overall functionality hasn’t changed much, but there’s a new option for using a configuration file that sits in a non-standard location or has a filename other than .supernova. Just use the -c flag:

supernova -c ~/work/.supernova dfw list
supernova -c ~/personal/supernova-config-v1 staging list
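If you switch between configuration files often, small shell wrappers can save some typing. The function names below are hypothetical; the paths are the ones from the examples above.

```shell
#!/bin/bash
# Hypothetical shortcuts that pin supernova to a specific configuration file.
# Add them to ~/.bashrc; any extra arguments pass straight through to supernova.
snw() { supernova -c ~/work/.supernova "$@"; }
snp() { supernova -c ~/personal/supernova-config-v1 "$@"; }
```

With these loaded, `snw dfw list` runs the same command as `supernova -c ~/work/.supernova dfw list`.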

The testing is done with Travis-CI and code coverage is checked with Codecov. Pull requests will automatically be checked with unit tests and I’ll do my best to urge committers to keep test coverage at 100%.

Updating supernova

Version 2.0.0 is already on PyPI, so upgrading with pip is easy:

pip install -U supernova