Get faster GitLab runners with a ramdisk

When you build tons of kernels every day like my team does, you look for speed improvements anywhere you can. Caching repositories, artifacts, and compiled objects makes kernel builds faster and reduces infrastructure costs.

Need for speed

We use GitLab CI in plenty of places, and that means we have a lot of gitlab-runner configurations for OpenShift (using the kubernetes executor) and AWS (using the docker-machine executor). The runner’s built-in caching makes it easy to upload and download cached items from object storage repositories like Google Cloud Storage or Amazon S3.

However, there’s an often overlooked feature hiding in the configuration for the docker executor that provides a great performance boost: mounting tmpfs inside your container. Not familiar with tmpfs? Arch Linux has a great wiki page for tmpfs and James Coyle has a well-written blog post about what makes it different from the older ramfs.
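
If you want a quick feel for tmpfs outside of any CI setup, mounting one by hand only takes a couple of commands (the mount point and size below are arbitrary examples):

# Create a mount point and mount a 2GB tmpfs on it (example values only).
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o rw,size=2g tmpfs /mnt/ramdisk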

RAM is much faster than your average cloud provider’s block storage. It also has incredibly low latency relative to most storage media. There’s a great interactive latency page that allows you to use a slider to travel back in time to 1990 and compare all kinds of storage performance numbers. (It’s really fun! Go drag the slider and be amazed.)

Better yet, many cloud providers give you lots of RAM per CPU on their instances, so if your work isn’t terribly memory intensive, you can use a lot of this RAM for faster storage.

Enabling tmpfs in Docker containers

⚠️ Beware of the dangers of tmpfs before adjusting your runner configuration! See the warnings at the end of this post.

This configuration is buried in the middle of the docker executor documentation. You will need to add some extra configuration to your [runners.docker] section to make it work:

[runners.docker]
  [runners.docker.tmpfs]
      "/ramdisk" = "rw,noexec"

This configuration mounts a tmpfs volume underneath /ramdisk inside the container. By default, this directory will be mounted with noexec, but if you need to execute scripts from that directory, change noexec to exec:

[runners.docker]
  [runners.docker.tmpfs]
      "/ramdisk" = "rw,exec"

In our case, compiling kernels requires executing scripts, so we use exec for our tmpfs mounts.

You must specify exec explicitly! For example, this tmpfs volume will be mounted with noexec since that is the default:

[runners.docker]
  [runners.docker.tmpfs]
      "/ramdisk" = "rw"

Extra speed

For even more speed, we moved the objects generated by ccache to the ramdisk. Seek times there are much lower, which lets ccache find its cached objects much more quickly.

Git repositories are also great things to stash on tmpfs. Big kernel repositories are usually 1.5GB to 2GB in size with tons of files. Checkouts are really fast when they’re done in tmpfs.
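
Wiring this up is mostly a matter of pointing things at the ramdisk in your build script. Here is a rough sketch (the paths and repository URL are placeholders, not our real setup):

# Keep ccache's object cache on the ramdisk. CCACHE_DIR is a standard ccache variable.
export CCACHE_DIR=/ramdisk/ccache

# Clone and build the kernel on the ramdisk as well (placeholder URL).
git clone --depth 1 https://example.com/kernel.git /ramdisk/kernel
cd /ramdisk/kernel
make -j"$(nproc)"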

Dangers are lurking

⚠️ As mentioned earlier, beware of the dangers of tmpfs.

  • All of the containers on the machine will share the same amount of RAM for their tmpfs volumes. Be sure to account for how much each container will use and how many containers could be present on the same machine.

  • Be aware of how much memory your tests will use when they run. In our case, kernel compiles can consume 2-4GB of RAM, depending on configuration, so we try our best to leave some memory free.

  • These volumes also have no size limit in the configuration shown above. If you put too much data into the tmpfs volume and your system runs critically low on available RAM, you could see a huge drop in performance, system instability, or even a crash. 🔥 (One way to cap the size is sketched just after this list.)
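
One way to limit that risk is to cap the size of the tmpfs mount. The runner passes these options through to Docker, which understands a size option for tmpfs volumes, so a capped ramdisk would look roughly like this (the 4g figure is only an example, not something we tuned carefully):

[runners.docker]
  [runners.docker.tmpfs]
      "/ramdisk" = "rw,exec,size=4g"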

buildah error: vfs driver does not support overlay.mountopt options

Buildah and podman make a great pair for building, managing and running containers on a Linux system. You can even use them with GitLab CI with a few small adjustments, namely the switch from the overlayfs to vfs storage driver.

I have some regularly scheduled GitLab CI jobs that attempt to build fresh containers each morning and I use these to get the latest packages and find out early when something is broken in the build process. A failed build appeared in my inbox earlier this week with the following error:

+ buildah bud -f builds/builder-fedora30 -t builder-fedora30 .
vfs driver does not support overlay.mountopt options

My container build script is fairly basic, but it does include a change to use the vfs storage driver:

# Use vfs with buildah. Docker offers overlayfs as a default, but buildah
# cannot stack overlayfs on top of another overlayfs filesystem.
export STORAGE_DRIVER=vfs

The script doesn’t change any mount options during the build process. A quick glance at /etc/containers/storage.conf revealed a possible problem:

[storage.options]
# Storage options to be passed to underlying storage drivers

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev,metacopy=on"

These mount options make sense when used with an overlayfs filesystem, but they are not used with vfs. I commented out the mountopt option, saved the file, and ran a test build locally. Success!
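
If you want to confirm which storage driver buildah ends up using, recent versions can show you with buildah info (the grep is just a convenience to pick out the relevant line):

buildah info | grep -i graphdrivername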

Fixing the build script involved a small change to the storage.conf just before building the container:

# Use vfs with buildah. Docker offers overlayfs as a default, but buildah
# cannot stack overlayfs on top of another overlayfs filesystem.
export STORAGE_DRIVER=vfs

# Newer versions of podman/buildah try to set overlayfs mount options when
# using the vfs driver, and this causes errors.
sed -i '/^mountopt =.*/d' /etc/containers/storage.conf

My containers are happily building again in GitLab.

Fedora 30 on Google Compute Engine

Fedora 30 is my primary operating system for desktops and servers, so I usually try to take it everywhere I go. I was recently doing some benchmarking for kernel compiles on different cloud platforms and I noticed that Fedora isn’t included in Google Compute Engine’s default list of operating system images.

(Note: Fedora does include links to quickly start an Amazon EC2 instance with their pre-built AMIs. They are superb!)

First try

Fedora does offer cloud images in raw and qcow2 formats, so I decided to give that a try. Start by downloading the image, decompressing it, and then repackaging the image into a tarball.

$ wget http://mirrors.kernel.org/fedora/releases/30/Cloud/x86_64/images/Fedora-Cloud-Base-30-1.2.x86_64.raw.xz
$ xz -d Fedora-Cloud-Base-30-1.2.x86_64.raw.xz
$ mv Fedora-Cloud-Base-30-1.2.x86_64.raw disk.raw
$ tar cvzf fedora-30-google-cloud.tar.gz disk.raw

Once that’s done, create a bucket on Google storage and upload the tarball.

$ gsutil mb gs://fedora-cloud-image-30
$ gsutil cp fedora-30-google-cloud.tar.gz gs://fedora-cloud-image-30/

Uploading 300MB on my 10mbit/sec uplink was a slow process. When that’s done, tell Google Compute Engine that we want a new image made from this raw disk we uploaded:

$ gcloud compute images create --source-uri \
    gs://fedora-cloud-image-30/fedora-30-google-cloud.tar.gz \
    fedora-30-google-cloud

After a few minutes, a new custom image called fedora-30-google-cloud will appear in the list of images in Google Compute Engine.

$ gcloud compute images list | grep -i fedora
fedora-30-google-cloud   major-hayden-20150520    PENDING
$ gcloud compute images list | grep -i fedora
fedora-30-google-cloud   major-hayden-20150520    PENDING
$ gcloud compute images list | grep -i fedora
fedora-30-google-cloud   major-hayden-20150520    READY

I opened a browser, ventured to the Google Compute Engine console, and built a new VM with my image.
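
The same step works from the gcloud command line if you prefer to skip the web console; something like this should do it (the instance name, machine type, and zone are arbitrary choices):

$ gcloud compute instances create fedora30-test \
    --image fedora-30-google-cloud \
    --machine-type n1-standard-1 \
    --zone us-central1-a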

Problems abound

However, there were problems when the instance started up. The serial console showed plenty of errors:

DataSourceGCE.py[WARNING]: address "http://metadata.google.internal/computeMetadata/v1/" is not resolvable

Obviously something is wrong with DNS. It’s apparent that cloud-init is stuck in a bad loop:

url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [87/120s]: bad status code [404]
url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [93/120s]: bad status code [404]
url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [99/120s]: bad status code [404]
url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [105/120s]: bad status code [404]
url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [112/120s]: bad status code [404]
url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [119/120s]: unexpected error [Attempted to set connect timeout to 0.0, but the timeout cannot be set to a value less than or equal to 0.]
DataSourceEc2.py[CRITICAL]: Giving up on md from ['http://169.254.169.254/2009-04-04/meta-data/instance-id'] after 126 seconds

Those are EC2-type metadata queries and they won’t work here. The instance also has no idea how to set up networking:

Cloud-init v. 17.1 running 'init' at Wed, 07 Aug 2019 18:27:07 +0000. Up 17.50 seconds.
ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+
ci-info: | eth0:  | False |     .     |     .     |   .   | 42:01:0a:f0:00:5f |
ci-info: |  lo:   |  True | 127.0.0.1 | 255.0.0.0 |   .   |         .         |
ci-info: |  lo:   |  True |     .     |     .     |   d   |         .         |
ci-info: +--------+-------+-----------+-----------+-------+-------------------+

This image is set up well for Amazon, but it needs some work before it will run properly on Google Compute Engine.

Fixing up the image

Go back to the disk.raw that we made in the first step of the blog post. We need to mount that disk, mount some additional filesystems, and chroot into the Fedora 30 installation on the raw disk.

Start by making a loop device for the raw disk and enumerating its partitions:

$ sudo losetup /dev/loop0 disk.raw
$ sudo kpartx -a /dev/loop0

Make a mountpoint and mount the first partition on that mountpoint:

$ sudo mkdir /mnt/disk
$ sudo mount /dev/mapper/loop0p1 /mnt/disk

We need some extra filesystems mounted before we can run certain commands in the chroot:

$ sudo mount --bind /dev /mnt/disk/dev
$ sudo mount --bind /sys /mnt/disk/sys
$ sudo mount --bind /proc /mnt/disk/proc

Now we can hop into the chroot:

$ sudo chroot /mnt/disk

From inside the chroot, remove cloud-init and install google-compute-engine-tools to help with Google cloud:

$ dnf -y remove cloud-init
$ dnf -y install google-compute-engine-tools
$ dnf clean all

The google-compute-engine-tools package has lots of services that help with running on Google cloud. We need to enable each one to run at boot time:

$ systemctl enable google-accounts-daemon google-clock-skew-daemon \
    google-instance-setup google-network-daemon \
    google-shutdown-scripts google-startup-scripts

To learn more about these daemons and what they do, head on over to the GitHub page for the package.

Exit the chroot and get back to your main system. Now that we have this image just like we want it, it’s time to unmount the image and send it to the cloud:

$ sudo umount /mnt/disk/dev /mnt/disk/sys /mnt/disk/proc
$ sudo umount /mnt/disk
$ sudo losetup -d /dev/loop0
$ tar cvzf fedora-30-google-cloud-fixed.tar.gz disk.raw
$ gsutil cp fedora-30-google-cloud-fixed.tar.gz gs://fedora-cloud-image-30/
$ gcloud compute images create --source-uri \
    gs://fedora-cloud-image-30/fedora-30-google-cloud-fixed.tar.gz \
    fedora-30-google-cloud-fixed

Start a new instance with this fixed image and watch it boot in the serial console:

[   10.379253] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 10737418240 ms ovfl timer
[   10.381350] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules
[   10.382487] RAPL PMU: hw unit of domain package 2^-0 Joules
[   10.383415] RAPL PMU: hw unit of domain dram 2^-16 Joules
[   10.503233] EDAC sbridge:  Ver: 1.1.2


Fedora 30 (Cloud Edition)
Kernel 5.1.20-300.fc30.x86_64 on an x86_64 (ttyS0)

instance-2 login:

Yes! A ten second boot with networking is exactly what I needed.
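
If you’d rather not keep a browser tab open for the serial console, gcloud can fetch the same output (use whatever instance name and zone you chose when creating the VM):

$ gcloud compute instances get-serial-port-output instance-2 --zone us-central1-a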

Texas Linux Fest 2019 Recap

Las Colinas in Irving

Another Texas Linux Fest has come and gone! The 2019 Texas Linux Fest was held in Irving at the Irving Convention Center. It was a great venue surrounded by lots of shops and restaurants.

If you haven’t attended one of these events before, you really should! Attendees have varying levels of experience with Linux and the conference organizers (volunteers) work really hard to ensure everyone feels included.

The event usually falls on a Friday and Saturday. Fridays consist of longer, deeper dive talks on various topics – technical and non-technical. Saturdays are more of a typical conference format with a keynote in the morning and 45-minute talks through the day. Saturday nights have lightning talks as well as “Birds of a Feather” events for people with similar interests.

Highlights

Steve Ovens took us on a three hour journey on Friday to learn more about our self-worth. His talk, “You’re Worth More Than You Know, Matching your Skills to Employers”, covered a myriad of concepts such as discovering what really motivates you, understanding how to value yourself (and your skills), and how to work well with different personality types.

I’ve attended these types of talks before and they sometimes end up a bit fluffy, without anything you can begin using right away. Steve’s talk was the opposite. He gave us concrete ways to change how we think about ourselves and use that knowledge to advance ourselves at work. I learned a lot about negotiation strategies for salary when getting hired or when pushing for a raise. Steve stopped lots of times to answer questions and it was clear that he was really interested in this topic.

Thomas Cameron kicked off Saturday with his “Linux State of the Union” talk. He talked a lot about his personal journey and how he has changed along the way. He noted quite a few changes to Linux (not the code, but the people around it) that many of us had not noticed. We learned more about how we can make the Linux community more diverse, inclusive, and welcoming. We also groaned through some problems from the good old days with jumpers on SATA cards and the joys of winmodems.

Adam Miller threw us into the seat of a roller coaster with a whirlwind talk about all the ways you can automate (nearly) everything with Ansible.

Adam Miller Ansible talk

He covered everything from simple configuration management tasks to scaling up software deployments over thousands of nodes. Adam also explained the OCI image format as being “sweet sweet tarballs with a little bit of metadata” and the audience was rolling with laughter. Adam’s talks are always good and you’ll be energized all the way through.

José Miguel Parrella led a great lightning talk in the evening about how Microsoft uses Linux in plenty of places:

Debian at Microsoft slide

The audience was shocked by how much Debian is used at Microsoft, and it made it even more clear that Microsoft is really making a big shift toward open source. Many of us knew that already, but we didn’t know the extent of the work being done.

My talks

My first talk was about my team at Red Hat, the Continuous Kernel Integration team. I shared some of the challenges involved with doing CI for the kernel at scale and how difficult it is to increase test coverage of subsystems within the kernel. There were two kernel developers in the audience and they had some really good questions.

The discussion at the end was quite productive. The audience had plenty of questions about how different pieces of the system worked, and how well GitLab was working for us. We also talked a bit about how the kernel is developed and if there is room for improvement. One attendee hoped that some of the work we’re doing will change the kernel development process for the better. I hope so, too.

My second talk covered the topic of burnout. I have delivered plenty of talks about impostor syndrome in the past and I was eager to share more ideas around “soft” skills that become more important to technical career development over time.

The best part of these types of talks for me is the honesty that people bring when they share their thoughts after the talk. A few people from the audience shared their own personal experiences (some were very personal) and you could see people in the audience begin to understand how difficult burnout recovery can be. Small conferences like these create environments where people can talk honestly about difficult topics.

If you’re looking for the slides from these talks, you can view them in Google Slides (for the sake of the GIFs!):

Google Slides also allows you to download the slides as PDFs. Just choose File > Download as > PDF.

BoF: Ham Radio and OSS

The BoFs were fairly late in the day and everyone was looking tired. However, we had a great group assemble for the Ham Radio and OSS BoF. We had about 15-20 licensed hams and 5-6 people who were curious about the hobby.

We talked about radios, antennas, procedures, how to study, and the exams. The ham-curious folks who joined us looked a bit overwhelmed by the help they were getting, but they left the room with plenty of ideas on how to get started.

I also agreed to write a blog post about everything I’ve learned so far that has made the hobby easier for me and I hope to write that soon. There is so much information out there for studying and finding equipment that it can become really confusing for people new to the hobby.

Final thoughts

If you get the opportunity to attend a local Linux fest in your state, do it! The Texas one is always good and people joined us from Arkansas, Oklahoma, Louisiana, and Arizona. Some people came as far as Connecticut and the United Kingdom! These smaller events have a much higher signal to noise ratio and there is more real discussion rather than marketing from industry giants.

Thanks to everyone who put the Texas Linux Fest together this year!

Build containers in GitLab CI with buildah

My team at Red Hat depends heavily on GitLab CI and we build containers often to run all kinds of tests. Fortunately, GitLab offers up CI to build containers and a container registry in every repository to hold the containers we build.

This is really handy because it keeps everything together in one place: your container build scripts, your container build infrastructure, and the registry that holds your containers. Better yet, you can put multiple types of containers underneath a single git repository if you need to build containers based on different Linux distributions.

Building with Docker in GitLab CI

By default, GitLab offers up a Docker builder that works just fine. The CI system clones your repository, builds your containers and pushes them wherever you want. There’s even a simple CI YAML file that does everything end-to-end for you.

However, I have two issues with the Docker builder:

  • Larger images: The Docker image layering is handy, but the images end up being a bit larger, especially if you don’t do a little cleanup in each stage.

  • Additional service: It requires an additional service inside the CI runner for the dind (“Docker in Docker”) builder. This has caused CI delays for me several times.

Building with buildah in GitLab CI

On my local workstation, I use podman and buildah all the time to build, run, and test containers. These tools are handy because I don’t need to remember to start the Docker daemon each time I want to mess with a container. I also don’t need sudo.

All of my containers are stored beneath my home directory. That’s good for keeping disk space in check, but it’s especially helpful on shared servers since each user has their own unique storage. My container pulls and builds won’t disrupt anyone else’s work on the server and their work won’t disrupt mine.
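
You can see exactly where that per-user storage lives with podman (the grep just picks out the relevant line; the path it prints is the rootless default under your home directory):

podman info | grep -i graphroot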

Finally, buildah offers some nice options out of the box. First, when you build a container with buildah bud, you end up with only three layers by default:

  1. Original OS layer (example: fedora:30)
  2. Everything you added on top of the OS layer
  3. Tiny bit of metadata

This is incredibly helpful if you use package managers like dnf, apt, and yum that download a bunch of metadata before installing packages. You would normally have to clear the metadata carefully for the package manager so that your container wouldn’t grow in size. Buildah takes care of that by squashing all the stuff you add into one layer.

Of course, if you want to be more aggressive, buildah offers the --squash option which squashes the whole image down into one layer. This can be helpful if disk space is at a premium and you change the layers often.
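
As a quick illustration, the fully squashed variant is just one extra flag on the build command (the file path and tag here mirror the examples later in this post):

buildah bud --squash -f builds/fedora30 -t fedora30 .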

Getting started

I have a repository called os-containers in GitLab that maintains fully updated containers for Fedora 29 and 30. The .gitlab-ci.yml file calls build.sh for two containers: fedora29 and fedora30. Open the build.sh file and follow along here:

# Use vfs with buildah. Docker offers overlayfs as a default, but buildah
# cannot stack overlayfs on top of another overlayfs filesystem.
export STORAGE_DRIVER=vfs

First off, we need to tell buildah to use the vfs storage driver. Docker uses overlayfs by default and stacking overlay filesystems will definitely lead to problems. Buildah won’t let you try it.

# Write all image metadata in the docker format, not the standard OCI format.
# Newer versions of docker can handle the OCI format, but older versions, like
# the one shipped with Fedora 30, cannot handle the format.
export BUILDAH_FORMAT=docker

By default, buildah uses the oci container format. This sometimes causes issues with older versions of Docker that don’t understand how to parse that type of metadata. By setting the format to docker, we’re using a format that almost all container runtimes can understand.

# Log into GitLab's container repository.
export REGISTRY_AUTH_FILE=${HOME}/auth.json
echo "$CI_REGISTRY_PASSWORD" | buildah login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY

Here we set a path for the auth.json file that holds the registry credentials, and we use buildah to authenticate to GitLab’s built-in container registry. GitLab automatically exports these variables for us (and hides them in the job output), so we can use them here.

buildah bud -f builds/${IMAGE_NAME} -t ${IMAGE_NAME} .

We’re now building the container and storing it temporarily as the bare image name, such as fedora30. This is roughly equivalent to docker build.

CONTAINER_ID=$(buildah from ${IMAGE_NAME})
buildah commit --squash $CONTAINER_ID $FQ_IMAGE_NAME

Now we are making a reference to our container with buildah from and using that reference to squash that container down into a single layer. This keeps the container as small as possible.

The commit step also tags the resulting image with our fully qualified image name (in this case, registry.gitlab.com/majorhayden/os-containers/fedora30:latest).
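
The image name variables themselves are set outside this snippet. In my setup they come from GitLab’s predefined CI variables, roughly like this (a sketch of how they could be assembled, not a verbatim copy from the repository):

# IMAGE_NAME is set per job in .gitlab-ci.yml (fedora29 or fedora30).
FQ_IMAGE_NAME="${CI_REGISTRY_IMAGE}/${IMAGE_NAME}:latest"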

buildah push ${FQ_IMAGE_NAME}

This is the same as docker push. There’s not much special to see here.

Maintaining containers

GitLab allows you to take things to the next level with CI schedules. In my repository, there is a schedule to build my containers once a day to catch the latest updates. I use these containers a lot and they need to be up to date before I can run tests.

If the container build fails for some reason, GitLab will send me an email to let me know.
