Tap into your Linux system with SystemTap

One of the most interesting topics I’ve seen so far during my RHCA training at Rackspace this week is SystemTap. In short, SystemTap allows you to dig out a bunch of details about your running system relatively easily. It takes scripts, converts them to C, builds a kernel module, and then runs the code within your script.

HOLD IT: The steps below are definitely not meant for those who are new to Linux. Utilizing SystemTap on a production system is a bad idea — it can chew up significant resources while it runs and it can also cause a running system to kernel panic if you’re not careful with the packages you install.

These instructions will work well with Fedora, CentOS and Red Hat Enterprise Linux. Luckily, the SystemTap folks put together some instructions for Debian and Ubuntu as well.

Before you can start working with SystemTap on your RPM-based distribution, you’ll need to get some prerequisites together:

yum install gcc systemtap systemtap-runtime systemtap-testsuite kernel-devel
yum --enablerepo=*-debuginfo install kernel-debuginfo kernel-debuginfo-common

WHOA THERE: Ensure that the kernel-devel and kernel-debuginfo* packages that you install via yum match up with your running kernel. If there’s a newer kernel available from your yum repo, yum will pull that one. If it’s been a while since you updated, you’ll either need to upgrade your current kernel to the latest and reboot or you’ll need to hunt down the corresponding kernel-devel and kernel-debuginfo* packages from a repository. Installing the wrong package version can lead to kernel panics. Also, bear in mind that the debuginfo packages are quite large: almost 200MB in Red Hat/CentOS and almost 300MB in Fedora.

You can’t write the script in just any language. SystemTap uses an odd syntax to get things going:

#! /usr/bin/env stap
probe begin { println("hello world") exit () }

Just run the script with stap:

# stap -v helloworld.stp 
Pass 1: parsed user script and 73 library script(s) using 94380virt/21988res/2628shr kb, in 140usr/30sys/167real ms.
Pass 2: analyzed script: 1 probe(s), 1 function(s), 0 embed(s), 0 global(s) using 94776virt/22516res/2692shr kb, in 10usr/0sys/5real ms.
Pass 3: using cached /root/.systemtap/cache/bc/stap_bc368822da380b943d4e845ee15ed047_773.c
Pass 4: using cached /root/.systemtap/cache/bc/stap_bc368822da380b943d4e845ee15ed047_773.ko
Pass 5: starting run.
hello world
Pass 5: run completed in 0usr/20sys/285real ms.

The systemtap-testsuite package gives you a tubload of extremely handy SystemTap scripts. For example:

# cd /usr/share/systemtap/testsuite/systemtap.examples/io/
# stap iotime.stp
15138470 6351 (httpd) access /usr/share/cacti/index.php read: 0 write: 0
15142243 6351 (httpd) access /usr/share/cacti/include/auth.php read: 0 write: 0
15143780 6351 (httpd) access /usr/share/cacti/include/global.php read: 0 write: 0
15144099 6351 (httpd) access /etc/cacti/db.php read: 0 write: 0
15187641 6351 (httpd) access /usr/share/cacti/lib/adodb/adodb.inc.php read: 106486 write: 0
15187664 6351 (httpd) iotime /usr/share/cacti/lib/adodb/adodb.inc.php time: 218
15194965 6351 (httpd) access /usr/share/cacti/lib/adodb/adodb-time.inc.php read: 0 write: 0
15195692 6351 (httpd) access /usr/share/cacti/lib/adodb/adodb-iterator.inc.php read: 0 write: 0
   ... output continues ...

The iotime.stp script dumps out the reads and writes occurring on the system in real time. After starting the script above, I accessed my cacti instance on the server and immediately started seeing some reads as apache began picking up PHP files to parse.

Consider a situation in which you need to decrease interrupts on a Linux machine. This is vital for laptops and systems that need to remain in low power states. Some might suggest powertop for that, but why not give SystemTap a try?

# cd /usr/share/systemtap/testsuite/systemtap.examples/interrupt/
# stap interrupts-by-dev.stp 
        ohci_hcd:usb3 :      1
        ohci_hcd:usb4 :      1
            hda_intel :      1
                 eth0 :      2
                 eth0 :      2
                 eth0 :      2
                 eth0 :      2
                 eth0 :      2
                 eth0 :      2

On this particular system, it’s pretty obvious that the ethernet interface is causing a lot of interrupts.

If you want more examples, keep hunting around in the systemtap-testsuite package (remember rpm -ql systemtap-testsuite) or review the giant list of examples on SystemTap’s site.

Thanks again to Phil Hopkins at Rackspace for giving us a detailed explanation of system profiling during training.

Comments

  1. William Cohen says

    Those examples systemtap scripts are also in base systemtap rpm. You can see them in /usr/share/doc/systemtap-*/examples/

  2. Keith Chambers says

    I came across your blog last night while Googling for SystemTap implementation examples. Great blog, lots of good stuff on here!

    I think regarding the statement “Utilizing SystemTap on a production system is a bad idea — it can chew up significant resources while it runs…”

    I don’t believe this to be the case. The intent of SystemTap is to provide for deep kernel tracing _in_ production without a custom built kernel. I believe SystemTap achieves this.

    The performance penalty of SystemTap is generally 0-5%, which is a reasonable trade in many situations for high quality data. Sure you can write a script that will chew up more than that, etc, etc, etc — but running in production isn’t “a bad idea” in my opinion.

    Here’s a script I put together for low level Disk I/O monitoring. It’s great for a “cloud” environment where the underlying storage is shared between many VMs.

    #!/usr/bin/stap

    global latencyTimes, requestTime[10000]

    probe ioscheduler.elv_next_request {
    requestTime[$q] = gettimeofday_us()
    }

    probe ioscheduler.elv_completed_request {
    t = gettimeofday_us()
    s = requestTime[$q]
    delete requestTime[$q]
    if (s) {
    latencyTimes << (0)) {
    printf(“\n”)
    printf(“Sample period: 10 seconds\n”)
    printf(“\n”)
    printf(“Total IOs: %10d\n”, @count(latencyTimes))
    printf(“\n”)
    printf(“IO latency in microseconds (us):\n”)
    printf(“\n”)
    printf(“Min:%10d\n”, @min(latencyTimes))
    printf(“Max:%10d\n”, @max(latencyTimes))
    printf(“Avg:%10d\n”, @avg(latencyTimes))
    printf(“\n”)
    printf(“Histogram representation of IO latency in microseconds (us):\n”)
    printf(“\n”)
    println(@hist_log(latencyTimes))
    }
    else {
    printf(“No IOs issued in the last 10 seconds.\n”)
    printf(“\n”)
    }

    Here is a sample of the output: (It looks prettier in a terminal)

    [user@host ~]# ./latency.stp

    Sample period: 10 seconds

    Total IOs: 72274

    IO latency in microseconds (us):

    Min: 8
    Max: 25302
    Avg: 88

    Histogram representation of IO latency in microseconds (us):

    value |————————————————– count
    2 | 0
    4 | 0
    8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 59175
    16 |@@@ 3765
    32 | 936
    64 |@@ 2572
    128 |@ 1546
    256 |@ 1718
    512 |@ 1241
    1024 | 758
    2048 | 275
    4096 | 258
    8192 | 27
    16384 | 3
    32768 | 0
    65536 | 0

    Keep up the great work blogging! :-)

    Keith

  3. says

    Thanks again for linking me this on twitter,

    As a heads up I had to use:

    yum –enablerepo=debug install kernel-debuginfo kernel-debuginfo-common

    On a CentOS 5.5 x86_64 install

    Will keep trying to get SL6 packages and system tap working.

    Cheers

    Buzz

  4. says

    SL6 fix

    /etc/yum.repos.d/sl-debug.repo

    [sl-debug]
    name=Scientific Linux Debug
    baseurl=http://ftp1.scientificlinux.org/linux/scientific/6.0/archive/debuginfo/
    enabled=0
    gpgcheck=0
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-sl file:///etc/pki/rpm-gpg/RPM-GPG-KEY-dawson

    yum –enablerepo=*-debug install kernel-debuginfo kernel-debuginfo-common

    Thanks @fche on twitter for the baseurl I have these installing now on my sl6 box :)

Trackbacks

Leave a Reply

Your email address will not be published. Required fields are marked *