Adam Niederer

Efficient Status Bars with Rust

AKA "How I sped up my desktop by 100x using Rust 🔥 🚀"

AKA "How I learned to stop context switching and love the (fork) bomb"

My Desktop

My desktop is pretty heavily customized. I don't run a desktop environment, so much of it is a mishmash of à la carte programs. My status bar, in particular, is a custom script. It gathers and displays information by interfacing with a lot of different components of my system.

All of that information is dumped to stdout, and painted to the screen with lemonbar, which also handles mouse events.

With it, I can play/pause/skip music, check the weather, date, and time, connect to Bluetooth devices and wifi access points, check my memory and swap usage, adjust and mute my volume, and make sure my hardware is happy.

Making it Work with Bash

The plethora of pre-packaged programs for these interfaces persuaded me to produce this program with bash. Despite its linguistic quirkiness, bash remains a leading choice for throwing together scripts.

I'll spare you the details of the script, for it contains many hard-coded API keys and MAC addresses. Here's a representative sample of the code, though:

# Display hardware sensors
TEMP=$(sensors 2>/dev/null | grep -E "Package|Physical" | cut -d ' ' -f 5 | cut -c 2-7)
VCORE=$(sensors 2>/dev/null | grep "Vcore" | cut -d ' ' -f 11 | cut -c 2-5)

if [ -n "$TEMP" ]; then
    echo -ne "\uf2c8 $TEMP "
fi

if [ -n "$VCORE" ]; then
    echo -ne "⚡ ${VCORE}V  "
fi

It's workable, if a little brittle. Because programs like sensors, free, and pamixer rarely receive significant updates, this script only broke once or twice over its years of use, and it was always a quick fix.

It had a more serious problem, though, which my system's idle CPU usage illustrates:

1  [|||                       7.3%]   Tasks: 75; 1 running
2  [|||                       6.0%]   Load average: 0.49 0.43 0.45
3  [||||                      9.9%]   Uptime: 04:57:47
4  [||||                     10.6%]

It's wildly inefficient. Retrieving my CPU temperature and voltage spawns and subsequently terminates 8 processes (two pipelines of four commands each). The entire script creates and kills at least 200 processes per second. Yikes!

My desktop often runs on machines with finite energy, so I made efforts to optimize it. By decimating the update frequency and ensuring power-hungry programs like nmcli run less frequently, I was able to run it on my laptop without noticeable effect on battery life.

Getting it Right with Rust

Optimizing my status bar sat on my back burner for quite a while, as many of the interfaces used were only accessible with C. C APIs are Satan's curse upon man, so I held out for a better solution.

In the winter of 2017, I got really into Rust. Consequently, I noticed a Cambrian explosion of wrapper crates for things like pulseaudio, bluez, lm-sensors, and dbus. A few months later, Rust had native bindings to everything I needed. Work quickly began on a replacement.

350 lines of Rust code later, I had a working copy of my status bar. Rust's amazing and highly standardized documentation system allowed me to integrate tens of crates in an evening, while its explicit error handling ensured graceful removal of irrelevant features on sparsely-equipped machines.

let thermal_sensors = Sensors::new().into_iter()
    .filter(|c| c.prefix() == "coretemp")
    .flat_map(|c| {
        c.into_iter().flat_map(|feat| {
            feat.into_iter().filter(|sub| {
                sub.subfeature_type() == &SubfeatureType::SENSORS_SUBFEATURE_TEMP_INPUT
            }).collect::<Vec<_>>()
        }).collect::<Vec<_>>()
    }).collect::<Vec<_>>();

let voltage_sensors = Sensors::new().into_iter()
    .filter(|c| c.prefix() == "nct6776")
    .flat_map(|c| {
        c.into_iter().flat_map(|feat| {
            feat.into_iter().filter(|sub| {
                sub.subfeature_type() == &SubfeatureType::SENSORS_SUBFEATURE_IN_INPUT
            }).collect::<Vec<_>>()
        }).collect::<Vec<_>>()
    }).collect::<Vec<_>>();

// Snip!

loop {
    // Snip!
    if let Ok(thermals) = thermal_sensors.first()?.get_value() {
        print!("\u{f2c8} {}°C  ", thermals);
    }
    if let Ok(voltage) = voltage_sensors.first()?.get_value() {
        print!("⚡ {:.2}V  ", voltage);
    }
}
Imagine how ugly that would be in C.
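
To make the "graceful removal" idea concrete, here is a minimal, self-contained sketch of the pattern. It does not use the real sensors crate: `Sensor`, its `reading` field, and `render` are hypothetical stand-ins invented for illustration, and the readings are made-up sample values. The point is only that a missing or failing sensor drops its segment from the bar instead of crashing it.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for a sensor handle (not the real crate's type).
/// `None` models a sensor that exists but cannot be read.
struct Sensor {
    reading: Option<f64>,
}

impl Sensor {
    /// Mirrors the shape of a fallible read: `Err` when no value is available.
    fn get_value(&self) -> Result<f64, String> {
        self.reading.ok_or_else(|| "sensor unavailable".to_string())
    }
}

/// Build one status line, skipping any segment whose sensor is absent
/// (empty slice) or whose read fails.
fn render(thermal: &[Sensor], voltage: &[Sensor]) -> String {
    let mut line = String::new();
    if let Some(Ok(t)) = thermal.first().map(Sensor::get_value) {
        // Writing to a String cannot fail, so the result is ignored.
        let _ = write!(line, "{}°C  ", t);
    }
    if let Some(Ok(v)) = voltage.first().map(Sensor::get_value) {
        let _ = write!(line, "{:.2}V  ", v);
    }
    line
}

fn main() {
    // Made-up sample: a machine with a thermal sensor but no voltage sensor.
    let thermal = vec![Sensor { reading: Some(45.0) }];
    let voltage: Vec<Sensor> = Vec::new();
    println!("{}", render(&thermal, &voltage));
}
```

Because every read is an `Option` or a `Result`, the compiler forces a decision about the missing-hardware case at each call site, which is what makes the sparse-machine behaviour fall out almost for free.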

The difference was night and day.

1  [|                         0.7%]   Tasks: 82; 1 running
2  [                          0.0%]   Load average: 0.08 0.13 0.29
3  [                          0.0%]   Uptime: 02:29:22
4  [|                         0.3%]

Measuring Success

Let's take a quick look at both programs with perf stat to ensure our qualitative results have some empirical backing.

Here's the Bash script in power-saving mode:

  20175.925539      task-clock:u (msec)       #    0.244 CPUs utilized
             0      context-switches:u        #    0.000 K/sec
             0      cpu-migrations:u          #    0.000 K/sec
     1,566,202      page-faults:u             #    0.078 M/sec
29,216,545,237      cycles:u                  #    1.448 GHz
36,471,840,295      stalled-cycles-frontend:u #  124.83% frontend cycles idle
42,522,200,490      instructions:u            #    1.46  insn per cycle
                                              #    0.86  stalled cycles per insn
 8,975,885,418      branches:u                #  444.881 M/sec
   257,211,408      branch-misses:u           #    2.87% of all branches

  82.651256083 seconds time elapsed

  13.757756000 seconds user
   7.113046000 seconds sys

And the Rust script:

  261.227367      task-clock:u (msec)       #    0.003 CPUs utilized
           0      context-switches:u        #    0.000 K/sec
           0      cpu-migrations:u          #    0.000 K/sec
       1,005      page-faults:u             #    0.004 M/sec
 262,024,893      cycles:u                  #    1.003 GHz
 478,654,337      stalled-cycles-frontend:u #  182.68% frontend cycles idle
 186,572,959      instructions:u            #    0.71  insn per cycle
                                            #    2.57  stalled cycles per insn
  41,622,811      branches:u                #  159.336 M/sec
   2,119,788      branch-misses:u           #    5.09% of all branches

83.175023848 seconds time elapsed

 0.130905000 seconds user
 0.149056000 seconds sys

We can see the Bash script executing roughly 230x as many instructions, and requiring around 110x as many cycles as the Rust script.
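
For anyone who wants to check the arithmetic, the ratios can be recomputed directly from the two perf stat listings. The counter values below are copied from those listings; the constant names and the `ratio` helper are just illustrative.

```rust
// Counters copied from the two `perf stat` listings in this post.
const BASH_INSTRUCTIONS: f64 = 42_522_200_490.0;
const RUST_INSTRUCTIONS: f64 = 186_572_959.0;
const BASH_CYCLES: f64 = 29_216_545_237.0;
const RUST_CYCLES: f64 = 262_024_893.0;
const BASH_TASK_CLOCK_MS: f64 = 20_175.925539;
const RUST_TASK_CLOCK_MS: f64 = 261.227367;
const BASH_PAGE_FAULTS: f64 = 1_566_202.0;
const RUST_PAGE_FAULTS: f64 = 1_005.0;

/// How many times larger `bash` is than `rust`.
fn ratio(bash: f64, rust: f64) -> f64 {
    bash / rust
}

fn main() {
    println!("instructions: {:.1}x", ratio(BASH_INSTRUCTIONS, RUST_INSTRUCTIONS));
    println!("cycles:       {:.1}x", ratio(BASH_CYCLES, RUST_CYCLES));
    println!("task-clock:   {:.1}x", ratio(BASH_TASK_CLOCK_MS, RUST_TASK_CLOCK_MS));
    println!("page-faults:  {:.1}x", ratio(BASH_PAGE_FAULTS, RUST_PAGE_FAULTS));
}
```

The instruction and cycle ratios come out at about 228x and 111x. The task-clock ratio is lower, at roughly 77x, and the page-fault ratio is far higher, which fits a workload dominated by process creation.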

Those aren't the only metrics we can test, however. Both scripts are throttled by a call to sleep with a constant argument. This means scripts which take longer to run will produce fewer updates per second.

Both scripts print one line per update, so I was able to measure their relative update frequency by running both scripts for the same amount of time and measuring the length of their outputs.

313 bash-output
399 rust-output

The Rust script runs at a roughly 27% higher frequency than the Bash script, furthering its efficiency lead.
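
The reasoning behind that measurement can be sketched as a tiny model: if every iteration does some work and then sleeps for a constant time, the number of updates in a fixed window shrinks as the work grows. The line counts below are the ones reported above; the `updates` function and the window, work, and sleep values passed to it in `main` are hypothetical, chosen only to illustrate the model, and are not the scripts' real timings.

```rust
/// Updates completed in `window` seconds when each iteration spends `work`
/// seconds gathering data and then sleeps for a constant `sleep` seconds.
fn updates(window: f64, work: f64, sleep: f64) -> u64 {
    (window / (work + sleep)).floor() as u64
}

/// By what percentage `a` exceeds `b`.
fn percent_higher(a: f64, b: f64) -> f64 {
    (a / b - 1.0) * 100.0
}

fn main() {
    // Line counts from the measurement in this post.
    let (bash_lines, rust_lines) = (313.0, 399.0);
    println!(
        "Rust update frequency: {:.1}% higher",
        percent_higher(rust_lines, bash_lines)
    );

    // Hypothetical timings: same sleep, different amounts of work per update.
    let (window, sleep) = (10.0, 0.5);
    println!("slow worker: {} updates", updates(window, 1.5, sleep));
    println!("fast worker: {} updates", updates(window, 0.5, sleep));
}
```

With the same sleep, the only way to get more updates out of a fixed window is to spend less time working between sleeps, so the line count doubles as a rough per-update cost measurement.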

On Ecosystems

Although 100x speedups are more likely to draw clicks, my most poignant takeaway from this endeavor is related to Rust's ecosystem. Many crates with underpinning C libraries offer much more thoughtful APIs, and are more accessible to casual users.

I'm repeatedly delighted by each crate's continued maintenance and development - especially pulse-binding-rust, which has added a ton of features while profoundly transforming its API for the better. This code from February:

extern "C" fn pulse_cb(_: *mut ContextInternal, info: *const ServerInfoInternal, ret: *mut c_void) {
    if !info.is_null() && !ret.is_null() {
        unsafe {
            let name = CStr::from_ptr((*info).default_sink_name).to_owned().into_string().unwrap();
            *(ret as *mut String) = name;
        }
    }
}

let mut pulse_sink_name: String = String::new();
pulsectx.introspect().get_server_info(
    (pulse_cb, &mut pulse_sink_name as *mut _ as *mut c_void));

is equivalent to this code from August:

pulsectx.introspect().get_server_info(|info| {
    pulse_channel.sender.send(info.default_sink_name.clone().unwrap().into());
});

let pulse_sink_name = pulse_channel.receiver.recv().unwrap();
Yielding to PulseAudio is omitted from both samples.
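
The closure-plus-channel shape of the August version can be tried out without PulseAudio at all. The sketch below uses only the standard library; `ServerInfo`, `get_server_info`, and the sink name are hypothetical stand-ins and not the pulse-binding-rust API. Unlike the real library, this stand-in calls the callback immediately, so there is no main loop to yield to.

```rust
use std::sync::mpsc;

/// Hypothetical stand-in for the info struct handed to the callback.
struct ServerInfo {
    default_sink_name: Option<String>,
}

/// Hypothetical stand-in for a callback-based query: it passes the result
/// to `callback` instead of returning it. Here the call is synchronous.
fn get_server_info<F: FnOnce(&ServerInfo)>(callback: F) {
    let info = ServerInfo {
        default_sink_name: Some("example_sink".to_string()),
    };
    callback(&info);
}

/// Run the query and carry the sink name out of the callback via a channel.
fn default_sink_name() -> Option<String> {
    let (sender, receiver) = mpsc::channel();
    get_server_info(move |info| {
        // The closure owns `sender`, so no raw pointers or `unsafe` are needed.
        let _ = sender.send(info.default_sink_name.clone());
    });
    // `recv` fails only if the sender was dropped without sending.
    receiver.recv().ok().flatten()
}

fn main() {
    println!("{:?}", default_sink_name());
}
```

Compared with the February version, ownership of the result is tracked by the type system rather than by a `*mut c_void` cast, so a null or dangling pointer is no longer something the caller can get wrong.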

I can only hope the velocity carried by Rust's ecosystem is kept strong.