Efficient Status Bars with Rust
AKA "How I sped up my desktop by 100x using Rust 🔥 🚀"
AKA "How I learned to stop context switching and love the (fork) bomb"
My Desktop
My desktop is pretty heavily customized. I don't run a desktop environment, so much of it is a mishmash of à la carte programs. My status bar, in particular, is a custom script. It gathers and displays information by interfacing with a lot of different components of my system, namely:
- The system clock
- System usage information
- Hardware voltage and temperature sensors
- Any available Bluetooth interfaces
- Ethernet and WLAN interfaces
- PulseAudio sinks
- Internet weather APIs
- Battery sensors
- Music players
All of that information is dumped to stdout, and painted to the screen with lemonbar, which also handles mouse events.
With it, I can play/pause/skip music, check the weather, date, and time, connect to Bluetooth devices and wifi access points, check my memory and swap usage, adjust and mute my volume, and make sure my hardware is happy.
Making it Work with Bash
The plethora of pre-packaged programs for these interfaces persuaded me to produce this program with bash. Despite its linguistic quirkiness, bash remains a leading choice for throwing together scripts.
I'll spare you the details of the script, for it contains many hard-coded API keys and MAC addresses. Here's a representative sample of the code, though:
# Display hardware sensors
TEMP=$(sensors 2>/dev/null | grep -E "Package|Physical" | cut -d ' ' -f 5 | cut -c 2-7)
VCORE=$(sensors 2>/dev/null | grep "Vcore" | cut -d ' ' -f 11 | cut -c 2-5)
if [ -n "$TEMP" ]; then
    echo -ne "\uf2c8 $TEMP "
fi
if [ -n "$VCORE" ]; then
    echo -ne "⚡ ${VCORE}V "
fi
It's workable, if a little brittle. Because programs like sensors, free, and pamixer rarely receive significant updates, this script only broke once or twice over its years of use, and it was always a quick fix.
It had a more serious problem, though, which my system's idle CPU usage illustrates:
1 [||| 7.3%] Tasks: 75; 1 running
2 [||| 6.0%] Load average: 0.49 0.43 0.45
3 [|||| 9.9%] Uptime: 04:57:47
4 [|||| 10.6%]
It's wildly inefficient. Retrieving my CPU temperature and voltage spawns and subsequently terminates 8 processes. The entire script creates and kills at least 200 processes per second. Yikes!
My desktop often runs on machines with finite energy, so I made efforts to optimize it. By decimating the update frequency and ensuring power-hungry programs like nmcli run less frequently, I was able to run it on my laptop without noticeable effect on battery life.
Getting it Right with Rust
Optimizing my status bar sat on my back burner for quite a while, as many of the interfaces used were only accessible with C. C APIs are Satan's curse upon man, so I held out for a better solution.
In the winter of 2017, I got really into Rust. Consequently, I noticed a Cambrian explosion of wrapper crates for things like pulseaudio, bluez, lm-sensors, and dbus. A few months later, Rust had native bindings to everything I needed. Work quickly began on a replacement.
350 lines of Rust code later, I had a working copy of my status bar. Rust's amazing and highly standardized documentation system allowed me to integrate tens of crates in an evening, while its explicit error handling ensured graceful removal of irrelevant features on sparsely-equipped machines.
let thermal_sensors = Sensors::new().into_iter()
    .filter(|c| c.prefix() == "coretemp")
    .flat_map(|c| {
        c.into_iter().flat_map(|feat| {
            feat.into_iter().filter(|sub| {
                sub.subfeature_type() == &SubfeatureType::SENSORS_SUBFEATURE_TEMP_INPUT
            }).collect::<Vec<_>>()
        }).collect::<Vec<_>>()
    }).collect::<Vec<_>>();
let voltage_sensors = Sensors::new().into_iter()
    .filter(|c| c.prefix() == "nct6776")
    .flat_map(|c| {
        c.into_iter().flat_map(|feat| {
            feat.into_iter().filter(|sub| {
                sub.subfeature_type() == &SubfeatureType::SENSORS_SUBFEATURE_IN_INPUT
            }).collect::<Vec<_>>()
        }).collect::<Vec<_>>()
    }).collect::<Vec<_>>();
// Snip!
loop {
    // Snip!
    if let Ok(thermals) = thermal_sensors.first()?.get_value() {
        print!("\u{f2c8} {}°C ", thermals);
    }
    if let Ok(voltage) = voltage_sensors.first()?.get_value() {
        print!("⚡ {:.2}V ", voltage);
    }
}
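The "graceful removal" of features on sparsely-equipped machines can be sketched with nothing but the standard library. This is not the status bar's actual code: it swaps the sensors crate for a direct sysfs read, and the hwmon path, the helper names (parse_millidegrees, read_temp, render), and the output format are all assumptions made for illustration. The idea is only that a missing sensor becomes a None, and a None segment is left out of the line.

```rust
use std::fs;
use std::path::Path;

/// Parse a sysfs-style temperature reading (millidegrees Celsius) into degrees.
fn parse_millidegrees(raw: &str) -> Option<f64> {
    raw.trim().parse::<f64>().ok().map(|m| m / 1000.0)
}

/// Read a temperature from `path`; any I/O or parse failure becomes `None`.
fn read_temp(path: &Path) -> Option<f64> {
    let raw = fs::read_to_string(path).ok()?;
    parse_millidegrees(&raw)
}

/// Build one status line, skipping every segment that has no data.
fn render(temp: Option<f64>, volts: Option<f64>) -> String {
    let mut line = String::new();
    if let Some(t) = temp {
        line.push_str(&format!("{:.1}°C ", t));
    }
    if let Some(v) = volts {
        line.push_str(&format!("{:.2}V ", v));
    }
    line
}

fn main() {
    // Hypothetical path: the hwmon index differs from machine to machine.
    let temp = read_temp(Path::new("/sys/class/hwmon/hwmon0/temp1_input"));
    // This machine is assumed to expose no voltage sensor at all.
    println!("{}", render(temp, None));
}
```

On a machine without that file, read_temp returns None and the temperature segment simply disappears instead of crashing the bar.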
The difference was night and day.
1 [| 0.7%] Tasks: 82; 1 running
2 [ 0.0%] Load average: 0.08 0.13 0.29
3 [ 0.0%] Uptime: 02:29:22
4 [| 0.3%]
Measuring Success
Let's take a quick look at both programs with perf stat to ensure our qualitative results have some empirical backing.
Here's the Bash script in power-saving mode:
20175.925539 task-clock:u (msec) # 0.244 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
1,566,202 page-faults:u # 0.078 M/sec
29,216,545,237 cycles:u # 1.448 GHz
36,471,840,295 stalled-cycles-frontend:u # 124.83% frontend cycles idle
42,522,200,490 instructions:u # 1.46 insn per cycle
# 0.86 stalled cycles per insn
8,975,885,418 branches:u # 444.881 M/sec
257,211,408 branch-misses:u # 2.87% of all branches
82.651256083 seconds time elapsed
13.757756000 seconds user
7.113046000 seconds sys
And the Rust script:
261.227367 task-clock:u (msec) # 0.003 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
1,005 page-faults:u # 0.004 M/sec
262,024,893 cycles:u # 1.003 GHz
478,654,337 stalled-cycles-frontend:u # 182.68% frontend cycles idle
186,572,959 instructions:u # 0.71 insn per cycle
# 2.57 stalled cycles per insn
41,622,811 branches:u # 159.336 M/sec
2,119,788 branch-misses:u # 5.09% of all branches
83.175023848 seconds time elapsed
0.130905000 seconds user
0.149056000 seconds sys
We can see the Bash script executing roughly 230x as many instructions and requiring around 110x as many cycles as the Rust script.
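For anyone who wants to check those factors, here is a small sketch that divides the counters from the two perf stat reports above. The ratio helper is a name made up for this example; the numbers are copied verbatim from the reports.

```rust
/// Ratio of two counters as a floating-point factor.
fn ratio(bash: u64, rust: u64) -> f64 {
    bash as f64 / rust as f64
}

fn main() {
    // Counter values copied from the two perf stat reports.
    let instructions = ratio(42_522_200_490, 186_572_959);
    let cycles = ratio(29_216_545_237, 262_024_893);
    let page_faults = ratio(1_566_202, 1_005);

    println!("instructions: {:.1}x", instructions);
    println!("cycles:       {:.1}x", cycles);
    println!("page faults:  {:.1}x", page_faults);
}
```

The page-fault ratio is the largest of the three by a wide margin, which fits the picture of a script that spends its time creating and tearing down processes.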
Those aren't the only metrics we can test, however. Both scripts are throttled by a call to sleep with a constant argument. This means scripts which take longer to run will produce fewer updates per second.
Both scripts print one line per update, so I was able to measure their relative update frequency by running both scripts for the same amount of time and measuring the length of their outputs.
313 bash-output
399 rust-output
The Rust script runs at a 27% higher frequency than the bash script, furthering its efficiency lead.
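The relationship between work time, a fixed sleep, and update count can be modelled in a few lines. This is a rough sketch, not a measurement: the 60 second window, the 200 ms sleep, and the 50 ms of work per update are invented for illustration, since neither script's real sleep argument appears here. Only the 313 and 399 line counts come from the measurement above.

```rust
use std::time::Duration;

/// Updates completed in `window` when every iteration does `work` and then
/// sleeps for `sleep`. Panics if `work + sleep` is zero.
fn updates_in(window: Duration, work: Duration, sleep: Duration) -> u128 {
    window.as_nanos() / (work + sleep).as_nanos()
}

/// Percentage by which `fast` exceeds `slow`.
fn percent_faster(fast: u64, slow: u64) -> f64 {
    (fast as f64 / slow as f64 - 1.0) * 100.0
}

fn main() {
    let window = Duration::from_secs(60);
    let sleep = Duration::from_millis(200);

    // Illustrative work times only: a slow gatherer and an instant one.
    let slow = updates_in(window, Duration::from_millis(50), sleep);
    let fast = updates_in(window, Duration::ZERO, sleep);
    println!("slow: {} updates, fast: {} updates", slow, fast);

    // Line counts measured from the two output files.
    println!("measured lead: {:.1}%", percent_faster(399, 313));
}
```

With the same sleep, every millisecond spent gathering data stretches the period of the loop, so the cheaper program both uses less CPU and refreshes more often.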
On Ecosystems
Although 10² speedups are more likely to draw clicks, my most poignant takeaway from this endeavor is related to Rust's ecosystem. Many crates with underpinning C libraries offer much more thoughtful APIs, and are more accessible to casual users.
I'm repeatedly delighted by each crate's continued maintenance and development - especially pulse-binding-rust, which has added a ton of features while profoundly transforming its API for the better. This code from February:
extern "C" fn pulse_cb(_: *mut ContextInternal, info: *const ServerInfoInternal, ret: *mut c_void) {
    if !info.is_null() && !ret.is_null() {
        unsafe {
            let name = CStr::from_ptr((*info).default_sink_name).to_owned().into_string().unwrap();
            *(ret as *mut String) = name;
        }
    }
}
let mut pulse_sink_name: String = String::new();
pulsectx.introspect().get_server_info(
    (pulse_cb, &mut pulse_sink_name as *mut _ as *mut c_void));
is equivalent to this code from August:
pulsectx.introspect().get_server_info(|info| {
    pulse_channel.sender.send(info.default_sink_name.clone().unwrap().into());
});
let pulse_sink_name = pulse_channel.receiver.recv().unwrap();
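The closure-plus-channel shape of the newer version can be tried without PulseAudio at all. In the sketch below, ServerInfo and get_server_info are hypothetical stand-ins written for this example, not the pulse-binding-rust API, and the sink name is made up. It only shows how a safe closure hands a value back to the caller through std::sync::mpsc, where the older code needed a raw pointer.

```rust
use std::sync::mpsc;

/// Hypothetical stand-in for the server-info struct handed to the callback.
struct ServerInfo {
    default_sink_name: Option<String>,
}

/// Hypothetical stand-in for a callback-taking API; it calls `callback` once.
fn get_server_info<F: FnOnce(&ServerInfo)>(callback: F) {
    let info = ServerInfo {
        default_sink_name: Some("alsa_output.example".to_string()),
    };
    callback(&info);
}

/// Fetch the default sink name by sending it out of the callback on a channel.
fn default_sink_name() -> Option<String> {
    let (sender, receiver) = mpsc::channel();
    get_server_info(move |info| {
        // A send error only means the receiver is gone, so it is ignored here.
        let _ = sender.send(info.default_sink_name.clone());
    });
    // `recv` fails if the callback never ran; treat that as "no sink".
    receiver.recv().ok().flatten()
}

fn main() {
    println!("{:?}", default_sink_name());
}
```

No unsafe block, no c_void cast, and a missing sink name becomes a None rather than a panic.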
I can only hope the velocity carried by Rust's ecosystem is kept strong.