Reasoning About Reference Cycles

Rust’s lifetime types are challenging, but they bring a capability I miss all the time in other languages.

Assumed audience: Software developers who want (or at least are willing!) to understand a bit more about how Rust’s making ownership explicit can genuinely change what is possible to understand about a piece of software.

I’ve spent a large chunk of the last year thinking about memory, and in particular memory leaks, in JavaScript applications. In talking with a friend just now about a project he’s working on, I was asking whether and how he’d managed to avoid reference cycles in a particular chunk of the code. It turns out: there absolutely are cycles in that code, just as there were in the memory leak I spent 6 months slowly chasing down a year ago. Whether the cycles in my friend’s code will result in leaks is an open question, and he’ll need to do some pretty careful tuning and checking to avoid those leaks.

I noted to him — 

I assume you’re thinking about that, but it’s annoyingly hard to reason about in almost any language that isn’t Rust.

Because: Rust makes it possible to reason about this in a way that no other language I’m familiar with does. The famously difficult learning curve associated with lifetimes and ownership, enforced by the borrow checker — 

The famously difficult learning curve associated with lifetimes and ownership, enforced by the borrow checker, is a matter of making something explicit which is usually left implicit in mainstream programming languages. When writing a program in JavaScript or Java, most of the time memory cycles don’t come up. When they do, you just” figure out how to introduce a weak reference — literally a WeakReference in Java or WeakRef in JavaScript, or the higher-level weak primitives built on them. The same goes for Swift, C, etc.

In Rust, lifetimes and ownership are front and center. You cannot avoid thinking about them for very long. JavaScript (or TypeScript) will happily let you write a mutually-recursive data structure like a doubly-linked list:

class Node {
  prev?: Node;
  next?: Node;

  constructor({ prev, next } = {}) {
    this.prev = prev;
    this.next = next;
  }
}

// We have a cycle!
let a = new Node();
let b = new Node({ prev: a });
a.next = b;

Here’s a relatively naïve direct translation of that code into Rust:

struct Node< {
    prev: Option<Box<Node>>,
    next: Option<Box<Node>>,
}

impl Node {
    fn new(prev: Option<Box<Node>>, next: Option<Box<Node>>) -> Node {
        Node {
            prev,
            next
        }
    }
}

fn main() {
    let mut a = Node::new(None, None);
    let mut b = Node::new(Some(Box::new(a)), None);
    a.next = Some(Box::new(b));
}

Rust… will simply reject this outright.1 The cycles that could (and all too often do) result in memory leaks in the JavaScript code are not even possible in Rust.

On the one hand, that might seem annoying. This is, for good reason, the canonical example of a thing that is impossible to write in safe Rust without some tricky bits of indirection. (For example, and of course the canonical writeup.) Unlike in the JavaScript example — or a Java or C++ example you could write just as easily — where the ownership and lifetimes of these items are totally implicit, Rust tracks them with as types, checked by the compiler, and the compiler rejects this cycle where two items own each other.

The cycle is fine right up until you want to mutate one of them. Shared mutable state == bugs. Sometimes just” logical bugs; but sometimes serious memory safety bugs, especially data races. Thus, on the other hand, this annoyance” about Rust is actually preventing really, really serious issues. Issues of the sort that cost me months hunting them down — in far gnarlier JavaScript code in late 2021 – early 2022.

Again, lifetimes and ownership are present in basically every mainstream programming language, but because they’re implicit, it’s hard to learn to reason about them. Rust makes you reason about them, the same way jQuery made you learn higher-order functions.

But, to return to the start of the post, it also gives you an incredibly useful capability. All your opportunities for memory cycles are: (1) explicit in the code and/or (2) checked by the compiler. In JavaScript or Java or any other mainstream garbage-collected language, if you want to reason about the memory semantics of a given piece of code, the answer is roughly: Good luck, have fun manually tracing through this code and looking at heap dumps and memory profiles.” The same goes for C or C++ (yes, even modern C++ with smart pointers) or even Zig or Odin.

On balance, I think it makes a lot of sense to just use garbage collection for many, perhaps even a majority, of use cases in programming. Particularly for application” code. But when you really need to care about things like reference cycles and the possibility of memory leaks, Rust’s explicitness is not just a nice-to-have: it is literally the only mainstream language which makes it possible to reason about them explicitly in the code.


Notes

  1. It points out that you’re trying to mutate a.next after handing a to b, violating the single owner principle at the root of Rust’s memory safety guarantees. ↩︎