14 November 2025
A long time ago, I spent a few years working on garbage collection in the J9 Java VM. And even though I’ve done mostly high-level stuff since then, deep knowledge of GC remains useful.
Yesterday, it was an insight from one of my favorite GC papers, A Unified Theory of Garbage Collection, that helped me solve a tricky problem.
Ohm’s incremental parsing
I’m working with a team that is using Ohm to parse text documents and render a rich text version in ProseMirror. The goal is bidirectional updating: changes to the ProseMirror document should propagate to the text version, and vice versa.
OM supports incremental parsing, which means if you parse some text and then make a small edit, it can immediately re-parse by reusing parts of the previous result.
It also supports a limited form of incremental computation: you can define an attribute, which is like a memoized visitor, and the attribute value for a given node only needs to be recalculated if the edit affected one of its subtrees. So you can easily implement a form of persistent data structure, where each new value (for example, an AST) shares structure with the previous one.
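To make the structure sharing concrete, here is a minimal sketch (not Ohm’s actual API; the node shape and `replaceChild` helper are hypothetical) of path copying: replacing one leaf creates new nodes only along the path to the root, while every other subtree is shared with the previous tree.

```javascript
// Hypothetical persistent-tree helper: returns a NEW node whose children
// array has one slot replaced; all other children are shared by reference.
function replaceChild(node, index, newChild) {
  const children = node.children.slice();
  children[index] = newChild;
  return { ...node, children };
}

const leafA = { label: 'a', children: [] };
const leafB = { label: 'b', children: [] };
const oldRoot = { label: 'doc', children: [leafA, leafB] };

// Editing leafA yields a new root, but leafB is shared, not copied.
const newRoot = replaceChild(oldRoot, 0, { label: 'a*', children: [] });
console.log(newRoot.children[1] === leafB); // true: sibling subtree is shared
```

Because unchanged subtrees are the same objects in both trees, identity comparison is enough to tell reused nodes from new ones, which matters for everything that follows.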
Problem
Using this machinery, I created a pmNodes attribute that generates the ProseMirror document for a given input. When the text document is edited, this produces a new tree that shares nodes with the previous one.

Then, my plan was to create a ProseMirror transaction that transforms the old document into the new one. To do this, it is useful to know which nodes appeared in the old document but not in the new one.
My first implementation of this was the equivalent of a tracing garbage collector: after each edit, I walked the entire document and recorded all of its nodes in a set. The difference between the old and new sets told me which nodes were dead.
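A sketch of that tracing-style approach, under the same hypothetical node shape as above (not the actual implementation): walk both trees fully, then take the set difference by object identity.

```javascript
// Collect every node reachable from `node` into a Set (a full "trace").
function collectNodes(node, set = new Set()) {
  set.add(node);
  for (const child of node.children) collectNodes(child, set);
  return set;
}

// Dead nodes: present in the old tree but absent from the new one.
// Note that BOTH trees are traversed in full on every edit.
function deadNodes(oldRoot, newRoot) {
  const oldSet = collectNodes(oldRoot);
  const newSet = collectNodes(newRoot);
  return [...oldSet].filter((n) => !newSet.has(n));
}
```

This is correct, but its cost is proportional to the size of the whole document rather than the size of the edit.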
But this kind of defeats the purpose of incrementality: if you have a long document and you make a small edit, you should be able to process the edit without visiting every node in the document.
Solution
Then I remembered the OOPSLA paper, A Unified Theory of Garbage Collection, written by some of my former colleagues in 2004:
Tracing and reference counting are uniformly viewed as being fundamentally different approaches to garbage collection that possess very distinct performance properties. We have implemented high-performance collectors of both types, and in the process observed that the more we optimized them, the more similarly they behaved: they seem to share some deep structure.
We present a formulation of the two algorithms that shows that they are in fact duals of each other. Intuitively, the difference is that tracing operates on live objects, or “matter”, while reference counting operates on dead objects, or “anti-matter”. For every operation performed by the tracing collector, there is a precisely corresponding anti-operation performed by the reference counting collector.
This was the answer I needed! Instead of visiting all the live objects, I wanted to visit only the dead ones, and reference counting would let me do exactly that.
So I added a way to maintain a reference count for every node in the document. When a new document is created, I decrement the ref count of the old root node (which will now always be 0). Then I recursively decrement the ref counts of its children, and so on, only recursing into nodes whose count drops to 0. This gives me exactly what I wanted: a way to find all the dead nodes without visiting the majority of nodes in the document, which are shared with the new tree.
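The recursive decrement can be sketched as follows. This is a hypothetical illustration, not the actual code: it assumes each node carries a `refCount` that is incremented whenever the node is reused in a new tree.

```javascript
// Reference-counting release: decrement `node`, and if it just died,
// recursively release its children. Shared subtrees keep a nonzero count,
// so the walk stops at their roots and never visits what they contain.
function releaseTree(node, dead = []) {
  node.refCount -= 1;
  if (node.refCount === 0) {
    dead.push(node);
    for (const child of node.children) releaseTree(child, dead);
  }
  return dead;
}
```

Calling `releaseTree(oldRoot)` after an edit visits only the dead nodes plus the boundary of the shared region, which is exactly the duality the paper describes: the tracing version touches all the matter, while this version touches only the anti-matter.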