Let’s talk about memory management! Following on from my article on five years of developments in V8’s garbage collector, today I’d like to cover what has happened in V8’s GC over the past few years.
modus operandi
I selected all commits to V8’s src/heap since my last roundup. There were about 1600 of them, including reverts and relands. I read all the commit logs, some of the changes, some linked bugs, and any design docs I could find. As far as I can tell, there have been about 4 full-time Google engineers on this area over the period, and the rate of commits has been fairly steady. There are occasional patches from Igalia, Cloudflare, Intel, and Red Hat, but it’s mostly Google.
Then, um, just from the very rigorous process of writing things down and thinking about it, I see three big stories for V8’s GC during this time, and I’m going to give them to you along with some made-up numbers for how much effort was spent on each. First, the effort to improve memory protection via the sandbox: this is about 20% of the time. Second, the Oilpan odyssey: maybe 40%. Third, preparation for multiple JavaScript and WebAssembly mutator threads: 20%. Then there are a number of smaller side quests: heuristics wrangling (10%!!!!), and a long tail of miscellaneous things. Let’s take a look at each of these in turn.
sandbox
There was a good blog post last June summarizing the sandbox effort. Basically, the goal is to prevent user-controlled writes from corrupting memory outside the JavaScript heap. We start from the assumption that the attacker is somehow able to obtain a write primitive, and we work to minimize the impact of such writes. The most basic approach is to reduce the range of addressable memory, notably by encoding pointers as 32-bit offsets and then ensuring that no host memory lies within the virtual memory addressable by attacker-controlled writes. The sandbox also uses some 40-bit offsets for references to larger objects, with similar guarantees. (Yes, the sandbox actually reserves a terabyte of virtual memory.)
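To make the offset-encoding idea concrete, here is a minimal sketch, with made-up names, of how a 32-bit on-heap reference might be decoded relative to a reserved cage; this is illustrative, not V8’s actual code.

```cpp
// Minimal sketch of cage-relative pointer decoding; names are illustrative.
#include <cstdint>

struct Sandbox {
  uintptr_t base;  // start of the reserved virtual-memory cage

  // On-heap references are stored as 32-bit offsets from the cage base, so
  // even a fully attacker-controlled offset can only address memory inside
  // the cage, never host memory outside it.
  void* Decode(uint32_t offset) const {
    return reinterpret_cast<void*>(base + offset);
  }
};
```

The 40-bit variant works the same way, just with a larger offset field and a correspondingly larger reservation.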
But there are a lot of nuances. Access to external objects is mediated through type-checked external pointer tables. Some objects that should never be referenced directly by user code go into a separate “trusted space”, which is outside the sandbox. Then you have read-only spaces, used to allocate data that can be shared between different isolates; “shared” versions of the other spaces for use in shared-memory multi-threading; executable code spaces with embedded object references; and so on. Modifying, extending, and maintaining all these details has taken a lot of V8 GC developers’ time.
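For flavor, here is a hedged sketch of the external-pointer-table idea: sandboxed code holds a 32-bit index rather than a raw pointer, and every load is type-checked. The names, the tag layout, and the 48-bit-pointer assumption are mine, not V8’s.

```cpp
// Hedged sketch of a type-checked external pointer table, loosely modeled
// on the concept described above; not the real V8 API. Assumes 48-bit
// canonical pointers, leaving the top 16 bits of each entry for a type tag.
#include <cstdint>
#include <vector>

constexpr int kTagShift = 48;
constexpr uint64_t kPointerMask = (uint64_t{1} << kTagShift) - 1;

class ExternalPointerTable {
  std::vector<uint64_t> entries_;

 public:
  // Sandboxed code receives only this 32-bit index, never the raw pointer.
  uint32_t Allocate(void* ptr, uint16_t tag) {
    entries_.push_back(reinterpret_cast<uint64_t>(ptr) |
                       (uint64_t{tag} << kTagShift));
    return static_cast<uint32_t>(entries_.size() - 1);
  }

  // The tag check means a corrupted index can at worst yield a pointer of
  // the expected type, not an arbitrary attacker-chosen address.
  void* Get(uint32_t index, uint16_t tag) const {
    uint64_t entry = entries_.at(index);
    if ((entry >> kTagShift) != tag) return nullptr;
    return reinterpret_cast<void*>(entry & kPointerMask);
  }
};
```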
I think it’s paying off, though: the new development is that V8 has managed to turn on hardware memory protection for the sandbox, so that sandboxed code is prevented by the hardware from writing memory outside the sandbox.
The “attacker can write anything in their address space” assumption in the threat model has led to some funny patches. For example, sometimes code needs to check flags on the page an object belongs to, as part of a write barrier; therefore some GC-managed metadata needs to live inside the sandbox. But the garbage collector, which runs outside the sandbox, cannot then trust that the metadata is valid. In some cases we end up with two copies of the state: one inside the sandbox, for use by sandboxed code, and one outside, for use by the collector.
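A sketch of that two-copies pattern, with entirely made-up names:

```cpp
// Illustrative sketch of duplicated page metadata; names are invented.
#include <cstdint>

struct InSandboxPageFlags {  // allocated inside the sandbox cage
  uint32_t flags;            // readable by write barriers; may be corrupted
};

struct TrustedPageMetadata {   // allocated outside the sandbox
  uint32_t flags;              // authoritative copy; the GC reads only this
  InSandboxPageFlags* mirror;  // kept in sync whenever the GC updates flags
  void SetFlags(uint32_t f) {
    flags = f;
    mirror->flags = f;
  }
};
```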
The best and most entertaining example of this phenomenon concerns integers. Google’s style guide recommends signed integers by default, so on-heap data structures end up with int32_t len fields and the like. But if an attacker overwrites a length with a negative number, funny things can happen. One is a sign-extending conversion to size_t by run-time code, which can lead to a sandbox escape. Another is mistakenly concluding that an object is small because its length is less than some threshold, when the length is in fact unexpectedly negative. Good times!
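Here is a hypothetical rendering of both failure modes; assume the attacker has set len to a negative value:

```cpp
// Hypothetical sketch of the signed-length hazards described above.
#include <cstddef>
#include <cstdint>
#include <cstring>

struct OnHeapArray {
  int32_t len;  // lives in the sandbox, so potentially attacker-controlled
  char data[4];
};

void CopyOut(const OnHeapArray* a, char* out) {
  // Sign-extending conversion: if len == -1, static_cast<size_t>(len) is
  // SIZE_MAX, and the copy runs far outside the object -- an escape vector.
  memcpy(out, a->data, static_cast<size_t>(a->len));
}

bool IsSmall(const OnHeapArray* a) {
  // A negative length always passes a "less than threshold" check, so a
  // corrupted object gets funneled onto small-object fast paths.
  return a->len < 64;
}
```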
oilpan
It took Odysseus 10 years to get home from Troy; it has taken about as long for conservative stack scanning to make its way from Oilpan to V8. Basically, Oilpan is garbage collection for C++ as used in Blink and Chromium. Sometimes it runs when the stack is empty; only then can it be precise. But sometimes it runs when there may be references to GC-managed objects on the stack; in that case it must run conservatively.
Last time, I described how V8 wanted to add support for generational garbage collection to Oilpan, but for that you need a way to promote objects to the old generation that is compatible with the ambiguous references seen by conservative stack scanning. I thought V8 had a chance of success with its new mark-sweep nursery, but it seems to have proven a regression relative to copying nurseries. They also tried sticky mark-bit generational collection, but it didn’t work out. Ah well; one good thing about Google is that they are willing to try projects with uncertain payoff, though I hope the hackers involved came through their OKR reviews with their mental health intact.
Instead, V8 added support for pinning to the Scavenger copying-nursery implementation. If a page has incoming ambiguous edges, it is placed in a kind of quarantine for a time. I’m not sure what the difference is between a quarantined page, which logically belongs to the nursery, and a pinned page in mark-compact old-space; they would appear to require similar treatment. In any case, it seems we have settled on a design that is mostly the same as before, but in which any page can opt out of evacuation-based collection.
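As I understand it, the scheme amounts to something like the following sketch (my names, not V8’s): during a scavenge, pages with incoming ambiguous edges are left in place rather than evacuated.

```cpp
// Illustrative sketch of per-page pinning during a scavenge.
#include <vector>

struct Page {
  bool has_ambiguous_incoming_refs = false;  // set by conservative stack scan
  bool quarantined = false;
};

// Quarantine pages that conservative stack scanning found ambiguous
// references into; all other nursery pages evacuate as usual.
void PlanScavenge(std::vector<Page*>& nursery_pages) {
  for (Page* page : nursery_pages) {
    page->quarantined = page->has_ambiguous_incoming_refs;
  }
}
```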
What do we get from all this? Well, not only do we get generational collection for Oilpan, but we also unlock the cheaper, less bug-prone “direct handles” in V8 itself.
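To see why direct handles are cheaper, compare the two shapes in this illustrative sketch: an indirect handle pays a load through a slot that the GC can rewrite when it moves an object, while a direct handle is just the pointer, which is only safe once objects referenced from the stack are kept in place (for instance, via conservative stack scanning and pinning). This is my own contrast, not V8’s actual handle classes.

```cpp
// Illustrative contrast between indirect and direct handles; not V8's API.
template <typename T>
struct IndirectHandle {
  T** slot;                         // GC rewrites *slot if the object moves
  T* get() const { return *slot; }  // extra memory indirection on each use
};

template <typename T>
struct DirectHandle {
  T* ptr;                         // valid only while the object cannot move
  T* get() const { return ptr; }  // no indirection: cheaper and simpler
};
```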
Funnily enough, I don’t think any of this has shipped yet; or if it has, it’s only to a small population of users in some Finch trial or other. I’m keen to see write-ups from the upstream V8 folks; entire doctoral theses have been written on this topic, and it would be heartening to see some real numbers.
shared-memory multi-threading
JavaScript implementations have been able to rely on single-threadedness: with only one mutator, garbage collection is much easier. But this is coming to an end. I don’t know what the state of shared-memory multi-threading is in JS, but it is rapidly advancing in WebAssembly, and Wasm uses the JS GC. Maybe I’m overestimating the effort here – maybe it doesn’t reach 20% – but wiring it all up has been a whole thing.
I’ll just mention one patch here that I found fun. With pointer compression, the fields of an object are mostly 32-bit words, with the exception of 64-bit doubles, so V8 can reduce object alignment to 4 bytes in most cases. There is a bug that has been open forever in V8 about the alignment of double-holding objects, which it mostly ignores by using unaligned loads.
The thing is, if an object is visible to multiple threads, and that object may have a 64-bit field, the field must be 64-bit aligned to prevent tearing during atomic access, which in practice means the object itself must be 64-bit aligned. This is now the case for Wasm structs and arrays in shared space.
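In C++ terms, the constraint looks something like this sketch (illustrative names; shared Wasm structs are not literally declared this way):

```cpp
// With 4-byte object alignment, a 64-bit field can straddle an 8-byte
// boundary; 64-bit atomic accesses generally require natural alignment to
// be tear-free, hence the object itself must be 8-byte aligned.
#include <atomic>
#include <cstdint>

struct alignas(8) SharedStruct {
  uint32_t header;
  uint32_t pad;                     // keeps the 64-bit field at offset 8
  std::atomic<uint64_t> i64_field;  // naturally aligned: no tearing
};
static_assert(alignof(SharedStruct) == 8,
              "shared objects need 8-byte alignment");
```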
side quests
Well, we’ve covered the main V8 GC stories over the years. But I want to mention some fun side quests that I saw.
heuristics
This one seems a bit thankless to me. Anyway. Any real GC has a bunch of heuristics: when to promote an object or page, when to start incremental marking, how to use background threads, when to grow the heap, how to choose between a minor and a major collection, when to aggressively shrink memory, how much virtual address space you can reasonably reserve, what to do in hard out-of-memory situations, how to account for off-heap allocations, how to estimate whether concurrent marking will finish in time or whether you need to pause… and V8 needs to do all of this in all its configurations: with pointer compression off or on; on desktop, high-end Android, low-end Android, and iOS, where everything is weird; on something called Starboard, which is apparently part of Cobalt, which is apparently a platform used to show YouTube on set-top boxes; and on machines with different amounts of memory and different operating-system interfaces. Tuning the system appears to involve a dose of science, a dose of poking around and trying things, and a whole cauldron of witchcraft. There appears to be one person whose full-time job is to implement and monitor metrics on V8’s memory performance and land the appropriate changes. Good grief!
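As a toy example of the kind of knob involved, here is one classic heap-growth heuristic, sketched with invented parameters; V8’s actual policy is far more elaborate:

```cpp
// Choose a new heap limit after a major GC as a multiple of the live size,
// clamped to configured bounds. Purely illustrative.
#include <algorithm>
#include <cstddef>

size_t ComputeHeapLimit(size_t live_bytes, double growing_factor,
                        size_t min_limit, size_t max_limit) {
  auto target = static_cast<size_t>(live_bytes * growing_factor);
  return std::clamp(target, min_limit, max_limit);
}
```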
mutex devastation
Toon Verwaest observed that V8 was exhibiting many more context switches than Safari on macOS, and identified V8’s use of platform mutexes as the problem. So he rewrote them to use os_unfair_lock on macOS, then implemented adaptive locking on all platforms, then… chucked it all and switched over to abseil.
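For reference, wrapping os_unfair_lock behind a portable mutex interface looks roughly like this; a minimal sketch, not V8’s actual Mutex class:

```cpp
// macOS-only sketch: os_unfair_lock as the platform mutex primitive.
#include <os/lock.h>

class Mutex {
  os_unfair_lock lock_ = OS_UNFAIR_LOCK_INIT;

 public:
  void Lock() { os_unfair_lock_lock(&lock_); }
  void Unlock() { os_unfair_lock_unlock(&lock_); }
  bool TryLock() { return os_unfair_lock_trylock(&lock_); }
};
```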
Personally, I’m just glad to see this patch series; I wouldn’t have thought there was juice to squeeze in V8’s use of locks. It gives me hope that there are similar wins to be found in one of my own projects 🙂
ta-ta, third-party heap
It used to be that MMTk was trying to get a number of production language virtual machines to support an abstract API, so that MMTk could slot in its garbage collector implementations. That seems to be working out with OpenJDK, but with V8, I think the rate of churn and the laser focus on the browser use-case make an interoperable API abstraction hard to sustain. V8 removed it a little more than a year ago.
fin
So what’s next? I don’t know; it’s been a long time since I’ve been to Munich to drink from the source. That said, shared-memory multi-threading and memory management for Wasm effect handlers will extend the GC hackers’ full-employment act indefinitely, not to mention actually landing and shipping conservative stack scanning. There is surely much to do in non-browser V8 environments, whether in Node or at the edge, but it is certainly harder to read the future than the past.
In any case, it was fun to look back, and maybe in a few years I’ll get the chance to do it again. Until then, happy hacking!