Last week I released webernetes, a partial port of Kubernetes to TypeScript
to make it possible to run clusters in the browser. I ended up generating almost
100,000 lines of code in 552 commits across 629 files. It took me 2 months.
The demo below is a webernetes cluster. It runs entirely in your browser, and
it’s genuinely doing much of the same work a real Kubernetes cluster does: pod
lifecycles, cluster DNS and networking, container garbage collection, IP
allocation, Deployment and ReplicaSet tracking, and more. The blue dots
represent pods sending requests to each other.
[Interactive demo: HTTP requests moving between three pods from a Deployment across three Kubernetes nodes. Caption: A cluster in your browser.]
Wait, what am I looking at?
The question I’ve been getting most often is: “Did you compile Kubernetes to
WebAssembly?” The answer is no. A simple “hello, world!” Go program compiled to
WebAssembly is ~540KiB gzipped. That alone is already bigger than webernetes,
which is ~140KiB gzipped. Compiling all of Kubernetes to WebAssembly would no
doubt mean sending megabytes over the wire. I did try to check, but
unfortunately there are compile-time errors because Kubernetes calls system-level
APIs that aren’t available in the browser.
Instead, webernetes is:
- A partial port of Kubernetes’ “kubelet” binary, enough to run pods and probe them.
- Ports of several Kubernetes “controllers”: pod scheduler, namespace controller, kube-proxy, deployment controller, and a few more.
- A browser-based take on a container network interface (CNI), so pods can talk to each other over a simulated network.
- A browser-based container runtime, which the kubelet talks to over the container runtime interface (CRI) to run containers.
- An API for interacting with your webernetes cluster to do things like apply manifests and watch resources.
Because I wanted to keep webernetes small, it doesn’t pull real images
from a registry like Docker Hub. Instead, it has its own browser-based registry,
and you define images using a TypeScript API. Images look like this:
```typescript
import * as w8s from "@ngrok/webernetes";

class HelloWorld extends w8s.BaseImage {
  static readonly imageName = "hello-world";
  static readonly imageVersion = "1.0";

  async exec(ctx: w8s.ProcessContext, argv: string[]): Promise<number> {
    ctx.listenHttp(8080, async (_ctx, request) => {
      return {
        status: 200,
        body: "Hello, world!",
      };
    });
    return await ctx.waitUntilKilled();
  }
}
```
To deploy your image into a cluster, you do this:
```typescript
import * as w8s from "@ngrok/webernetes";

class HelloWorld extends w8s.BaseImage {
  // as before ...
}

const cluster = new w8s.Cluster();
await cluster.registerImage(HelloWorld);

const [pod] = await cluster.apply([
  {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: "hello-world-deployment" },
    spec: {
      replicas: 1,
      selector: {
        matchLabels: { app: "hello-world-pod" },
      },
      template: {
        metadata: {
          labels: { app: "hello-world-pod" },
        },
        spec: {
          containers: [
            {
              name: "hello-world-container",
              image: "hello-world:1.0",
            },
          ],
        },
      },
    },
  },
]);
```
And then you can use the webernetes API to interact with the cluster, like this:
```typescript
// List all pods in the default namespace
const pods = await cluster.api.corev1.listNamespacedPod({
  namespace: "default",
});

// Watch for changes to pods in all namespaces
const informer = cluster.informer("pods", (type, pod) => {
  console.log(`pod ${type}: ${pod.metadata?.name}`);
});

// Stop the informer when you're done
await informer.stop();

// Listen to pods sending requests and responses to each other.
// This is how I visualise the moving dots above.
cluster.on("request", (event) => {
  console.log(`request: ${event.request.method} ${event.request.url}`);
});
cluster.on("response", (event) => {
  console.log(`response: ${event.response?.status}`);
});

// Use the cluster network to send a request to a pod. This will also trigger
// the request/response event handlers above.
const pod = pods.items[0];
const resp = await cluster.fetch(`http://${pod.status?.podIP}:8080/`);
console.log(resp.body); // "Hello, world!"
```
There are plenty more examples in the webernetes repository.
Webernetes is intended to be used to make interactive Kubernetes content; it’s
not a production-ready Kubernetes distribution. It doesn’t need to run real
images. It just needs a way for creators to set up specific workloads to
illustrate the thing they’re trying to teach.
Over time, I intend to expand webernetes to support more Kubernetes
features. Right now, it doesn’t support ConfigMaps, Secrets, pod resources,
persistent volumes, or a whole host of other things I haven’t needed yet. As I
make more content with this library, I’ll implement more of what I need.
If you’re looking to build on webernetes and it doesn’t support something you
need, please reach out! I’m s.rose@ngrok.com and I’d
be happy to help you become a contributor.
Is this just slop?
Almost all of the webernetes code was authored by LLMs. I expect people to be
dubious of the project as a result. I expect to be accused of slop-porting
Kubernetes for views, but I’m going to try to show you that’s not what I’ve
done.
I did two things that I think make this a slop-free project:
- I reviewed every line of code.
- I created hundreds of tests that assert webernetes behaves the same as a real cluster.
The first point, by far the most time-consuming, is how I gained the confidence
that the vast majority of the code is line-for-line identical to the Kubernetes
Go codebase. The second point is how I made sure the lexical similarity
translates to identical behaviour in practice.
Any mistakes that remain in the codebase after my review are on me, and I’ve no
doubt some exist. If you find any, please let me know by opening an issue.
Why review the code?
The stories I’ve read about LLMs being used to write a C compiler or port
Bun from Zig to Rust were made possible by having an automated way to
assert correctness. Anthropic had plenty of existing C compilers to compare
against, and Bun had a large existing test suite that its maintainers trusted
enough to merge over 1 million lines of new Rust code without manual review.
I didn’t have those things. If I wanted a test suite, I’d need to write it
myself. If I wanted to compare against real Kubernetes, I’d need to figure out a
way to do it.
Most of the code in webernetes is ported from the Kubernetes Go codebase. I
ported it with LLMs because I was confident that would be faster than typing it
by hand, but the problem I quickly encountered was that LLMs suck at porting
code. No matter how hard I tried, they kept making mistakes. The mistakes
came in a few flavours:
- Shortcuts. This seemed to happen more when porting larger files, and it made me wonder if I was battling with post-training choices designed to make LLMs less verbose. An example of a shortcut I saw a lot: Kubernetes contains a lot of different types of cache. There’s an LRU cache, an expiring cache, a FIFO cache, a transforming cache, and more. Instead of implementing these, I’d catch the LLM using a `Map` instead, leading to incorrect behaviour.
- Trying to be too helpful. The LLM would try to tidy up code by inventing helper functions that didn’t exist in the original Go code. Often, these helpers were harmless, but sometimes they’d have subtle differences. Either way, they made the code harder for me to review side-by-side with the original, so I asked the LLM to remove them.
- Stuff just… missing. This happened most often when porting over table tests from Go. If you’re unfamiliar, table tests look like this. They are arrays of test cases with some test code underneath. The LLM would arbitrarily omit tests and I’d have to ask it why. Sometimes it would talk about the test not being applicable. Occasionally that was true, but usually the LLM would own up to omitting the test by mistake.
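To make the cache shortcut above concrete, here is a minimal sketch of why a plain `Map` is not a drop-in replacement for an LRU cache. `LruCache` is a hypothetical class written for this illustration; it is not the Kubernetes or webernetes implementation, and it leaves out everything except eviction.

```typescript
// Hypothetical minimal LRU cache, for illustration only.
// It relies on Map preserving insertion order: the first key in the Map
// is always the least recently used one.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert the key to mark it as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used key (first in insertion order).
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}

const lru = new LruCache<string, number>(2);
lru.set("a", 1);
lru.set("b", 2);
lru.get("a"); // "a" is now the most recently used key
lru.set("c", 3); // evicts "b", the least recently used key

const plain = new Map<string, number>();
plain.set("a", 1);
plain.set("b", 2);
plain.set("c", 3); // nothing is ever evicted

console.log(lru.has("b"), lru.size); // false 2
console.log(plain.has("b"), plain.size); // true 3
```

Code that depends on old entries disappearing behaves differently when the cache quietly becomes a `Map` that grows forever, and nothing fails loudly to tell you.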
I know at least a few of you are screaming “SKILL ISSUE” and are ready to
comment saying I need to get better at prompting. That could be true! I would
love to see an example prompt that perfectly one-shots porting this table
test from Go to TypeScript. You stand to save me an enormous amount of time
in future.
Until then, for me to have confidence in an LLM porting something, I need to
review the output. I’m not aware of any shortcuts.
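For anyone who hasn’t met the table tests mentioned above, a TypeScript version might look like the sketch below. The function under test, `parsePort`, and every case in the table are invented for illustration and don’t come from Kubernetes or webernetes.

```typescript
// Hypothetical function under test, for illustration only.
function parsePort(input: string): number | null {
  if (!/^\d+$/.test(input)) return null;
  const port = Number(input);
  return port >= 1 && port <= 65535 ? port : null;
}

interface TestCase {
  name: string;
  input: string;
  want: number | null;
}

// The "table": one row per scenario.
const testCases: TestCase[] = [
  { name: "valid port", input: "8080", want: 8080 },
  { name: "lowest port", input: "1", want: 1 },
  { name: "highest port", input: "65535", want: 65535 },
  { name: "zero is rejected", input: "0", want: null },
  { name: "out of range", input: "65536", want: null },
  { name: "not a number", input: "http", want: null },
  { name: "empty string", input: "", want: null },
];

// The test code underneath: the same assertion runs for every row.
function runTable(cases: TestCase[]): string[] {
  const failures: string[] = [];
  for (const tc of cases) {
    const got = parsePort(tc.input);
    if (got !== tc.want) {
      failures.push(`${tc.name}: got ${got}, want ${tc.want}`);
    }
  }
  return failures;
}

const failures = runTable(testCases);
console.log(`${testCases.length} cases, ${failures.length} failures`);
```

A port that silently drops rows from the table still passes, which is why omitted cases only show up when you compare against the original. In a vitest suite the loop would more likely be an `it.each(testCases)` call, but the shape is the same.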
What the tests look like
It’s all well and good to know that the code is side-by-side identical, but does
it actually work? Go and JavaScript have different runtime environments, so it
was always possible that the same code would behave differently in each. I also
ended up having to create JavaScript versions of channels, mutexes,
Go’s select statement, and other Go-isms. I needed to know they worked in
non-trivial scenarios.
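As a rough idea of what one of these Go-isms can look like in JavaScript, here is a minimal sketch of an async mutex. It was written for this section and is not the implementation webernetes uses; `Mutex`, `sleep`, and `increment` are hypothetical names. JavaScript runs one thing at a time, but async functions can still interleave at every `await`, so a ported critical section still needs something to lock.

```typescript
// Hypothetical async mutex, for illustration only.
class Mutex {
  // Resolves when the most recent holder has unlocked.
  private tail: Promise<void> = Promise.resolve();

  // Waits for every earlier caller to unlock, then returns an unlock function.
  async lock(): Promise<() => void> {
    let unlock!: () => void;
    const released = new Promise<void>((resolve) => {
      unlock = resolve;
    });
    const previous = this.tail;
    this.tail = released;
    await previous;
    return unlock;
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// A read-modify-write with an await in the middle. Without the mutex, all
// callers could read the same value and increments would be lost.
async function increment(counter: { value: number }, mutex: Mutex): Promise<void> {
  const unlock = await mutex.lock();
  try {
    const current = counter.value;
    await sleep(5); // another task may run here
    counter.value = current + 1;
  } finally {
    unlock();
  }
}

async function demo(): Promise<number> {
  const counter = { value: 0 };
  const mutex = new Mutex();
  await Promise.all([
    increment(counter, mutex),
    increment(counter, mutex),
    increment(counter, mutex),
  ]);
  return counter.value;
}

demo().then((value) => console.log(`counter: ${value}`)); // counter: 3
```

Even a primitive this small has ordering rules that are easy to get subtly wrong, and channels and `select` have more of them.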
To feel good about this, I wrote tests where the exact same code is run against
both webernetes and a k3s cluster. To do this, I needed to have an API for
webernetes that matched an existing Kubernetes API. I picked
kubernetes-client/javascript because it’s the official client library for
Kubernetes in JavaScript, and it has TypeScript types.
Here’s an example test:
```typescript
import { expect, it } from "vitest";
import { kubernetes } from "../../test/harnesses/kubernetes";

// `kubernetes.describe` does some magic behind the scenes to set up either a
// k3s (https://k3s.io/) cluster or a webernetes cluster and pass that in via
// the `context` argument.
//
// Then I can run either `pnpm test:node` or `pnpm test:browser` to run tests
// against k3s using a Node environment, or webernetes using a headless browser.
kubernetes.describe("Pods", (context) => {
  const { core } = context;
  const { getTestNamespace, waitFor } = context.helpers;

  it("should be able to delete a pod", async () => {
    // Tests get their own unique namespace for isolation from each other
    const namespace = await getTestNamespace();

    await core.createNamespacedPod({
      namespace,
      body: {
        metadata: { name: "delete-test" },
        spec: {
          containers: [
            {
              name: "pause",
              // Webernetes has an implementation of this image built-in
              image: "registry.k8s.io/pause:3.10",
            },
          ],
        },
      },
    });

    // Make sure the pod definitely exists before moving on.
    await waitFor(async () => {
      const pods = await core.listNamespacedPod({ namespace });
      const found = pods.items.find((pod) => pod.metadata?.name === "delete-test");
      expect(found).toBeDefined();
    });

    await core.deleteNamespacedPod({
      name: "delete-test",
      namespace,
    });

    // Wait until the pod is definitely gone before declaring success.
    await waitFor(async () => {
      const pods = await core.listNamespacedPod({ namespace });
      const found = pods.items.find((pod) => pod.metadata?.name === "delete-test");
      expect(found).toBeUndefined();
    });
  });
});
```
The `core` object, with its `createNamespacedPod` and `deleteNamespacedPod`
methods, is an example of the kubernetes-client/javascript API. The
`kubernetes.describe(..)` helper I created to run these tests injects a `core`
object that points at k3s when I run `pnpm test:node`, and at webernetes when I
run `pnpm test:browser`.
These are the integration tests for the project. They make sure that my porting
work is correct and my custom browser-based container runtime and cluster
network are working in a way that matches a real cluster.
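A helper like the `waitFor` used in the example test is usually a small polling loop. The sketch below is one way it could work, assuming it retries the callback until it stops throwing or a timeout passes. The real helper’s signature and defaults aren’t shown in this post, so the `timeoutMs` and `intervalMs` options here are hypothetical.

```typescript
interface WaitForOptions {
  timeoutMs?: number; // hypothetical option: give up after this long
  intervalMs?: number; // hypothetical option: pause between attempts
}

// Retries `check` until it resolves without throwing. If the timeout passes
// first, rethrows the most recent failure.
async function waitFor(
  check: () => Promise<void>,
  options: WaitForOptions = {},
): Promise<void> {
  const { timeoutMs = 5000, intervalMs = 50 } = options;
  const deadline = Date.now() + timeoutMs;

  for (;;) {
    try {
      await check();
      return;
    } catch (error) {
      if (Date.now() >= deadline) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// Usage: the check fails twice, then passes on the third attempt.
async function demoWaitFor(): Promise<number> {
  let attempts = 0;
  await waitFor(
    async () => {
      attempts += 1;
      if (attempts < 3) throw new Error("not ready yet");
    },
    { timeoutMs: 1000, intervalMs: 10 },
  );
  return attempts;
}

demoWaitFor().then((attempts) => console.log(`passed after ${attempts} attempts`));
```

Polling like this matters because both k3s and webernetes reconcile asynchronously: a pod that has just been created or deleted may not show up in the next list call straight away.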
Whenever I spot a bug while working with the library, the first thing I do is
create a test that passes against k3s and fails against webernetes. Then I
use that feedback loop to get an LLM to help me understand and fix the problem.
At time of writing, webernetes has 204 integration tests. They sit alongside
1,855 unit tests, most of which are direct ports from the Kubernetes Go codebase.
Is review and testing really enough to make it not-slop?
I think so, yes.
When I got PRs from human beings, back in the good old days, what I expected to
find were good tests and good code. I have the same expectation of LLMs. The
difference in 2026 is that, while I generally trusted my human colleagues to do
good work, I feel quite safe assuming that an LLM won’t. You need to review its
output, and you need to insist on tests.
It’s not enough to do one or the other, either. Without reviewing at least the
test code, how do you know what success criteria the LLM is working to? And if
you review all of the code but have no tests, do you really trust your squishy
human brain to reason through every possibility? I don’t. I don’t even trust
myself to do this with my own hand-written code.
Because they don’t get tired and they type really fast, I think LLMs complement
our human weaknesses really well. It’s fun to ask the LLM to come up with edge
cases you haven’t thought of, then write tests for them if they make sense. You can
do this dozens of times if you want, until the suggestions are nonsense. The LLM
won’t mind!
Combining my unique strengths of taste and understanding with the LLM’s ability
to write fast, without fatigue or wrist pain, has been the biggest step change
in what feels possible since I started my career in 2012.
Number crunching
Given the retrospective nature of this post, I thought it’d be fun to make some
graphs showing how the project evolved. The first one shows lines of code over
time.
webernetes lines of code by week
Chart showing weekly Git additions, deletions, and cumulative net lines for Webernetes from April 20 through June 15, 2026.
| Week | Added lines | Deleted lines | Net lines | Total lines |
|---|---|---|---|---|
| Week of Apr 20 | 17,074 | 5,434 | 11,640 | 11,640 |
| Week of Apr 27 | 14,759 | 5,739 | 9,020 | 20,660 |
| Week of May 4 | 5,732 | 1,344 | 4,388 | 25,048 |
| Week of May 11 | 9,520 | 4,151 | 5,369 | 30,417 |
| Week of May 18 | 14,675 | 2,791 | 11,884 | 42,301 |
| Week of May 25 | 12,927 | 1,073 | 11,854 | 54,155 |
| Week of Jun 1 | 31,372 | 5,823 | 25,549 | 79,704 |
| Week of Jun 8 | 25,165 | 6,337 | 18,828 | 98,532 |
| Week of Jun 15 | 29,967 | 1,857 | 28,110 | 126,642 |
This graph doesn’t quite capture the full reality. Early work was done in a
branch of the repo behind this blog site, because it wasn’t obvious to me at the
time that it would become its own project. The first commit to what would
become the https://github.com/ngrok/webernetes repo was on April 21st.
The graph also says ~126k lines, not the ~100k that I claimed at the
start of the post. This is because the 100k number excludes non-TypeScript
files, comments, and the demo app.
LLM token consumption over time
Chart showing weekly combined uncached input, cached input, and output token usage across Codex and Claude sessions for Webernetes from April 20 through June 15, 2026.
| Week | Uncached input tokens | Cached input tokens | Output tokens |
|---|---|---|---|
| Week of Apr 20 | 3,874,487 | 78,082,526 | 606,678 |
| Week of Apr 27 | 7,645,946 | 254,519,999 | 726,881 |
| Week of May 4 | 2,885,324 | 121,050,752 | 282,128 |
| Week of May 11 | 10,309,560 | 344,972,032 | 827,846 |
| Week of May 18 | 15,866,022 | 637,270,656 | 1,288,182 |
| Week of May 25 | 11,077,746 | 318,710,272 | 892,938 |
| Week of Jun 1 | 33,380,099 | 837,972,608 | 2,834,083 |
| Week of Jun 8 | 28,875,156 | 794,703,104 | 2,407,530 |
| Week of Jun 15 | 104,155,857 | 2,196,467,968 | 6,420,826 |
Chart legend: uncached input, output, and cached input (right axis).
Make sure you really take in the magnitude of the two Y axes. Coding agents
consume far more cached input tokens than any other type of token, especially
if you’re often filling up long context windows.
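To put a number on that, the weekly figures from the token table above can be totalled. The sketch below is plain arithmetic on those published values; the variable names are mine.

```typescript
// Weekly token counts copied from the table above:
// [uncached input, cached input, output]
const weeks: Array<[number, number, number]> = [
  [3_874_487, 78_082_526, 606_678], // Week of Apr 20
  [7_645_946, 254_519_999, 726_881], // Week of Apr 27
  [2_885_324, 121_050_752, 282_128], // Week of May 4
  [10_309_560, 344_972_032, 827_846], // Week of May 11
  [15_866_022, 637_270_656, 1_288_182], // Week of May 18
  [11_077_746, 318_710_272, 892_938], // Week of May 25
  [33_380_099, 837_972_608, 2_834_083], // Week of Jun 1
  [28_875_156, 794_703_104, 2_407_530], // Week of Jun 8
  [104_155_857, 2_196_467_968, 6_420_826], // Week of Jun 15
];

// Sum each column across all nine weeks.
const totals = weeks.reduce(
  (acc, [uncached, cached, output]) => ({
    uncached: acc.uncached + uncached,
    cached: acc.cached + cached,
    output: acc.output + output,
  }),
  { uncached: 0, cached: 0, output: 0 },
);

const grandTotal = totals.uncached + totals.cached + totals.output;
const cachedShare = (totals.cached / grandTotal) * 100;
const cachedPerUncached = totals.cached / totals.uncached;

console.log(totals);
console.log(`cached share of all tokens: ${cachedShare.toFixed(1)}%`);
console.log(`cached tokens per uncached input token: ${cachedPerUncached.toFixed(1)}`);
```

Across the whole project, cached input works out to roughly 96% of all tokens, or about 25 cached tokens for every uncached input token.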
Sam, what on earth happened in the last week?
Yeah… about that.
I was working on the demo app and thought it would be cool if it had
support for Deployments. I didn’t think it would take long. I was wrong. In my
panic, I threw lots of tokens at the problem.
The LLM’s first attempt at porting the required components missed a huge amount
of functionality, so I kicked off a team of agents to identify the chain of
dependencies and port each component over with even more sub-agents. I then
used yet another set of sub-agents to review everything.
I’m not sure how I feel about this style of working with LLMs, but it undeniably
got the job done more quickly than I would have. I still did my manual review at
the end, but the token efficiency feels extremely poor.
API-equivalent LLM token cost by week
Chart showing weekly API-equivalent token costs across Codex and Claude sessions for Webernetes from April 20 through June 15, 2026.
| Week | API-equivalent cost |
|---|---|
| Week of Apr 20 | $40.52 |
| Week of Apr 27 | $187.26 |
| Week of May 4 | $83.42 |
| Week of May 11 | $248.87 |
| Week of May 18 | $436.61 |
| Week of May 25 | $241.53 |
| Week of Jun 1 | $670.91 |
| Week of Jun 8 | $613.95 |
| Week of Jun 15 | $1,811.64 |
My time was still the most expensive line item in the project, even at the end.
Conclusion
If you got this far in the post, you may also enjoy watching the series I
recorded with my colleague Ryan Blunden chronicling the making of webernetes as
it happened. You’ll get to see all of my early misplaced optimism, as well as
some insight into how I work mostly hands-free with voice control and eye
tracking.
Part 1
Part 2
Part 3 is coming soon!
Please take webernetes for a spin! File issues! Email me at
s.rose@ngrok.com when you build something cool or
get stuck! I want this project to thrive and make a difference, and I can’t do
that without your help.