Don’t trust large context windows

I recently watched a video that named something I had been feeling. The author divides an LLM’s context window into two areas: a smart zone, where the model is sharp, and a dumb zone, where attention gets lost and the model starts forgetting what you told it five minutes ago. The cutoff sits around 100k tokens, no matter how large the advertised context window is.

This matters because coding agents will happily steer you straight into the dumb zone. A modern agent burns through tokens rapidly. A few file reads, a long debug session, a verbose test run, and you are at 100k before lunch. Meanwhile vendors keep advertising windows of 200k, 1M, even 2M, as if those numbers represented a useful working set. They do not. Studies such as RULER and Chroma’s report on context rot show that the effective context is a fraction of the advertised number, and that performance gradually degrades as you fill the window.

Large context windows are mostly a marketing number. The architecture behind them works, but they paper over a problem that the underlying attention mechanism doesn’t actually solve. The number on the box gets larger with each release. The usable part does not grow with it.

Modern agents are getting smarter about this. Tools like Claude Code now auto-compact: when a session becomes long, the agent summarizes the history and starts fresh. This helps. But auto-compaction kicks in only after you have already spent time in the dumb zone, and the summary itself is written by a model that is already degraded. It’s better than nothing, but I would rather avoid the situation altogether.

What I do is open a new session and pass it a specification that I wrote myself. This is a much higher-signal handoff than any automated summary, because I have to decide what matters next. It is the breadcrumb approach applied to agents: leave an artifact that the next session, or the next person, can cleanly pick up.
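As a rough illustration, a handoff spec can be as simple as a few named sections rendered to plain text. The sketch below is only one possible shape. The `HandoffSpec` class and its section names are hypothetical, not taken from any tool mentioned in this post.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffSpec:
    """Hypothetical container for what the next session needs to know."""
    goal: str
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the spec as plain text, skipping empty sections."""
        lines = [f"# Handoff: {self.goal}", ""]
        sections = (
            ("Decisions", self.decisions),
            ("Open questions", self.open_questions),
            ("Next steps", self.next_steps),
        )
        for title, items in sections:
            if not items:
                continue  # an empty section is noise for the next session
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"


spec = HandoffSpec(
    goal="Fix flaky login test",
    decisions=["Keep the retry wrapper"],
    next_steps=["Reproduce with a fixed seed"],
)
handoff_text = spec.render()
```

The point of writing it by hand is the filtering: whatever does not make it into `decisions` or `next_steps` is something the next session never has to read.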

You can take it further. Projects like obra/superpowers and mattpocock/skills structure entire agent workflows around small, named artifacts: PRDs, plans, skills, sub-agent handoffs. It’s a way of keeping the working session in the smart zone by intentionally moving every bit of information out of the session and into something the next session can read.

So I think of my context window as a budget. I assume only the first part is really working for me, and anything I can move out of the live session and into a written artifact is one less thing competing for attention.
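To make the budget idea concrete, here is a minimal sketch of a tracker that estimates how much of the smart zone a session has spent. Everything in it is an assumption for illustration: the `ContextBudget` class is hypothetical, the 100k limit is the rough cutoff mentioned above rather than a measured constant, and the four-characters-per-token estimate is a crude heuristic that real tokenizers will not match exactly.

```python
from dataclasses import dataclass, field

SMART_ZONE_TOKENS = 100_000  # rough cutoff from the post, not a measured value
CHARS_PER_TOKEN = 4          # crude heuristic; actual tokenizers vary


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class ContextBudget:
    """Hypothetical tracker for how much of the smart zone is used."""
    limit: int = SMART_ZONE_TOKENS
    warn_ratio: float = 0.8
    spent: int = 0
    entries: list[tuple[str, int]] = field(default_factory=list)

    def add(self, label: str, text: str) -> int:
        """Record something that entered the session; return its estimated cost."""
        tokens = estimate_tokens(text)
        self.spent += tokens
        self.entries.append((label, tokens))
        return tokens

    @property
    def remaining(self) -> int:
        return max(0, self.limit - self.spent)

    def status(self) -> str:
        """'ok', 'warn' (time to write a handoff), or 'over' (start fresh)."""
        if self.spent >= self.limit:
            return "over"
        if self.spent >= self.limit * self.warn_ratio:
            return "warn"
        return "ok"
```

A "warn" status would be the cue to write the handoff artifact while the session is still in good shape, instead of waiting for compaction to do it.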

