On quality measurement: there is no controlled A/B test yet on real user actions. What I measure directly is the per-rewriter integrity signal (each rewriter declares an integrity score on each fire), tool-output savings on the locked 20-task benchmark (62.6% average, range 37.4% to 80.6%, reproducible via the Sipcode benchmark from the repo), and per-session proxy statistics.
The 29% quality lift number is Anthropic’s published research, not mine. I’m careful not to claim that Sipcode users specifically see 29%. The difference between “context cleared” (measurable) and “answers improved by X%” (requires controlled experiments) is real, and I’d prefer to flag that rather than oversell.
On configurability, three layers:
Per-tool-call: If Claude passes an explicit parameter (head_limit on grep, count output mode, explicit offset on read), the corresponding rewriter detects the caller-supplied value and steps aside. Claude can therefore opt out of compression for a specific call. Rewriters yield rather than fight.
Selective disable via per-rewriter env var or config: Not shipped yet. That is an honest gap. Today a user who hits over-stripping either passes an explicit parameter to that call or removes the proxy entirely via sipcode proxy-uninstall.
Per-session bypass triggered from inside the agent: also not shipped. Your specific scenario, where Claude itself decides “this hook is over-stripping, back off for now”, is a good design idea that I hadn’t built. Per-fire integrity scores exist, so the data is there. Wiring it into an agent-side self-modulation primitive is something I want to think about for v1.7.
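To make the per-tool-call layer concrete, here is a minimal Python sketch of the yield-on-explicit-parameter behaviour. It is not Sipcode's actual code: the function names, the Fire record, the 50-line cap, and the choice of "fraction of lines kept" as the integrity score are all assumptions for illustration. Only the parameter names (head_limit, count output mode, offset) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Fire:
    """Result of one rewriter invocation (hypothetical record shape)."""
    output: str
    integrity: float  # assumed meaning: fraction of original lines kept
    rewritten: bool


def caller_opted_out(tool: str, params: dict) -> bool:
    """True when the call carries an explicit parameter the rewriter should respect."""
    if tool == "grep":
        return (params.get("head_limit") is not None
                or params.get("output_mode") == "count")
    if tool == "read":
        return params.get("offset") is not None
    return False


def rewrite(tool: str, params: dict, output: str, max_lines: int = 50) -> Fire:
    """Truncate long tool output unless the caller set an explicit parameter."""
    if caller_opted_out(tool, params):
        # Yield rather than fight: pass the output through untouched.
        return Fire(output, 1.0, False)
    lines = output.splitlines()
    if len(lines) <= max_lines:
        return Fire(output, 1.0, False)
    kept = lines[:max_lines]
    return Fire("\n".join(kept), len(kept) / len(lines), True)
```

With this shape, a grep call that sets head_limit comes back unmodified, while the same call without it is truncated and reports an integrity score below 1.0.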