macOS agent that turns prompts into automated clicks – Openclick

@whateverneveranywhere thank you!

Yes, that staleness gap is exactly where things fall apart quickly. The planner chooses something like “Click on element 47”, and by the time the action runs, the page has re-rendered and 47 is now a completely different button.

What we do in OpenClick is basically two layers.

Within a batch: Each AX action (click, type, etc.) re-resolves its target just before execution against a fresh AX snapshot, not the one the planner saw. We never rely on element indices. Everything is matched via __ax_id, title, or more static signals like role + frame.

If an action is likely to change the UI state, we force an AX refresh before the next step, since that is where things usually go wrong. Pixel-based targeting is just a fallback for things like Canvas or WebGL, where AX is useless.
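
To make the within-batch layer concrete, here is a minimal sketch in Python. It is for illustration only and is not OpenClick's actual code: `Element`, `Target`, `resolve`, `run_batch`, and the `take_snapshot` / `perform` callbacks are all hypothetical names, and the matching order (id, then title, then role + frame) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Element:
    """One node from an AX snapshot (hypothetical shape)."""
    ax_id: Optional[str]
    title: Optional[str]
    role: str
    frame: Tuple[int, int, int, int]  # x, y, width, height


@dataclass(frozen=True)
class Target:
    """What the planner asked for, described by stable signals only."""
    ax_id: Optional[str] = None
    title: Optional[str] = None
    role: Optional[str] = None
    frame: Optional[Tuple[int, int, int, int]] = None


def resolve(target, snapshot):
    """Find the target in a snapshot without ever using list positions."""
    if target.ax_id is not None:
        for el in snapshot:
            if el.ax_id == target.ax_id:
                return el
    if target.title is not None:
        matches = [
            el for el in snapshot
            if el.title == target.title
            and (target.role is None or el.role == target.role)
        ]
        if len(matches) == 1:  # an ambiguous title is treated as a miss
            return matches[0]
    if target.role is not None and target.frame is not None:
        for el in snapshot:
            if el.role == target.role and el.frame == target.frame:
                return el
    return None


def run_batch(actions, take_snapshot, perform):
    """Run (target, verb, changes_state) actions; False means 'replan'."""
    snapshot = take_snapshot()  # fresh, not the snapshot the planner saw
    stale = False
    for target, verb, changes_state in actions:
        if stale:  # previous action probably changed the UI
            snapshot = take_snapshot()
            stale = False
        element = resolve(target, snapshot)
        if element is None:
            return False
        perform(element, verb)
        stale = changes_state
    return True
```

The point of the sketch is that an action carries a description of its target rather than a position, so a re-render that shifts every index still resolves to the same button, and an unresolvable target aborts the batch instead of clicking something else.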

Between batches: We take a fresh screenshot and AX snapshot, then run a validator model that checks whether what we expected actually happened. If it didn't, or only partially did, we replan from the new state and include a brief summary of what happened.
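
The between-batch loop can be sketched the same way. Again this is a simplified Python illustration, not OpenClick's real implementation: `run_task` and the `observe`, `plan`, `execute`, and `validate` callbacks are hypothetical, and the `"ok"` / `"partial"` / `"failed"` verdict strings are an assumed convention.

```python
def run_task(goal, observe, plan, execute, validate,
             batch_size=4, max_rounds=10):
    """Plan one small batch at a time and re-observe after each batch.

    observe()                      -> current state (screenshot + AX snapshot)
    plan(goal, state, review)      -> list of actions, empty when done
    execute(batch)                 -> runs the actions
    validate(batch, before, after) -> "ok", "partial", or "failed"
    """
    state = observe()
    review = None
    for _ in range(max_rounds):
        # A plan is only trusted for a single small batch.
        batch = plan(goal, state, review)[:batch_size]
        if not batch:
            return True
        before = state
        execute(batch)
        state = observe()  # fresh state, never the pre-batch one
        verdict = validate(batch, before, state)
        # On anything short of "ok", pass a short review to the next plan.
        review = None if verdict == "ok" else f"{verdict}: attempted {batch}"
    return False
```

Keeping `batch_size` small bounds how far execution can drift from the state the plan was made against before the validator gets another look.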

So we don’t really trust plans for more than one batch, and for this reason we keep batches small (usually 3-5 actions).

Honestly, the toughest cases now are not AX drift, but apps that expose AX inconsistently or lazily. Gmail is a classic example. Message rows can behave oddly, so we sometimes force an AX refresh right before clicking them. Otherwise you get cases where a programmatic click “works” but the row is never actually activated.

Curious to know what approach you ended up taking here.


