CPNs, LLMs, and Distributed Applications





A big theme in LLM-enabled software development is that verifiable correctness makes it much easier to make big leaps forward with LLMs: tests, compilers, state machines, etc. While doing research for databuild, I recently came across colored Petri nets and immediately saw an opportunity.

Colored Petri nets (CPNs) are an extension of Petri nets. Petri nets are essentially directed bipartite graphs in which places can hold tokens, and places are connected by transitions (where side effects occur). In a plain Petri net a token carries no data and has no identity. Importantly, a Petri net in which every transition has exactly one input place and one output place, holding a single token, is equivalent to a finite state machine. Colored Petri nets extend this model by allowing data (a “color”) to be attached to individual tokens. This maps closely onto the Rust typestate pattern, and suggests that Rust may be able to implement CPN semantics more easily than most languages.
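As a rough illustration of that mapping, here is a minimal sketch of a single colored token moving through three places, with each place encoded as a type and each transition consuming the token by value. The place names, fields, and transitions are hypothetical.

```rust
// Hypothetical places for one token's lifecycle, encoded as types.
// The token's "color" (its data) travels with it from place to place.
struct Submitted { payload: String }
struct Validated { payload: String, checksum: u32 }
struct Stored { id: u64 }

// Each transition consumes the token by value (move semantics), so the
// compiler rules out using a token after it has left a place.
fn validate(t: Submitted) -> Validated {
    let checksum = t.payload.bytes().map(u32::from).sum();
    Validated { payload: t.payload, checksum }
}

fn store(t: Validated) -> Stored {
    // The transition's side effect (e.g. a database write) would go here.
    println!("storing {} bytes (checksum {})", t.payload.len(), t.checksum);
    Stored { id: u64::from(t.checksum) }
}

fn main() {
    let token = Submitted { payload: "hello".into() };
    let token = validate(token);
    let token = store(token);
    println!("stored as {}", token.id);
    // Calling validate on the original token here would not compile:
    // the Submitted token was moved when the transition fired.
}
```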

CPNs are of particular interest because a) concurrent applications are still difficult to write and b) they provide the ability to formally verify concurrent programs at build time. Furthermore, if we can implement a high-performance datastore underneath, a CPN framework may be able to handle the difficult parts of concurrent applications: state synchronization, conflict detection, deadlock avoidance, and coordinating access to shared resources. These are enabled through two other features of CPN: guards and multi-token consumption/production.

Guards are Boolean conditions attached to a transition: things that must be true for the transition to fire. For example, a guard might require that more than zero connections are available in a connection pool before one can be claimed. Multi-token consumption/production is exactly what it sounds like: a fork in the net via a transition, P1 -> T1 -> (P2, P3), where T1 consumes a token from P1 and produces a token in each of P2 and P3 simultaneously. Conversely, a join via a transition, (P1, P2) -> T1 -> P3, requires that both P1 and P2 hold tokens that enable T1; the transition then consumes both and produces a token in P3, again simultaneously.
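To make the fork and join concrete, here is a minimal sketch that treats places as plain in-memory collections; the Job type, place names, and the notion of an "atomic" firing (one function call) are illustrative assumptions, not a real framework.

```rust
// Fork: T1 consumes one token from P1 and produces one token in each of
// P2 and P3 in a single firing.
#[derive(Debug, Clone)]
struct Job { id: u32 }

fn t1_fork(p1: &mut Vec<Job>, p2: &mut Vec<Job>, p3: &mut Vec<Job>) -> bool {
    // Guard: at least one token must be available in P1.
    let Some(job) = p1.pop() else { return false };
    p2.push(job.clone()); // e.g. a "process" copy
    p3.push(job);         // e.g. an "audit" copy
    true
}

// Join: T2 fires only when both P2 and P3 hold a token, consuming one from
// each and producing a single token in P4.
fn t2_join(p2: &mut Vec<Job>, p3: &mut Vec<Job>, p4: &mut Vec<Job>) -> bool {
    if p2.is_empty() || p3.is_empty() {
        return false; // transition not enabled
    }
    let job = p2.pop().unwrap();
    let _ = p3.pop().unwrap();
    p4.push(job);
    true
}

fn main() {
    let (mut p1, mut p2, mut p3, mut p4) = (vec![Job { id: 7 }], vec![], vec![], vec![]);
    assert!(t1_fork(&mut p1, &mut p2, &mut p3));
    assert!(t2_join(&mut p2, &mut p3, &mut p4));
    assert!(!t2_join(&mut p2, &mut p3, &mut p4)); // no longer enabled
    println!("P4 holds job {}", p4[0].id);
}
```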

One application that involves mild concurrency is web scraping with leased proxies and scrape targets. You have a limited number of proxies to route requests through, and need to rate-limit each of them to ensure they don’t exceed a given target. Additionally, you want to avoid making multiple simultaneous requests to the same target, and to avoid hammering the same domain with repeated requests, in order to be a responsible scraper. Traditionally this is solved by leasing resources through a central database, implemented with SELECT FOR UPDATE-style semantics. We can imagine this implemented with CPN semantics instead: a scrape_target transition draws from available_proxies and prioritized_targets places, so a scrape begins only when both a proxy and a prioritized target are available. You can also imagine implementing other complex distributed scraper features with CPN semantics:

  • Scrape target cooldown: after being scraped, a target token moves into a cool_down place; timed Petri nets let us delay the transition back to prioritized_targets until the cooldown period has elapsed.
  • Domain-level rate limiting: add a separate domains place, so that the scrape_target transition becomes a three-way join over domains × available_proxies × prioritized_targets (sketched after this list).
  • Retry with backoff: failed scrapes fork into two tokens: one into fail_log, and one back into prioritized_targets with an incremented retry count and a longer cooldown. A guard prevents retries beyond the maximum number of attempts.
  • Results pipeline: after scraping, tokens flow through raw_html → parsed → validated → stored, each place with its own concurrency limit, which naturally applies backpressure.
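Here is a rough sketch of the three-way scrape_target join with a retry guard and a domain cooldown, again treating places as plain in-memory vectors. The cooldown duration, MAX_RETRIES, the token fields, and the binding search are illustrative assumptions rather than a real design.

```rust
use std::time::{Duration, Instant};

// Colored tokens for the scraper net described above.
struct Target { url: String, domain: String, retries: u32 }
struct Proxy { addr: String }
struct DomainSlot { domain: String, next_allowed: Instant }

const MAX_RETRIES: u32 = 3;

// Guard: retry budget not exhausted and the domain's cooldown has passed.
fn guard(target: &Target, slot: &DomainSlot, now: Instant) -> bool {
    target.retries < MAX_RETRIES && now >= slot.next_allowed
}

// Three-way join: consume one token each from prioritized_targets,
// available_proxies, and domains; produce a token in in_flight and put the
// domain slot back with an updated cooldown (a timed-Petri-net-style delay).
fn scrape_target(
    prioritized_targets: &mut Vec<Target>,
    available_proxies: &mut Vec<Proxy>,
    domains: &mut Vec<DomainSlot>,
    in_flight: &mut Vec<(Target, Proxy)>,
    now: Instant,
) -> bool {
    // Find an enabled binding: a target whose domain slot passes the guard.
    let Some(t_idx) = prioritized_targets.iter().position(|t| {
        domains.iter().any(|s| s.domain == t.domain && guard(t, s, now))
    }) else {
        return false;
    };
    if available_proxies.is_empty() {
        return false;
    }

    let target = prioritized_targets.remove(t_idx);
    let proxy = available_proxies.pop().unwrap();
    let s_idx = domains.iter().position(|s| s.domain == target.domain).unwrap();
    domains[s_idx].next_allowed = now + Duration::from_secs(10); // domain cooldown
    println!("scraping {} via {}", target.url, proxy.addr); // side-effect stand-in
    in_flight.push((target, proxy));
    true
}

fn main() {
    let now = Instant::now();
    let mut targets = vec![Target { url: "https://example.com/a".into(), domain: "example.com".into(), retries: 0 }];
    let mut proxies = vec![Proxy { addr: "10.0.0.1:8080".into() }];
    let mut domains = vec![DomainSlot { domain: "example.com".into(), next_allowed: now }];
    let mut in_flight = Vec::new();
    assert!(scrape_target(&mut targets, &mut proxies, &mut domains, &mut in_flight, now));
    assert!(!scrape_target(&mut targets, &mut proxies, &mut domains, &mut in_flight, now));
}
```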

Another application is databuild, where the net is composed of partitions (which are effectively leased to job runs), wants, and job runs, and we would benefit from some self-organizing, dynamic process to propagate data dependencies and resolve user-indicated partitions in a safe, efficient, and fast way. But more about that later.

I’m still figuring this part out, but there seem to be a few sensible or interesting strategies for implementing a CPN:

  1. As a traditional app + Postgres combo: transactions implement token movement and state storage simultaneously, and SELECT FOR UPDATE enables claiming tokens for transitions to prevent double moves.
  2. As a single-process Rust binary with all CPN state held in memory, using move semantics and the typestate pattern to implement CPN semantics (this could be very fast, but requires a persistence story – perhaps snapshots plus an event log? See the sketch after this list).
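For option 2, here is a minimal sketch of the snapshot-plus-event-log idea: the marking only ever changes by applying events, so replaying the log from the last snapshot reconstructs state after a crash. The event names and places are hypothetical, and serialization, disk writes, and fsync are elided.

```rust
use std::collections::VecDeque;

// Every transition firing is recorded as an event before it is applied.
#[derive(Debug, Clone)]
enum Event {
    TargetClaimed { url: String },
    TargetScraped { url: String },
}

// The in-memory marking: which tokens sit in which places.
#[derive(Debug, Default)]
struct Marking {
    prioritized_targets: VecDeque<String>,
    in_flight: Vec<String>,
    stored: Vec<String>,
}

impl Marking {
    // Applying an event is the only way state changes, so replaying the log
    // from the last snapshot always reproduces the same marking.
    fn apply(&mut self, event: &Event) {
        match event {
            Event::TargetClaimed { url } => {
                if let Some(pos) = self.prioritized_targets.iter().position(|u| u == url) {
                    let token = self.prioritized_targets.remove(pos).expect("index from position");
                    self.in_flight.push(token);
                }
            }
            Event::TargetScraped { url } => {
                if let Some(pos) = self.in_flight.iter().position(|u| u == url) {
                    let token = self.in_flight.remove(pos);
                    self.stored.push(token);
                }
            }
        }
    }
}

fn main() {
    // Recovery: start from a snapshot (here, the initial marking) and replay.
    let mut marking = Marking {
        prioritized_targets: VecDeque::from(vec!["https://example.com".to_string()]),
        ..Default::default()
    };
    let log = vec![
        Event::TargetClaimed { url: "https://example.com".into() },
        Event::TargetScraped { url: "https://example.com".into() },
    ];
    for event in &log {
        marking.apply(event);
    }
    println!("{marking:?}");
}
```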

One thing I’m still trying to figure out is the partitioning problem: if the app state becomes too large for one server’s memory, is there a way to automatically partition the net/tokens to enable fearless concurrency and horizontal scalability? So far the answer seems to be to either a) solve storage location/transition placement in the application by making it part of the net itself, or b) solve it at the database level. Or perhaps something even stranger, where the overall net is composed of multiple CPN apps, each exposing its own query/consumption interfaces?

Tying it all back together: if we can figure out a CPN app framework with sensible persistence, we can put more compile-time constraints on agent-written code (good for development velocity), and provide out-of-the-box correctness guarantees and simulation capabilities that independently written apps don’t provide. Exciting!


random thoughts

  • !! Are there any existing open-source services with a test suite or benchmark I can use to verify a reimplementation? Especially if it means I can just set Claude Code loose against it?
  • Timed Petri nets – can we emit metrics by default and simulate the net effect of changes across the system?
  • Join transitions as literal database joins, indexes over transition state, and token transitions as inserts and deletes inside transactions.
  • To what extent are we talking about building a datastore or CPN database?
  • A distributed network of CPN apps? Or an app as a distributed network of CPN services? Or is it all just a question of how you divide the net up?
  • CPNs don’t require explicit locking or re-entrancy handling? Each “move” is a movement of tokens that is only enabled given the positions of all the tokens involved?

Probable Bets (LLM Written)

To verify this, we need a real hill to climb: reimplement something with CPN semantics and compare. The main question is not “can CPNs be fast/correct” – of course they can. It’s: which paradigm makes it easiest to write simple, correct, and fast concurrent programs? Does formalization pay off in the form of fewer bugs and less ad-hoc coordination code?

First bet: scraper scheduler (spider-rs)

Implement the decision-making core of the scraper – URL prioritization, domain rate limiting, proxy assignment, retry scheduling – as a CPN. Mock the HTTP layer; benchmark decisions per second and compare coordination-code complexity against the original.

Why a scraper? Traditional implementations are full of ad-hoc coordination: rate-limit maps, cooldown trackers, retry counters, domain queues. This is exactly what a CPN formalizes. Correctness matters (politeness violations get you banned), and timed behavior is central, not incidental. spider-rs is already written in Rust, so we can do a direct comparison.

Why not others first?

  • Work queue (Faktory): coordination is incidental; these are well-understood patterns where the CPN overhead may not pay off.
  • Connection pooler (PgCat): too small – it can’t exercise enough CPN capabilities to be compelling.
  • CI runner (Buildkite): too large – accidental complexity (artifacts, logs, shell execution) obscures the coordination story.

implementation approach

In-memory Rust with SQLite snapshots: single-threaded, move semantics throughout, extremely fast. Partitions = separate binaries with disjoint token ownership (enforced at net-definition time via partition-key requirements; see the sketch below). Simulation = run the identical CPN definition against simulated effects, enabling correctness testing, failure injection, and time forwarding.
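As a sketch of what partition-key enforcement at the boundary might look like, assume each binary owns a disjoint slice of a hash space over partition keys; the trait, the hashing scheme, and partitioning by domain are all illustrative assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::Range;

// Every token type declares how it is partitioned.
trait Partitioned {
    fn partition_key(&self) -> String;
}

struct Target { url: String, domain: String }

impl Partitioned for Target {
    // Partition by domain so all tokens for one domain live in one binary,
    // keeping domain-level rate limiting local to that partition.
    fn partition_key(&self) -> String {
        self.domain.clone()
    }
}

// One binary = one partition = one slice of the hash space.
struct Partition {
    owned: Range<u64>,
}

impl Partition {
    fn owns<T: Partitioned>(&self, token: &T) -> bool {
        let mut hasher = DefaultHasher::new();
        token.partition_key().hash(&mut hasher);
        self.owned.contains(&hasher.finish())
    }

    // Reject foreign tokens at the boundary instead of at transition time,
    // so token ownership stays disjoint by construction.
    fn accept<T: Partitioned>(&self, token: T, place: &mut Vec<T>) -> Result<(), T> {
        if self.owns(&token) {
            place.push(token);
            Ok(())
        } else {
            Err(token)
        }
    }
}

fn main() {
    let partition = Partition { owned: 0..u64::MAX / 2 };
    let mut prioritized_targets: Vec<Target> = Vec::new();
    let token = Target { url: "https://example.com/a".into(), domain: "example.com".into() };
    match partition.accept(token, &mut prioritized_targets) {
        Ok(()) => println!("token owned locally ({} held)", prioritized_targets.len()),
        Err(t) => println!("forward {} to its owning partition", t.url),
    }
}
```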


