Pure-Go tree-sitter runtime – no CGO, no C toolchain, WASM-ready.
go get github.com/odvcencio/gotreesitter
Implements the same parse-table format used by tree-sitter, so existing grammars work without recompilation. Outperforms the CGo binding on every workload measured: incremental edits (the dominant operation in editors and language servers) are about 90 times faster than the C implementation.
Every existing Go tree-sitter binding requires CGo. That means:
- Cross-compilation breaks (GOOS=wasip1, GOARCH=arm64 Linux, Windows without MSYS2)
- CI pipelines need a C toolchain in every build image
- go install fails for end users without gcc
- The race detector, fuzzing, and coverage tools work poorly across CGo boundaries
gotreesitter is pure Go. Just go get and build, on any target, on any platform.
package main
import (
	"fmt"
	"github.com/odvcencio/gotreesitter"
	"github.com/odvcencio/gotreesitter/grammars"
)
func main() {
	src := []byte(`package main
func main() {}
`)
	lang := grammars.GoLanguage()
	parser := gotreesitter.NewParser(lang)
	tree := parser.Parse(src)
	fmt.Println(tree.RootNode())
	// After editing source, reparse incrementally:
	//   tree.Edit(edit)
	//   tree2 := parser.ParseIncremental(newSrc, tree)
}
Tree-sitter's S-expression query language is supported, including predicates and cursor-based streaming. See Known Limitations for current caveats.
q, _ := gotreesitter.NewQuery(`(function_declaration name: (identifier) @fn)`, lang)
cursor := q.Exec(tree.RootNode(), lang, src)
for {
	match, ok := cursor.NextMatch()
	if !ok {
		break
	}
	for _, cap := range match.Captures {
		fmt.Println(cap.Node.Text(src))
	}
}
After the initial parse, re-parse only the changed region – unchanged subtrees are automatically reused.
// Initial parse
tree := parser.Parse(src)
// User types "x" at byte offset 42
src = append(src[:42], append([]byte("x"), src[42:]...)...)
tree.Edit(gotreesitter.InputEdit{
	StartByte:   42,
	OldEndByte:  42,
	NewEndByte:  43,
	StartPoint:  gotreesitter.Point{Row: 3, Column: 10},
	OldEndPoint: gotreesitter.Point{Row: 3, Column: 10},
	NewEndPoint: gotreesitter.Point{Row: 3, Column: 11},
})
// Incremental reparse — ~1.38 μs vs 124 μs for the CGo binding (90x faster)
tree2 := parser.ParseIncremental(src, tree)
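The row and column values in the edit above are hard-coded; in practice they have to be derived from byte offsets. The following is a minimal sketch of that bookkeeping using only the standard library. Point and InputEdit here are local stand-ins that copy the field names from the example, not the gotreesitter types (their field types are assumed), and columns are assumed to be counted in bytes, as in upstream tree-sitter. The helper names pointAt and insertEdit are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// Point and InputEdit are local stand-ins mirroring the field names used in
// the example above. They are NOT the gotreesitter types.
type Point struct{ Row, Column uint32 }

type InputEdit struct {
	StartByte, OldEndByte, NewEndByte    uint32
	StartPoint, OldEndPoint, NewEndPoint Point
}

// pointAt converts a byte offset into a zero-based row and byte column.
func pointAt(src []byte, offset int) Point {
	row := bytes.Count(src[:offset], []byte("\n"))
	lineStart := bytes.LastIndexByte(src[:offset], '\n') + 1
	return Point{Row: uint32(row), Column: uint32(offset - lineStart)}
}

// insertEdit returns a new source with text inserted at offset, plus the edit
// describing that insertion. The input slice is left untouched.
func insertEdit(src []byte, offset int, text []byte) ([]byte, InputEdit) {
	out := make([]byte, 0, len(src)+len(text))
	out = append(out, src[:offset]...)
	out = append(out, text...)
	out = append(out, src[offset:]...)

	start := pointAt(src, offset)
	return out, InputEdit{
		StartByte:   uint32(offset),
		OldEndByte:  uint32(offset), // pure insertion: nothing removed
		NewEndByte:  uint32(offset + len(text)),
		StartPoint:  start,
		OldEndPoint: start,
		NewEndPoint: pointAt(out, offset+len(text)),
	}
}

func main() {
	src := []byte("package main\n\nfunc main() {}\n")
	newSrc, edit := insertEdit(src, 14, []byte("x"))
	fmt.Printf("%q\n%+v\n", newSrc, edit)
}
```

Building the new source in a fresh slice avoids writing into the backing array of the old source, which matters if anything still reads the old bytes.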
Tip: Use grammars.DetectLanguage("main.go") to pick the right grammar by filename, which is useful for editor integration.
hl, _ := gotreesitter.NewHighlighter(lang, highlightQuery)
ranges := hl.Highlight(src)
for _, r := range ranges {
	fmt.Printf("%s: %q\n", r.Capture, src[r.StartByte:r.EndByte])
}
Note: Text predicates (#eq?, #match?, #any-of?, #not-eq?) require source []byte to evaluate. Passing nil disables predicate checks.
Extract definitions and references from source code:
entry := grammars.DetectLanguage("main.go")
lang := entry.Language()
tagger, _ := gotreesitter.NewTagger(lang, entry.TagsQuery)
tags := tagger.Tag(src)
for _, tag := range tags {
	fmt.Printf("%s %s at %d:%d\n", tag.Kind, tag.Name,
		tag.NameRange.StartPoint.Row, tag.NameRange.StartPoint.Column)
}
Every LangEntry exposes a Quality field indicating how reliable the parse output is:
| Quality | Meaning |
|---|---|
| full | Token source or DFA with external scanner – full fidelity |
| partial | DFA-partial – missing external scanner, tree may contain silent gaps |
| none | Cannot parse |
entries := grammars.AllLanguages()
for _, e := range entries {
	fmt.Printf("%s: %s\n", e.Name, e.Quality)
}
Measured against go-tree-sitter (the standard CGo binding), parsing a Go source file with 500 function definitions.
goos: linux / goarch: amd64 / cpu: Intel(R) Core(TM) Ultra 9 285
# pure-Go parser benchmarks (root module)
go test -run '^$' -bench 'BenchmarkGoParse' -benchmem -count=3
# C baseline benchmarks (cgo_harness module)
cd cgo_harness
go test . -run '^$' -tags treesitter_c_bench -bench 'BenchmarkCTreeSitterGoParse' -benchmem -count=3
| Benchmark | ns/op | B/op | allocs/op |
|---|---|---|---|
| BenchmarkCTreeSitterGoParseFull | 2,058,000 | 600 | 6 |
| BenchmarkCTreeSitterGoParseIncrementalSingleByteEdit | 124,100 | 648 | 7 |
| BenchmarkCTreeSitterGoParseIncrementalNoEdit | 121,100 | 600 | 6 |
| BenchmarkGoParseFull | 1,330,000 | 10,842 | 2,495 |
| BenchmarkGoParseIncrementalSingleByteEdit | 1,381 | 361 | 9 |
| BenchmarkGoParseIncrementalNoEdit | 8.63 | 0 | 0 |
Summary:
| Workload | gotreesitter | CGo binding | Ratio |
|---|---|---|---|
| Full parse | 1,330 μs | 2,058 μs | ~1.5x faster |
| Incremental (single-byte edit) | 1.38 μs | 124 μs | ~90x faster |
| Incremental (no-op reparse) | 8.6 ns | 121 μs | ~14,000x faster |
The incremental hot path aggressively reuses subtrees: a single-byte edit reparses in microseconds, while the CGo binding pays the full C-runtime and call overhead. The no-edit fast path exits on a single nil check, with zero allocations in single-digit nanoseconds.
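The ratios in the summary are simply the CGo ns/op divided by the pure-Go ns/op. As a quick check, this small program recomputes them from the raw benchmark figures quoted above; the speedup helper is a name introduced here for illustration.

```go
package main

import "fmt"

// speedup returns how many times faster the candidate is than the baseline,
// given ns/op for each.
func speedup(baselineNs, candidateNs float64) float64 {
	return baselineNs / candidateNs
}

func main() {
	// ns/op values copied from the benchmark table above.
	rows := []struct {
		name      string
		cgo, pure float64
	}{
		{"full parse", 2058000, 1330000},
		{"incremental, single-byte edit", 124100, 1381},
		{"incremental, no edit", 121100, 8.63},
	}
	for _, r := range rows {
		fmt.Printf("%-30s %.1fx\n", r.name, speedup(r.cgo, r.pure))
	}
}
```

The results round to the roughly 1.5x, 90x, and 14,000x figures shown in the summary.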
205 grammars ship in the registry. Run go run ./cmd/parity_report for live per-language status.
Current Summary:
- 204 full – parse without errors (token source or DFA with a full external scanner)
- 1 partial – norg (requires an external scanner with 122 tokens, not yet implemented)
- 0 unsupported
Backend Breakdown:
- 195 dfa – generated DFA lexer, with a hand-written Go external scanner where needed
- 1 dfa-partial – generated DFA without an external scanner (norg)
- 9 token_source – hand-written pure-Go lexer bridge (cert, c, go, html, java, json, lua, toml, yaml)
111 languages have hand-written Go external scanners attached via zzz_scanner_attachments.go.
Full language list (205):
ada, agda, angular, apex, arduino, asm, astro, authzed, awk, bash, bass, beancount, bibtex, bicep, bitbake, blade, brightscript, c, c_sharp, caddy, cairo, capnp, chatito, circom, clojure, cmake, cobol, comment, commonlisp, cooklang, corn, cpon, cpp, crystal, css, csv, cuda, cue, cylc, d, dart, desktop, devicetree, dhall, diff, disassembly, djot, dockerfile, dot, doxygen, dtd, earthfile, ebnf, editorconfig, eds, eex, elisp, elixir, elm, elsa, embedded_template, enforce, erlang, facility, faust, fennel, fidl, firrtl, fish, foam, forth, fortran, fsharp, gdscript, git_config, git_rebase, gitattributes, gitcommit, gitignore, gleam, glsl, gn, go, godot_resource, gomod, graphql, groovy, hack, hare, haskell, haxe, hcl, heex, hlsl, html, http, hurl, hyprlang, ini, janet, java, javascript, jinja2, jq, jsdoc, json, json5, jsonnet, julia, just, kconfig, kdl, kotlin, ledger, less, linkerscript, liquid, llvm, lua, luau, make, markdown, markdown_inline, matlab, mermaid, meson, mojo, move, nginx, nickel, nim, ninja, nix, norg, nushell, objc, ocaml, odin, org, pascal, pem, perl, php, pkl, powershell, prisma, prolog, promql, properties, proto, pug, puppet, purescript, python, ql, r, racket, regex, rego, requirements, rescript, robot, ron, rst, ruby, rust, scala, scheme, scss, smithy, solidity, sparql, sql, squirrel, ssh_config, starlark, svelte, swift, tablegen, tcl, teal, templ, textproto, thrift, tlaplus, tmux, todotxt, toml, tsx, turtle, twig, typescript, typst, uxntal, v, verilog, vhdl, vimdoc, vue, wgsl, wolfram, xml, yaml, yuck, zig
| Feature | Status |
|---|---|
| Compile + execute (NewQuery, Execute, ExecuteNode) | Supported |
| Cursor streaming (Exec, NextMatch, NextCapture) | Supported |
| Structural quantifiers (?, *, +) | Supported |
| Alternation ([...]) | Supported |
| Field matching (name: (identifier)) | Supported |
| #eq? / #not-eq? | Supported |
| #match? / #not-match? | Supported |
| #any-of? / #not-any-of? | Supported |
| #lua-match? | Supported |
| #has-ancestor? / #not-has-ancestor? | Supported |
| #not-has-parent? | Supported |
| #is? / #is-not? | Supported |
| #set! / #offset! directives | Parsed and accepted |
As of February 23, 2026, all shipped highlight and tags queries compile in this repo (156/156 non-empty HighlightQuery entries, 69/69 non-empty TagsQuery entries).
There are currently no known query-syntax gaps that block shipped highlight or tag queries.
1 language (norg) requires an external scanner that has not been ported to Go. It parses using the DFA lexer alone, but tokens that require the external scanner are silently skipped. The tree structure is valid but may contain gaps. Check entry.Quality to distinguish full from partial.
1. Add the grammar to grammars/languages.manifest.
2. Generate bindings:
go run ./cmd/ts2go -manifest grammars/languages.manifest -outdir ./grammars -package grammars -compact=true
This regenerates grammars/embedded_grammars_gen.go, grammars/grammar_blobs/*.bin, and the language registration stubs.
3. Add smoke samples to cmd/parity_report/main.go and grammars/parse_support_test.go.
4. Verify:
go run ./cmd/parity_report
go test ./grammars/...
gotreesitter reimplements the tree-sitter runtime in pure Go:
- Parser – table-driven LR(1) with GLR support for ambiguous grammars
- Incremental reuse – cursor-based subtree reuse; unchanged regions skip reparsing entirely
- Arena allocator – slab-based node allocation with ref counting, reducing GC pressure
- DFA lexer – generated from grammar tables via ts2go, with hand-written bridges where needed
- External scanner VM – bytecode interpreter for language-specific scanning (Python indentation, etc.)
- Query engine – S-expression pattern matching with predicate evaluation and streaming cursors
- Highlighter – query-based syntax highlighting with incremental support
- Tagger – symbol definition/reference extraction using tags queries
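To make the arena allocator item more concrete, here is a minimal sketch of slab-based allocation with a reference count. It only illustrates the general technique, not the layout gotreesitter actually uses; the node, arena, alloc, retain, and release names are all hypothetical.

```go
package main

import "fmt"

// node is a hypothetical, simplified syntax-tree node.
type node struct {
	symbol   uint16
	children []*node
}

// arena hands out nodes from fixed-size slabs, so many nodes share a few
// large allocations instead of each being a separate heap object.
type arena struct {
	slabSize int
	slabs    [][]node
	used     int // nodes handed out from the newest slab
	refs     int // owners (for example, trees) still using this arena
}

func newArena(slabSize int) *arena {
	return &arena{slabSize: slabSize, refs: 1}
}

// alloc returns a pointer to the next free node, adding a slab when the
// newest one is full. Existing slabs are never resized, so earlier pointers
// stay valid.
func (a *arena) alloc() *node {
	if len(a.slabs) == 0 || a.used == a.slabSize {
		a.slabs = append(a.slabs, make([]node, a.slabSize))
		a.used = 0
	}
	n := &a.slabs[len(a.slabs)-1][a.used]
	a.used++
	return n
}

// retain records one more owner of the arena.
func (a *arena) retain() { a.refs++ }

// release drops one owner and lets go of every slab once none remain.
func (a *arena) release() {
	a.refs--
	if a.refs == 0 {
		a.slabs, a.used = nil, 0
	}
}

func main() {
	a := newArena(2)
	for i := 0; i < 5; i++ {
		a.alloc().symbol = uint16(i)
	}
	fmt.Println("slabs in use:", len(a.slabs))
	a.release()
	fmt.Println("slabs after release:", len(a.slabs))
}
```

The garbage collector then tracks a handful of slabs rather than one object per node, which is the kind of pressure reduction the list item refers to.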
Grammar tables are extracted from upstream tree-sitter parser.c files by the ts2go tool, serialized into compressed binary blobs, and lazy-loaded on first language use. No C code runs at parse time.
To avoid embedding blobs in the binary, build with -tags grammar_blobs_external and set GOTREESITTER_GRAMMAR_BLOB_DIR to a directory containing *.bin grammar blobs. External blob mode uses mmap by default on Unix (set GOTREESITTER_GRAMMAR_BLOB_MMAP=false to disable).
To ship a small embedded binary with a curated language set, build with -tags grammar_set_core (the core set includes common languages such as c, go, java, javascript, python, rust, typescript, etc.).
To restrict languages registered at runtime (embedded or external), set:
GOTREESITTER_GRAMMAR_SET=go,json,python
For long-running processes, the grammar cache memory is tunable:
// Keep only the 8 most recently used decoded grammars in cache.
grammars.SetEmbeddedLanguageCacheLimit(8)
// Drop one language blob from cache (e.g. "rust.bin").
grammars.UnloadEmbeddedLanguage("rust.bin")
// Drop all decoded grammars from cache.
grammars.PurgeEmbeddedLanguageCache()
You can also set GOTREESITTER_GRAMMAR_CACHE_LIMIT at process start to apply a cache cap without code changes. Set it to 0 only if you explicitly want no retention (each grammar access will decode again).
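For readers unfamiliar with a "most recently used" cap, the sketch below shows the general idea with a small least-recently-used cache built on container/list. It is an illustration only, not the library's cache: lruCache, newLRU, and touch are hypothetical names, and the blob keys other than rust.bin are made up for the demo.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a hypothetical bounded cache that keeps the most recently used
// keys. It is not the gotreesitter implementation.
type lruCache struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> list element holding the key
}

func newLRU(limit int) *lruCache {
	return &lruCache{limit: limit, order: list.New(), items: map[string]*list.Element{}}
}

// touch records a use of key and returns the key evicted to stay within the
// limit, or "" if nothing was evicted.
func (c *lruCache) touch(key string) (evicted string) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return ""
	}
	c.items[key] = c.order.PushFront(key)
	if c.order.Len() <= c.limit {
		return ""
	}
	oldest := c.order.Back()
	c.order.Remove(oldest)
	evicted = oldest.Value.(string)
	delete(c.items, evicted)
	return evicted
}

func main() {
	cache := newLRU(2)
	for _, blob := range []string{"go.bin", "rust.bin", "go.bin", "json.bin"} {
		if ev := cache.touch(blob); ev != "" {
			fmt.Printf("load %s, evict %s\n", blob, ev)
		}
	}
}
```

With a limit of 0 this sketch evicts every entry as soon as it is added, which matches the "no retention" behaviour described above.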
Idle eviction can be enabled with env vars:
GOTREESITTER_GRAMMAR_IDLE_TTL=5m
GOTREESITTER_GRAMMAR_IDLE_SWEEP=30s
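The two values interact: a TTL only takes effect when a sweep runs. The sketch below works through that arithmetic under two stated assumptions that the text above does not confirm: that the values are Go duration strings accepted by time.ParseDuration, and that a periodic sweeper drops entries idle for at least the TTL. The worstCaseIdle helper is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// worstCaseIdle returns the longest time an unused grammar could stay cached
// under the ASSUMED model: a sweeper runs every `sweep` and drops entries
// that have been idle for at least `ttl`.
func worstCaseIdle(ttl, sweep string) (time.Duration, error) {
	t, err := time.ParseDuration(ttl)
	if err != nil {
		return 0, fmt.Errorf("idle TTL: %w", err)
	}
	s, err := time.ParseDuration(sweep)
	if err != nil {
		return 0, fmt.Errorf("idle sweep: %w", err)
	}
	// An entry that crosses the TTL just after a sweep waits one more interval.
	return t + s, nil
}

func main() {
	d, err := worstCaseIdle("5m", "30s")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d) // 5m30s
}
```

Under that model, a shorter sweep interval tightens the bound at the cost of more frequent scans.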
Loader compaction/interning is enabled by default and tunable via:
GOTREESITTER_GRAMMAR_COMPACT=true
GOTREESITTER_GRAMMAR_STRING_INTERN_LIMIT=200000
GOTREESITTER_GRAMMAR_TRANSITION_INTERN_LIMIT=20000
The test suite includes:
- Smoke tests – all 205 grammars parse sample code without crashing or producing error nodes
- Correctness snapshots – golden S-expression tests for 20 core languages catch parser and grammar regressions
- Highlight validation – end-to-end tests that compiled highlight queries produce highlight ranges
- Query tests – pattern matching, predicates, cursors, field-based matching
- Parser tests – incremental reparsing, error recovery, GLR ambiguity resolution
- Fuzzing – FuzzGoParseDoesNotPanic for parser robustness
go test ./... -race -count=1
Current: v0.4.0 – 205 grammars, stable parser, incremental reparsing, query engine, highlighting, tagging.
Next:
- Query engine parity hardening – field-negation semantics, metadata directive behavior, and additional edge-case parity with upstream tree-sitter query execution
- More hand-written external scanners for high-value dfa-partial languages
- Parse() (*Tree, error) – return errors instead of silent nil trees
- Automated parity testing against C tree-sitter output
- Fuzz coverage expansion to more languages and the query engine
MIT