RunMat automatically fuses operations and routes them intelligently between CPU and GPU. MATLAB syntax, no kernel code, no rewriting.
🌐 Website • 📖 Documentation
Status: Pre-release (v0.2)
RunMat is an early build. The core runtime and GPU engine have already passed thousands of tests, but some plotting features are still missing or broken. Expect some rough edges. Feedback and bug reports help us decide what to fix next.
With RunMat you write your math in clean, readable MATLAB-style syntax. RunMat automatically fuses your operations into optimized kernels and runs them where they execute best – CPU or GPU. On the GPU, it can often match or beat hand-tuned CUDA on many dense numerical workloads.
It runs on whatever GPU you have – NVIDIA, AMD, Apple Silicon, Intel – via native APIs (Metal/DirectX 12/Vulkan). No device management. No vendor lock-in. No rewriting.
Basic idea:
- MATLAB syntax, no new language
- Fast on CPU and GPU, with a tiered runtime (interpreter + JIT)
- No device flags – Fusion automatically chooses CPU vs GPU based on data size and estimated transfer cost
MATLAB language
- Familiar .m files, arrays, and control flow
- Many MATLAB/Octave scripts run with few or no changes
Fusion: automatic CPU + GPU routing
- Builds an internal graph of array operations
- Fuses elementwise operations and reductions into larger kernels (see the sketch below)
- Chooses CPU or GPU per kernel based on size and transfer cost
- Keeps arrays resident on the device when that is faster
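As a small illustration (the array shape and the exact fusion decisions here are assumptions; the planner decides at runtime), a chain like the one below is a candidate for a single fused kernel, with the intermediate array never leaving the device:

```matlab
% Illustrative only: an elementwise chain plus a reduction that Fusion can
% plan as one kernel; CPU vs GPU placement is decided by the runtime.
a = rand(4096, 4096, 'single');
b = exp(-a) .* a + 1;     % elementwise chain: negate, exp, multiply, add
s = sum(b, 'all');        % reduction over the same data, no extra transfer
disp(s);                  % the scalar result is the only value downloaded
```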
Modern CPU runtime
- Ignition interpreter for fast startup
- Turbine JIT (Cranelift) for hot paths – see the sketch after this list
- Generational GC tuned for numeric code
- Memory-safe by design (Rust)
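For scalar, loop-heavy code the tiered CPU runtime matters more than the GPU; here is a minimal sketch of the kind of code that benefits from the Turbine JIT (timings depend on hardware and no specific numbers are claimed):

```matlab
% Illustrative: a tight scalar loop. It starts in the Ignition interpreter
% and is compiled by the Turbine JIT once it becomes hot.
acc = 0;
for k = 1:1e7
    acc = acc + sin(k) / k;   % pure scalar work, nothing to offload
end
fprintf('acc = %.6f\n', acc);
```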
Cross-platform GPU backend
- Built on wgpu (WebGPU)
- Supports Metal (macOS), DirectX 12 (Windows), and Vulkan (Linux)
- Falls back to the CPU when a workload is too small to benefit from the GPU
Plotting and tooling (pre-release)
- Simple 2D line and scatter plots still work today
- Plots that use filled shapes or meshes (box plots, violin plots, surfaces, many 3D views) are not wired yet
- 3D plotting and better camera controls are on the roadmap
- VS Code/Cursor extensions are also on the roadmap
Open source
- MIT License with Attribution
- Small binary, CLI-first design
📊 Performance highlights
These are workloads where the work is heavy enough that Fusion chooses the GPU.
Hardware: Apple M2 Max (Metal). Each point is the mean of 3 runs.
4K Image Pipeline Full Sweep (B = Image Batch Size)
| B | RunMat (ms) | PyTorch (ms) | NumPy (ms) | NumPy ÷ RunMat | PyTorch ÷ RunMat |
|---|---|---|---|---|---|
| 4 | 217.9 | 922.9 | 548.4 | 2.52x | 4.23x |
| 8 | 270.3 | 960.1 | 989.6 | 3.66x | 3.55x |
| 16 | 317.4 | 1,040.7 | 1,859.1 | 5.86x | 3.28x |
| 32 | 520.5 | 1,178.3 | 3,698.6 | 7.11x | 2.26x |
| 64 | 893.8 | 1,379.6 | 7,434.6 | 8.32x | 1.54x |
Monte Carlo Performance Sweep (M = number of paths)
| M | RunMat (ms) | PyTorch (ms) | NumPy (ms) | NumPy ÷ RunMat | PyTorch ÷ RunMat |
|---|---|---|---|---|---|
| 250 000 | 179.8 | 955.4 | 4,252.3 | 23.65x | 5.31x |
| 500 000 | 203.1 | 1,021.8 | 9,319.9 | 45.90x | 5.03x |
| 1 000 000 | 243.3 | 1,283.9 | 17,946.4 | 73.78x | 5.28x |
| 2 000 000 | 372.0 | 1,469.4 | 38,826.8 | 104.36x | 3.95x |
| 5 000 000 | 678.1 | 1,719.5 | 95,539.2 | 140.89x | 2.54x |
Elementwise Math Full Sweep (N = number of elements)
| N | RunMat (ms) | PyTorch (ms) | NumPy (ms) | NumPy ÷ RunMat | PyTorch ÷ RunMat |
|---|---|---|---|---|---|
| 1 000 000 | 197.1 | 820.8 | 68.3 | 0.35x | 4.16x |
| 2 000 000 | 211.4 | 896.2 | 76.7 | 0.36x | 4.24x |
| 5 000 000 | 207.7 | 1,104.7 | 111.9 | 0.54x | 5.32x |
| 10 000 000 | 173.8 | 1,426.1 | 166.6 | 0.96x | 8.20x |
| 100 000 000 | 170.9 | 16,878.8 | 1,098.8 | 6.43x | 98.77x |
| 200 000 000 | 202.8 | 17,393.0 | 2,188.9 | 10.79x | 85.76x |
| 500 000 000 | 171.8 | 18,880.2 | 5,946.9 | 34.61x | 109.87x |
| 1 000 000 000 | 199.4 | 22,652.0 | 12,570.0 | 63.04x | 113.61x |
On small arrays, Fusion keeps the work on the CPU, so you still get low overhead and fast JIT execution.
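A sketch of the small-array case (the threshold is chosen by the planner's cost model, not fixed; the size below is an assumption):

```matlab
% Illustrative: for a small vector the planner keeps the whole chain on the
% CPU, because the host-to-device transfer would cost more than it saves.
x = rand(256, 1, 'single');
y = sin(x) .* x + 0.5;    % same chain as the GPU example, but tiny
m = mean(y);              % runs on the CPU with no transfer overhead
```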
The benchmarks are run on an Apple M2 Max with BLAS/LAPACK optimizations and GPU acceleration. See the benchmarks for reproducible test scripts, detailed results, and comparisons against NumPy, PyTorch, and Julia.
# Quick install (Linux/macOS)
curl -fsSL https://runmat.org/install.sh | sh
# Quick install (Windows PowerShell)
iwr https://runmat.org/install.ps1 | iex
# Or install from crates.io
cargo install runmat --features gui
# Or build from source
git clone https://github.com/runmat-org/runmat.git
cd runmat && cargo build --release --features gui
For BLAS/LAPACK acceleration on Linux, install the system OpenBLAS package before building:
sudo apt-get update && sudo apt-get install -y libopenblas-dev
# Start the interactive REPL
runmat
# Or run an existing .m file
runmat script.m
# Or pipe a script into RunMat
echo "a = 10; b = 20; c = a + b" | runmat
# Check GPU acceleration status
runmat accel-info
# Benchmark a script
runmat benchmark script.m --iterations 5 --jit
# View system information
runmat info
# Register RunMat as a Jupyter kernel
runmat --install-kernel
# Launch JupyterLab with RunMat support
jupyter lab
% RunMat automatically uses GPU when beneficial
x = rand(10000, 1, 'single');
y = sin(x) .* x + 0.5; % Automatically fused and GPU-accelerated
mean(y) % Result computed on GPU
% Your existing MATLAB code just works
A = [1 2 3; 4 5 6; 7 8 9];
B = A' * A;
eigenvals = eig(B);
plot(eigenvals);
% RunMat automatically fuses this chain into a single GPU kernel
% No kernel code, no rewrites—just MATLAB syntax
x = rand(1024, 1, 'single');
y = sin(x) .* x + 0.5; % Fused: sin, multiply, add
m = mean(y, 'all'); % Reduction stays on GPU
fprintf('m=%.6f\n', double(m)); % Single download at sink
% Simple 2D line plot (works in the pre-release)
x = linspace(0, 2*pi, 1000);
y = sin(x);
plot(x, y);
grid on;
title("Sine wave");🧱 Architecture: CPU+GPU Performance
RunMat uses a tiered CPU runtime and a fusion engine that automatically chooses CPU or GPU for each part of the math.
| Component | Objective | Technology/Notes |
|---|---|---|
| ⚙️ runmat-ignition | Baseline interpreter for quick startup | HIR → bytecode compiler, stack-based interpreter |
| ✓ runmat-turbine | Optimizing JIT for hot code | Cranelift backend, tuned for numerical workloads |
| 🧠 runmate-gc | High-performance memory management | Generational GC with pointer compression |
| 🚀 runmate-accelerate | GPU Acceleration Subsystem | Fusion Engine + Auto-Offload Planner + wgpu backend |
| 🔥 Fusion Engine | Collapses op chains, chooses CPU vs GPU | Creates op graphs, fuses ops, estimates cost, places tensors on device |
| 🎨 runmat-plot | Plotting Layer (Pre-release) | 2D line/scatter plots still work today; 3D, filled shapes and full GPU plotting are on the roadmap |
| 📸 runmat-snapshot | Fast-startup snapshots | Binary blob serialization/restore |
| 🧰 runmat-runtime | Core Runtime + 200+ built-in functions | BLAS/LAPACK integration and other CPU/GPU-accelerated operations |
- Tiered CPU runtime: delivers quick startup and strong single-machine performance.
- Fusion engine: eliminates most manual device management and kernel tuning.
- GPU backend: runs on NVIDIA, AMD, Apple Silicon, and Intel via Metal/DirectX 12/Vulkan, with no vendor lock-in.
🚀 GPU acceleration: fusion and auto-offload
RunMat automatically accelerates your MATLAB code on the GPU without the need for kernel code or rewriting. This system works in four stages:
1. Build an acceleration graph
RunMat builds an "acceleration graph" that captures the intent of your operations – sizes, operation ranges, dependencies, and constants. This graph gives an overall view of your script's computation.
2. Decide what should run on the GPU
The Fusion Engine detects long chains of elementwise operations and linked reductions and plans their execution as a combined GPU program. The auto-offload planner estimates break-even points and routes work intelligently:
- Fusion detection: combines multiple operations into a single GPU dispatch
- Auto-offload estimation: weighs element count, reduction size, and matrix-multiplication saturation (a rough sketch of the break-even idea follows this list)
- Residency awareness: keeps tensors on the device where it pays off
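To make the break-even idea concrete, here is a toy cost model; it is not RunMat's actual planner, and every constant below is a made-up assumption purely for illustration:

```matlab
% Illustrative toy model of CPU-vs-GPU routing. The constants are invented;
% RunMat's auto-offload planner uses its own calibrated cost estimates.
n = 5e7;                      % elements flowing through the fused chain
cpu_ns_per_elem = 2.0;        % assumed CPU cost per element (ns)
gpu_ns_per_elem = 0.1;        % assumed GPU cost per element (ns)
transfer_ns_per_byte = 0.25;  % assumed host<->device transfer cost (ns/byte)
bytes_moved = n * 4;          % single-precision input uploaded once

cpu_time = n * cpu_ns_per_elem;
gpu_time = n * gpu_ns_per_elem + bytes_moved * transfer_ns_per_byte;

if gpu_time < cpu_time
    disp('offload the fused chain to the GPU');
else
    disp('keep the fused chain on the CPU');
end
```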
3. Generate portable GPU kernels
RunMat generates portable WGSL (WebGPU Shading Language) kernels that work on all platforms:
- Metal on macOS
- DirectX 12 on Windows
- Vulkan on Linux
The kernel is compiled once and cached for subsequent runs, eliminating recompilation overhead.
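In practice, the first execution of a fused expression pays the compilation cost and later executions reuse the cached pipeline; a minimal sketch (the timing variables are for illustration only, no specific numbers are claimed):

```matlab
% Illustrative: the first run compiles and caches the fused WGSL pipeline;
% a repeated run of the same computation hits the pipeline cache.
x = rand(1e7, 1, 'single');

tic; y = sin(x) .* x + 0.5; t_first  = toc;  % compiles + caches the kernel
tic; y = sin(x) .* x + 0.5; t_cached = toc;  % reuses the cached pipeline

fprintf('first run %.3f s, cached run %.3f s\n', t_first, t_cached);
```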
4. Minimize data transfers
The runtime minimizes host↔device transfers by:
- Uploading tensors once and keeping them resident
- Executing fused kernels directly on GPU memory
- Downloading results only when needed (e.g., for fprintf or display)
Example: Automatic GPU Fusion
% This code automatically fuses into a single GPU kernel
x = rand(1024, 1, 'single');
y = sin(x) .* x + 0.5; % Fused: sin, multiply, add
m = mean(y, 'all'); % Reduction stays on GPU
fprintf('m=%.6f\n', double(m)); % Single download at sink
RunMat recognizes the elementwise chain (sin, .*, +), fuses it into a single GPU dispatch, keeps y resident on the GPU, and downloads only m when it is needed for output.
For more details, see Introduction to RunMat GPUs and How RunMat Fusion Works.
🎨 Modern developer experience
A REPL full of intelligent features
runmat> .info
🦀 RunMat v0.1.0 - High-Performance MATLAB Runtime
⚡ JIT: Cranelift (optimization: speed)
🧠 GC: Generational (heap: 45MB, collections: 12)
🚀 GPU: wgpu provider (Metal/DX12/Vulkan)
🎨 Plotting: GPU-accelerated (wgpu)
📊 Functions loaded: 200+ builtins + 0 user-defined
runmat> .stats
Execution Statistics:
Total: 2, JIT: 0, Interpreter: 2
Average time: 0.12ms
runmat> accel-info
GPU Acceleration Provider: wgpu
Device: Apple M2 Max
Backend: Metal
Fusion pipeline cache: 45 hits, 2 misses
First-class Jupyter support
- Rich output formatting with LaTeX math rendering
- Interactive widgets for parameter exploration
- Full debugging support with breakpoints
// Adding a new builtin function is trivial
#[runtime_builtin("myfunction")]
fn my_custom_function(x: f64, y: f64) -> f64 {
    x + y // illustrative body
}

RunMat includes a comprehensive CLI with powerful features:
# Check GPU acceleration status
runmat accel-info
# Benchmark a script
runmat benchmark my_script.m --iterations 5 --jit
# Create a snapshot for faster startup
runmat snapshot create -o stdlib.snapshot
# GC statistics and control
runmat gc stats
runmat gc major
# System information
runmat info
See the CLI documentation for a complete command reference.
RunMat's package system lets both systems programmers and MATLAB users extend the runtime. The core stays lean while packages provide domain-specific functionality.
High-performance built-ins implemented in Rust:
#[runtime_builtin(
name = "norm2",
category = "math/linalg",
summary = "Euclidean norm of a vector.",
examples = "n = norm2([3,4]) % 5"
)]
fn norm2_builtin(a: Value) -> Result<Value, String> {
    // ... implementation elided ...
}

Rust packages get type-safe conversion, deterministic error IDs, and zero-cost documentation generation.
MATLAB-source packages compile to RunMat bytecode:
% +mypackage/norm2.m
function n = norm2(v)
n = sqrt(sum(v .^ 2));
end
Both package types appear identical to users – their functions show up in namespaces, reference documentation, and tooling (help, search, doc indexing).
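For example, assuming the mypackage example above is installed, calling it looks the same regardless of whether norm2 was written in Rust or MATLAB (the namespace call syntax below follows standard MATLAB package conventions):

```matlab
% Hypothetical usage of the example package defined above.
n = mypackage.norm2([3 4]);   % returns 5
help mypackage.norm2          % shows the generated reference documentation
```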
# Declare dependencies in .runmat
[packages]
linalg-plus = { source = "registry", version = "^1.2" }
viz-tools = { source = "git", url = "https://github.com/acme/viz-tools" }
# Install packages
runmat pkg install
# Publish your package
runmat pkg publish
Note: The package manager CLI is currently in beta. See the package manager documentation for design details.
RunMat follows a minimal-core, fast-runtime, open-extension model:
- Full language support: the core implements the entire MATLAB grammar and semantics, not a subset
- Comprehensive built-ins: the standard library aims for full base-MATLAB built-in coverage (200+ functions)
- Tiered performance: Ignition interpreter for fast startup, Turbine JIT for hot code
- GPU-first math: the Fusion Engine automatically turns MATLAB code into fast GPU workloads
- Small, portable runtime: single static binary, fast startup, modern CLI, Jupyter kernel support
- Toolboxes as packages: signal processing, statistics, image processing, and other domains live as packages
What RunMat is:
- A modern, high-performance runtime for MATLAB code
- A minimal core with a thriving package ecosystem
- GPU-accelerated by default with intelligent CPU/GPU routing
- Open source and free forever
What RunMat is not:
- A re-implementation of MATLAB in full (toolboxes live as packages)
- A compatibility layer (we implement semantics, not folklore)
- An IDE (use any editor: Cursor, VS Code, IntelliJ, etc.)
RunMat keeps the core small and uncompromisingly high quality; everything else is a package. This enables:
- Fast iteration without destabilizing the runtime
- Domain experts shipping features without forking
- A small trusted computing base that is easy to audit
- Community-driven package ecosystem
See Design philosophy for complete design rationale.
RunMat is designed for array-heavy mathematics in many domains.
Examples:
| Domain | Typical workloads |
|---|---|
| Imaging / geospatial | 4K+ tiles, normalization, radiometric correction, QC metrics |
| Finance / simulation | Monte Carlo risk, scenario analysis, covariance, factor models |
| Signal processing / control | Filters, NLMS, large time-series jobs |
| Researchers and students | MATLAB background, need code that runs fast on a laptop or cluster |
If you write math in MATLAB and reach performance walls on a CPU, RunMat is made for you.
RunMat is more than just software: it's a movement toward open, fast, and accessible scientific computing. We are building the future of numerical programming, and we need your help.
🛠️ How to Contribute
| 🚀 For Rust developers | 🔬 For domain experts | 📚 For everyone else |
|---|---|---|
| Contribute code → | Join the discussion → | Get started → |
RunMat is licensed under the MIT License with attribution requirements. This means:
- Free for all – individuals, academics, most companies
- Forever open source – no vendor lock-in or license fees
- Commercial use permitted – embed it in your products freely
See LICENSE.md for complete terms or visit runmat.org/license for FAQs.
Built with ❤️ by Dystr Inc. and the RunMat community.
Star us on GitHub if RunMat is useful to you.
🚀 Get started • Follow @dystr
MATLAB® is a registered trademark of The MathWorks, Inc. RunMat is not affiliated with, endorsed by, or sponsored by The MathWorks, Inc.