RunMat automatically fuses operations and routes them intelligently between CPU and GPU. MATLAB syntax, no kernel code, no rewriting.
🌐 Website • 📖 Documentation
Status: Pre-release (v0.2)
RunMat is an early build. The core runtime and GPU engine have already passed thousands of tests, but some plotting features are still missing or broken. Expect some rough edges. Feedback and bug reports help us decide what to fix next.
With RunMat you write your math in clean, readable MATLAB-style syntax. RunMat automatically fuses your operations into optimized kernels and runs them where they execute best – CPU or GPU. On the GPU, it can often match or beat hand-tuned CUDA on many dense numerical workloads.
It runs on whatever GPU you have – NVIDIA, AMD, Apple Silicon, Intel – via native APIs (Metal/DirectX 12/Vulkan). No device management. No vendor lock-in. No rewriting.
Basic idea:
- MATLAB syntax, no new language
- Fast on CPU and GPU, with a tiered runtime (interpreter + JIT)
- No device flags – Fusion automatically chooses CPU vs GPU based on data size and estimated transfer cost
MATLAB language
- Familiar .m files, arrays, and control flow
- Many MATLAB/Octave scripts run with few or no changes
Fusion: automatic CPU + GPU routing
- Builds an internal graph of array operations
- Fuses elementwise operations and reductions into larger kernels (see the sketch below)
- Chooses CPU or GPU per kernel based on size and transfer cost
- Keeps arrays resident on the device when that is faster
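As a small illustration (the array shape and the exact fusion decisions here are assumptions; the planner decides at runtime), a chain like the one below is a candidate for a single fused kernel, with the intermediate array never leaving the device:

```matlab
% Illustrative only: an elementwise chain plus a reduction that Fusion can
% plan as one kernel; CPU vs GPU placement is decided by the runtime.
a = rand(4096, 4096, 'single');
b = exp(-a) .* a + 1;     % elementwise chain: negate, exp, multiply, add
s = sum(b, 'all');        % reduction over the same data, no extra transfer
disp(s);                  % the scalar result is the only value downloaded
```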
Modern CPU runtime
- Ignition interpreter for fast startup
- Turbine JIT (Cranelift) for hot paths – see the sketch after this list
- Generational GC tuned for numeric code
- Memory-safe by design (Rust)
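For scalar, loop-heavy code the tiered CPU runtime matters more than the GPU; here is a minimal sketch of the kind of code that benefits from the Turbine JIT (timings depend on hardware and no specific numbers are claimed):

```matlab
% Illustrative: a tight scalar loop. It starts in the Ignition interpreter
% and is compiled by the Turbine JIT once it becomes hot.
acc = 0;
for k = 1:1e7
    acc = acc + sin(k) / k;   % pure scalar work, nothing to offload
end
fprintf('acc = %.6f\n', acc);
```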
Cross-platform GPU backend
- Built on wgpu (WebGPU)
- Supports Metal (macOS), DirectX 12 (Windows), and Vulkan (Linux)
- Falls back to the CPU when a workload is too small to benefit from the GPU
Plotting and tooling (pre-release)
- Simple 2D line and scatter plots still work today
- Plots that use filled shapes or meshes (box plots, violin plots, surfaces, many 3D views) are not wired yet
- 3D plotting and better camera controls are on the roadmap
- VS Code/Cursor extensions are also on the roadmap
Open source
- MIT License with Attribution
- Small binary, CLI-first design
📊 Performance highlights
These are workloads where the work is heavy enough that Fusion chooses the GPU.
Hardware: Apple M2 Max (Metal). Each point is the mean of 3 runs.
4K Image Pipeline Full Sweep (B = Image Batch Size)
| B | RunMat (ms) | PyTorch (ms) | NumPy (ms) | NumPy ÷ RunMat | PyTorch ÷ RunMat |
|---|---|---|---|---|---|
| 4 | 217.9 | 922.9 | 548.4 | 2.52x | 4.23x |
| 8 | 270.3 | 960.1 | 989.6 | 3.66x | 3.55x |
| 16 | 317.4 | 1,040.7 | 1,859.1 | 5.86x | 3.28x |
| 32 | 520.5 | 1,178.3 | 3,698.6 | 7.11x | 2.26x |
| 64 | 893.8 | 1,379.6 | 7,434.6 | 8.32x | 1.54x |
Monte Carlo Performance Sweep (M = number of paths)
| M | RunMat (ms) | PyTorch (ms) | NumPy (ms) | NumPy ÷ RunMat | PyTorch ÷ RunMat |
|---|---|---|---|---|---|
| 250 000 | 179.8 | 955.4 | 4,252.3 | 23.65x | 5.31x |
| 500 000 | 203.1 | 1,021.8 | 9,319.9 | 45.90x | 5.03x |
| 1 000 000 | 243.3 | 1,283.9 | 17,946.4 | 73.78x | 5.28x |
| 2 000 000 | 372.0 | 1,469.4 | 38,826.8 | 104.36x | 3.95x |
| 5 000 000 | 678.1 | 1,719.5 | 95,539.2 | 140.89x | 2.54x |
Elementwise Math Full Sweep (N = number of elements)
| N | RunMat (ms) | PyTorch (ms) | NumPy (ms) | NumPy ÷ RunMat | PyTorch ÷ RunMat |
|---|---|---|---|---|---|
| 1 000 000 | 197.1 | 820.8 | 68.3 | 0.35x | 4.16x |
| 2 000 000 | 211.4 | 896.2 | 76.7 | 0.36x | 4.24x |
| 5 000 000 | 207.7 | 1,104.7 | 111.9 | 0.54x | 5.32x |
| 10 000 000 | 173.8 | 1,426.1 | 166.6 | 0.96x | 8.20x |
| 100 000 000 | 170.9 | 16,878.8 | 1,098.8 | 6.43x | 98.77x |
| 200 000 000 | 202.8 | 17,393.0 | 2,188.9 | 10.79x | 85.76x |
| 500 000 000 | 171.8 | 18,880.2 | 5,946.9 | 34.61x | 109.87x |
| 1 000 000 000 | 199.4 | 22,652.0 | 12,570.0 | 63.04x | 113.61x |
On small arrays, Fusion keeps the work on the CPU, so you still get low overhead and fast JIT execution.
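A sketch of the small-array case (the threshold is chosen by the planner's cost model, not fixed; the size below is an assumption):

```matlab
% Illustrative: for a small vector the planner keeps the whole chain on the
% CPU, because the host-to-device transfer would cost more than it saves.
x = rand(256, 1, 'single');
y = sin(x) .* x + 0.5;    % same chain as the GPU example, but tiny
m = mean(y);              % runs on the CPU with no transfer overhead
```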
The benchmarks are run on an Apple M2 Max with BLAS/LAPACK optimizations and GPU acceleration. See the benchmarks for reproducible test scripts, detailed results, and comparisons against NumPy, PyTorch, and Julia.
# Quick install (Linux/macOS)
curl -fsSL https://runmat.org/install.sh | sh
# Quick install (Windows PowerShell)
iwr https://runmat.org/install.ps1 | iex
# Or install from crates.io
cargo install runmat --features gui
# Or build from source
git clone https://github.com/runmat-org/runmat.git
cd runmat && cargo build --release --features gui
For BLAS/LAPACK acceleration on Linux, install the system OpenBLAS package before building:
sudo apt-get update && sudo apt-get install -y libopenblas-dev
# Start the interactive REPL
runmat
# Or run an existing .m file
runmat script.m
# Or pipe a script into RunMat
echo "a = 10; b = 20; c = a + b" | runmat
# Check GPU acceleration status
runmat accel-info
# Benchmark a script
runmat benchmark script.m --iterations 5 --jit
# View system information
runmat info
# Register RunMat as a Jupyter kernel
runmat --install-kernel
# Launch JupyterLab with RunMat support
jupyter lab
% RunMat automatically uses GPU when beneficial
x = rand(10000, 1, 'single');
y = sin(x) .* x + 0.5; % Automatically fused and GPU-accelerated
mean(y) % Result computed on GPU
% Your existing MATLAB code just works
A = [1 2 3; 4 5 6; 7 8 9];
B = A' * A;
eigenvals = eig(B);
plot(eigenvals);
% RunMat automatically fuses this chain into a single GPU kernel
% No kernel code, no rewrites—just MATLAB syntax
x = rand(1024, 1, 'single');
y = sin(x) .* x + 0.5; % Fused: sin, multiply, add
m = mean(y, 'all'); % Reduction stays on GPU
fprintf('m=%.6f\n', double(m)); % Single download at sink
% Simple 2D line plot (works in the pre-release)
x = linspace(0, 2*pi, 1000);
y = sin(x);
plot(x, y);
grid on;
title("Sine wave");🧱 Architecture: CPU+GPU Performance
RunMat uses a tiered CPU runtime and a fusion engine that automatically chooses CPU or GPU for each part of the math.
| Component | Objective | Technology/Notes |
|---|---|---|
| ⚙️ runmat-ignition | Baseline interpreter for quick startup | HIR → bytecode compiler, stack-based interpreter |
| ✓ runmat-turbine | Optimizing JIT for hot code | Cranelift backend, tuned for numerical workloads |
| 🧠 runmate-gc | High-performance memory management | Generational GC with pointer compression |
| 🚀 runmate-accelerate | GPU Acceleration Subsystem | Fusion Engine + Auto-Offload Planner + wgpu backend |
| 🔥 Fusion Engine | Collapses op chains, chooses CPU vs GPU | Creates op graphs, fuses ops, estimates cost, places tensors on device |
| 🎨 runmat-plot | Plotting Layer (Pre-release) | 2D line/scatter plots still work today; 3D, filled shapes and full GPU plotting are on the roadmap |
| 📸 runmat-snapshot | Fast-startup snapshots | Binary blob serialization/restore |
| 🧰 runmat-runtime | Core Runtime + 200+ built-in functions | BLAS/LAPACK integration and other CPU/GPU-accelerated operations |
- Tiered CPU runtime: delivers quick startup and strong single-machine performance.
- Fusion engine: eliminates most manual device management and kernel tuning.
- GPU backend: runs on NVIDIA, AMD, Apple Silicon, and Intel via Metal/DirectX 12/Vulkan, with no vendor lock-in.
🚀 GPU acceleration: fusion and auto-offload
RunMat automatically accelerates your MATLAB code on the GPU without the need for kernel code or rewriting. This system works in four stages:
1. Build an acceleration graph
RunMat builds an "acceleration graph" that captures the intent of your operations – sizes, operation ranges, dependencies, and constants. This graph gives an overall view of your script's computation.
2. Decide what should run on the GPU
The Fusion Engine detects long chains of elementwise operations and linked reductions and plans their execution as a combined GPU program. The auto-offload planner estimates break-even points and routes work intelligently:
- Fusion detection: combines multiple operations into a single GPU dispatch
- Auto-offload estimation: weighs element count, reduction size, and matrix-multiplication saturation (a rough sketch of the break-even idea follows this list)
- Residency awareness: keeps tensors on the device where it pays off
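To make the break-even idea concrete, here is a toy cost model; it is not RunMat's actual planner, and every constant below is a made-up assumption purely for illustration:

```matlab
% Illustrative toy model of CPU-vs-GPU routing. The constants are invented;
% RunMat's auto-offload planner uses its own calibrated cost estimates.
n = 5e7;                      % elements flowing through the fused chain
cpu_ns_per_elem = 2.0;        % assumed CPU cost per element (ns)
gpu_ns_per_elem = 0.1;        % assumed GPU cost per element (ns)
transfer_ns_per_byte = 0.25;  % assumed host<->device transfer cost (ns/byte)
bytes_moved = n * 4;          % single-precision input uploaded once

cpu_time = n * cpu_ns_per_elem;
gpu_time = n * gpu_ns_per_elem + bytes_moved * transfer_ns_per_byte;

if gpu_time < cpu_time
    disp('offload the fused chain to the GPU');
else
    disp('keep the fused chain on the CPU');
end
```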
3. Generate portable GPU kernels
RunMat generates portable WGSL (WebGPU Shading Language) kernels that work on all platforms:
- Metal on macOS
- DirectX 12 on Windows
- Vulkan on Linux
The kernel is compiled once and cached for subsequent runs, eliminating recompilation overhead.
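In practice, the first execution of a fused expression pays the compilation cost and later executions reuse the cached pipeline; a minimal sketch (the timing variables are for illustration only, no specific numbers are claimed):

```matlab
% Illustrative: the first run compiles and caches the fused WGSL pipeline;
% a repeated run of the same computation hits the pipeline cache.
x = rand(1e7, 1, 'single');

tic; y = sin(x) .* x + 0.5; t_first  = toc;  % compiles + caches the kernel
tic; y = sin(x) .* x + 0.5; t_cached = toc;  % reuses the cached pipeline

fprintf('first run %.3f s, cached run %.3f s\n', t_first, t_cached);
```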
4. Minimize data transfers
The runtime minimizes host↔device transfers by:
- Uploading tensors once and keeping them resident
- Executing fused kernels directly on GPU memory
- Downloading results only when needed (e.g., for fprintf or display)
Example: Automatic GPU Fusion
% This code automatically fuses into a single GPU kernel
x = rand(1024, 1, 'single');
y = sin(x) .* x + 0.5; % Fused: sin, multiply, add
m = mean(y, 'all'); % Reduction stays on GPU
fprintf('m=%.6f\n', double(m)); % Single download at sink
RunMat recognizes the elementwise chain (sin, .*, +), fuses it into a single GPU dispatch, keeps y resident on the GPU, and downloads only m when it is needed for output.
For more details, see Introduction to RunMat GPUs and How RunMat Fusion Works.
🎨 Modern developer experience
A REPL full of intelligent features
runmat> .info
🦀 RunMat v0.1.0 - High-Performance MATLAB Runtime
⚡ JIT: Cranelift (optimization: speed)
🧠 GC: Generational (heap: 45MB, collections: 12)
🚀 GPU: wgpu provider (Metal/DX12/Vulkan)
🎨 Plotting: GPU-accelerated (wgpu)
📊 Functions loaded: 200+ builtins + 0 user-defined
runmat> .stats
Execution Statistics:
Total: 2, JIT: 0, Interpreter: 2
Average time: 0.12ms
runmat> accel-info
GPU Acceleration Provider: wgpu
Device: Apple M2 Max
Backend: Metal
Fusion pipeline cache: 45 hits, 2 misses
First-class Jupyter support
- Rich output formatting with LaTeX math rendering
- Interactive widgets for parameter exploration
- Full debugging support with breakpoints
// Adding a new builtin function is trivial
#[runtime_builtin("myfunction")]
fn my_custom_function(x: f64, y: f64) -> f64 {
    x + y // illustrative body
}

RunMat includes a comprehensive CLI with powerful features:
# Check GPU acceleration status
runmat accel-info
# Benchmark a script
runmat benchmark my_script.m --iterations 5 --jit
# Create a snapshot for faster startup
runmat snapshot create -o stdlib.snapshot
# GC statistics and control
runmat gc stats
runmat gc major
# System information
runmat info
See the CLI documentation for a complete command reference.
RunMat's package system lets both systems programmers and MATLAB users extend the runtime. The core stays lean while packages provide domain-specific functionality.
High-performance built-ins implemented in Rust:
#[runtime_builtin(
name = "norm2",
category = "math/linalg",
summary = "Euclidean norm of a vector.",
examples = "n = norm2([3,4]) % 5"
)]
fn norm2_builtin(a: Value) -> Result<Value, String> {
    // ... implementation elided ...
}

Rust packages get type-safe conversion, deterministic error IDs, and zero-cost documentation generation.
MATLAB-source packages compile to RunMat bytecode:
% +mypackage/norm2.m
function n = norm2(v)
n = sqrt(sum(v .^ 2));
end
Both package types appear identical to users – their functions show up in namespaces, reference documentation, and tooling (help, search, doc indexing).
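For example, assuming the mypackage example above is installed, calling it looks the same regardless of whether norm2 was written in Rust or MATLAB (the namespace call syntax below follows standard MATLAB package conventions):

```matlab
% Hypothetical usage of the example package defined above.
n = mypackage.norm2([3 4]);   % returns 5
help mypackage.norm2          % shows the generated reference documentation
```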
# Declare dependencies in .runmat
[packages]
linalg-plus = { source = "registry", version = "^1.2" }
viz-tools = { source = "git", url = "https://github.com/acme/viz-tools" }
# Install packages
runmat pkg install
# Publish your package
runmat pkg publish
Note: The package manager CLI is currently in beta. See the package manager documentation for design details.
RunMat follows a minimal-core, fast-runtime, open-extension model:
- Full language support: the core implements the entire MATLAB grammar and semantics, not a subset
- Comprehensive built-ins: the standard library aims for full base-MATLAB built-in coverage (200+ functions)
- Tiered performance: Ignition interpreter for fast startup, Turbine JIT for hot code
- GPU-first math: the Fusion Engine automatically turns MATLAB code into fast GPU workloads
- Small, portable runtime: single static binary, fast startup, modern CLI, Jupyter kernel support
- Toolboxes as packages: signal processing, statistics, image processing, and other domains live as packages
What RunMat is:
- A modern, high-performance runtime for MATLAB code
- A minimal core with a thriving package ecosystem
- GPU-accelerated by default with intelligent CPU/GPU routing
- Open source and free forever
What RunMat is not:
- A re-implementation of MATLAB in full (toolboxes live as packages)
- A compatibility layer (we implement semantics, not folklore)
- An IDE (use any editor: Cursor, VS Code, IntelliJ, etc.)
RunMat keeps the core small and uncompromisingly high quality; everything else is a package. This enables:
- Fast iteration without destabilizing the runtime
- Domain experts shipping features without forking
- A small trusted computing base that is easy to audit
- Community-driven package ecosystem
See Design philosophy for complete design rationale.
RunMat is designed for array-heavy mathematics in many domains.
Examples:
| Domain | Typical workloads |
|---|---|
| Imaging / geospatial | 4K+ tiles, normalization, radiometric correction, QC metrics |
| Finance / simulation | Monte Carlo risk, scenario analysis, covariance, factor models |
| Signal processing / control | Filters, NLMS, large time-series jobs |
| Researchers and students | MATLAB background, need code that runs fast on a laptop or cluster |
If you write math in MATLAB and reach performance walls on a CPU, RunMat is made for you.
RunMat is more than just software: it's a movement toward open, fast, and accessible scientific computing. We are building the future of numerical programming, and we need your help.
🛠️ How to Contribute
| 🚀 For Rust developers | 🔬 For domain experts | 📚 For everyone else |
|---|---|---|
| Contribute code → | Join the discussion → | Get started → |
RunMat is licensed under the MIT License with attribution requirements. This means:
- Free for all – individuals, academics, most companies
- Forever open source – no vendor lock-in or license fees
- Commercial use permitted – embed it in your products freely
See LICENSE.md for complete terms or visit runmat.org/license for FAQs.
Built with ❤️ by Dystr Inc. and the RunMat community.
Star us on GitHub if RunMat is useful to you.
🚀 Get started • Follow @dystr
MATLAB® is a registered trademark of The MathWorks, Inc. RunMat is not affiliated with, endorsed by, or sponsored by The MathWorks, Inc.