GitHub - Deepreinforce-ai/Ornith-1 · GitHub

Ornith Blog

Aloha! 🌺 Ornith-1.0 is a self-improving open-source model for agentic coding.

Main characteristics:

Cutting-edge coding agent: available in 9B-dense, 31B-dense, 35B-MoE, and 397B-MoE sizes (trained on top of Gemma 4 and Qwen 3.5), achieving state-of-the-art performance among comparably sized open-source models on coding benchmarks such as Terminal-Bench 2.1, SWE-Bench, NL2Repo, and OpenClaw.
Self-improvement training framework: Ornith-1.0 uses RL to learn not only how to generate solution rollouts, but also the scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model discovers better search trajectories and generates higher-quality solutions.
License: MIT licensed, globally accessible, and free of regional limitations.

Ornith 397B benchmark results

Each model is evaluated against its size-appropriate baselines. All three use the same harness and decoding setup (see the notes below the tables).

	Ornith-1.0-9B	Qwen3.5-9B	Qwen3.5-35B	Gemma4-12B	Gemma4-31B
Agentic coding
Terminal-Bench 2.1 (Terminus-2)	43.1	21.3	41.4	21	42.1
Terminal-Bench 2.1 (Claude Code)	40.6	18.9	38.9	–	–
SWE-Bench Verified	69.4	53.2	70	44.2	52
SWE-Bench Pro	42.9	31.3	44.6	27.6	35.7
SWE-Bench Multilingual	52	39.7	60.3	32.5	51.7
NL2Repo	27.2	16.2	20.5	10.3	15.5
ClawEval average	63.1	53.2	65.4	32.5	48.5
SWE Atlas – QnA	17.9	9.2	13.2	–	–
SWE Atlas – RF	16.6	4.3	10.2	–	–
SWE Atlas – TW	15.3	4.4	9.8	–	–

	Ornith-1.0-35B	Qwen3.5-35B	Qwen3.6-35B	Gemma4-31B	Qwen3.5-397B
Agentic coding
Terminal-Bench 2.1 (Terminus-2)	64.2	41.4	52.5	42.1	53.5
Terminal-Bench 2.1 (Claude Code)	62.8	38.9	49.2	–	48.6
SWE-Bench Verified	75.6	70	73.4	52	76.4
SWE-Bench Pro	50.4	44.6	49.5	35.7	51.6
SWE-Bench Multilingual	69.3	60.3	67.2	51.7	69.3
NL2Repo	34.6	20.5	29.4	15.5	36.8
ClawEval average	69.8	65.4	68.7	48.5	70.7
SWE Atlas – QnA	37.1	13.2	15.5	–	20.4
SWE Atlas – RF	29.7	10.2	11.4	–	18.4
SWE Atlas – TW	27.8	9.8	13.3	–	18.5

	Ornith-1.0-397B	Qwen3.5-397B	Qwen3.7-Max	GLM-5.2-744B	MiniMax-M3-428B	DeepSeek-V4-Pro-1.6T	Claude Opus 4.7	Claude Opus 4.8
Agentic coding
Terminal-Bench 2.1 (Terminus-2)	77.5	53.5	73.5	81.0	64	64	70.3	85
Terminal-Bench 2.1 (Claude Code)	78.2	48.6	69.8	82.7	–	66.5	69.7	78.9
SWE-Bench Verified	82.4	76.4	80.4	–	–	80.6	80.8	87.6
SWE-Bench Pro	62.2	51.6	60.6	62.1	59	55.4	64.3	69.2
SWE-Bench Multilingual	78.9	69.3	78.3	–	–	76.2	–	–
NL2Repo	48.2	36.8	47.2	48.9	42.1	–	–	69.7
ClawEval average	77.1	70.7	65.2	–	–	75.8	78.2	–
SWE Atlas – QnA	41.2	20.4	–	–	37.9	27.2	40.3	48.8
SWE Atlas – RF	42.6	18.4	–	–	–	–	48.6	46.7
SWE Atlas – TW	39.1	18.5	–	–	30.8	–	38.5	–

* Terminal-Bench 2.1 (Terminus-2): Harbor/Terminus-2 framework, parser=json, temp=1.0, top_p=1.0, evaluated with a 128K context window. Each run uses a timeout of 4 hours with 32 CPU cores and 48 GB RAM; results are averaged over 5 runs. We adjusted the Qwen chat template to keep training and inference consistent and modified Harbor to align with vLLM's reasoning_content key.
* Terminal-Bench 2.1 (Claude Code): evaluated with Claude Code 2.1.126, parser=json, temp=1.0, top_p=1.0, max_new_tokens=131072, averaged over 5 runs (Qwen chat template modified similarly).
* SWE-Bench Verified / Pro / Multilingual: OpenHands harness, temp=1.0, top_p=0.95, 256K context window.
* SWE Atlas QnA/RF/TW: mini-SWE-agent harness, temp=1.0, top_p=0.95, 128K context window, averaged over 5 runs.
* NL2Repo: temp=1.0, top_p=1.0, 400K context, 48K output, anti-hacking filter.
* ClawEval: an agentic coding benchmark on a real-user task distribution; temp=0.6, 256K context.
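
As a reading aid for the first table, the sketch below computes the absolute gain of Ornith-1.0-9B over the same-size Qwen3.5-9B column. The scores are copied from that table; the dictionary layout and the name `gains` are purely illustrative.

```python
# Scores copied from the 9B table above: (Ornith-1.0-9B, Qwen3.5-9B).
scores = {
    "Terminal-Bench 2.1 (Terminus-2)": (43.1, 21.3),
    "SWE-Bench Verified": (69.4, 53.2),
    "SWE-Bench Pro": (42.9, 31.3),
    "NL2Repo": (27.2, 16.2),
}

# Absolute gain in points, rounded to one decimal to hide float noise.
gains = {name: round(ours - base, 1) for name, (ours, base) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```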

Note

Ornith-1.0 is a reasoning model: by default, each assistant turn opens with a <think> … </think> block before the final reply. The serving recipes below enable a reasoning parser, which returns the chain of thought in a separate reasoning_content field, and a tool-call parser, so that the model's <tool_call> blocks come out as OpenAI-style tool_calls.

Ornith-1.0 requires a recent runtime to serve:

transformers ≥ 5.8.1

vLLM ≥ 0.19.1

SGLang ≥ 0.5.9

Recommended sampling parameters: temperature=0.6, top_p=0.95, top_k=20 (use temperature=1.0 to reproduce the reported benchmark setup).
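
A minimal sketch of how these settings might be packaged for the OpenAI Python client. The helper name `sampling_kwargs` is hypothetical. Because top_k is not part of the standard Chat Completions schema, the sketch assumes the server accepts it as an extra request field passed through the client's `extra_body` argument. The benchmark notes list top_p per benchmark, so only the temperature is switched here.

```python
def sampling_kwargs(benchmark: bool = False) -> dict:
    """Build sampling arguments for client.chat.completions.create().

    `sampling_kwargs` is a hypothetical helper, not part of any SDK.
    """
    return {
        # Recommended default is 0.6; the reported benchmarks use 1.0.
        "temperature": 1.0 if benchmark else 0.6,
        "top_p": 0.95,
        # Assumption: the server accepts top_k as a non-standard extra field.
        "extra_body": {"top_k": 20},
    }
```

With a client configured for the local server, a call would then look like `client.chat.completions.create(model="Ornith-1.0", messages=messages, **sampling_kwargs())`.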

Ornith-1.0 ships as a dense 9B model plus two mixture-of-experts models (35B, 397B). All checkpoints expose the same OpenAI-compatible interface and support a 256K (262,144-token) context window; the dense 9B fits on a single 80 GB GPU, while the MoE checkpoints are sharded across multi-GPU nodes with tensor parallelism. Each size is published in multiple precision/format variants.
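
The single-GPU versus multi-GPU split can be sanity-checked with rough arithmetic. The sketch below takes the sizes in the checkpoint names at face value and assumes 2 bytes per parameter for 16-bit weights and 1 byte for FP8. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function name `weight_gb` is illustrative.

```python
def weight_gb(params_billion: float, bytes_per_param: float, gpus: int = 1) -> float:
    """Rough per-GPU weight memory in GB (weights only, evenly sharded)."""
    return params_billion * bytes_per_param / gpus

print(weight_gb(9, 2))            # dense 9B, 16-bit, one GPU   -> 18.0
print(weight_gb(397, 2, gpus=8))  # 397B, 16-bit, eight GPUs    -> 99.25
print(weight_gb(397, 1, gpus=8))  # 397B, FP8, eight GPUs       -> 49.625
```

By this estimate the 9B weights leave ample room on an 80 GB device, while the 397B checkpoint in 16-bit would need more than 80 GB per GPU on an eight-GPU node, which is where larger-memory GPUs, more GPUs, or a lower-precision variant come in.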

The recipes below start an OpenAI-compatible server under the shared alias Ornith-1.0. Set MODEL to your desired checkpoint, and match --tensor-parallel-size (vLLM) or --tp (SGLang) to your GPU count.

# Pick a checkpoint — dense 9B, or MoE 35B / 397B (append -FP8 for lower-VRAM serving):
MODEL=deepreinforce-ai/Ornith-1.0-397B

# MoE checkpoints (35B / 397B): shard across the node with tensor parallelism.
# Dense checkpoint (9B): fits on a single 80GB GPU — drop --tensor-parallel-size.
vllm serve $MODEL \
    --served-model-name Ornith-1.0 \
    --tensor-parallel-size 8 \
    --host 0.0.0.0 --port 8000 \
    --max-model-len 262144 \
    --gpu-memory-utilization 0.90 \
    --enable-prefix-caching \
    --enable-auto-tool-choice --tool-call-parser qwen3_xml \
    --reasoning-parser qwen3 \
    --trust-remote-code

# Pick a checkpoint — dense 9B, or MoE 35B / 397B (append -FP8 for lower-VRAM serving):
MODEL=deepreinforce-ai/Ornith-1.0-397B

# MoE checkpoints (35B / 397B): shard with --tp ; dense 9B: drop --tp for a single GPU.
python -m sglang.launch_server \
    --model-path $MODEL \
    --served-model-name Ornith-1.0 \
    --tp 8 \
    --host 0.0.0.0 --port 8000 \
    --context-length 262144 \
    --mem-fraction-static 0.85 \
    --tool-call-parser qwen3_coder \
    --reasoning-parser qwen3
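
If the server is launched from a script, the vLLM recipe can be assembled programmatically. The sketch below is one possible way to do that: it reuses only flags that appear in the recipe above and drops --tensor-parallel-size for a single GPU, as the comment for the dense 9B checkpoint suggests. The function name `vllm_command` is hypothetical.

```python
import shlex


def vllm_command(model: str, gpus: int = 1) -> list[str]:
    """Build the `vllm serve` argument list for a given checkpoint."""
    cmd = [
        "vllm", "serve", model,
        "--served-model-name", "Ornith-1.0",
        "--max-model-len", "262144",
        "--enable-auto-tool-choice", "--tool-call-parser", "qwen3_xml",
        "--reasoning-parser", "qwen3",
    ]
    # Only shard when more than one GPU is requested (MoE checkpoints).
    if gpus > 1:
        cmd += ["--tensor-parallel-size", str(gpus)]
    return cmd


# Print a copy-pasteable command line for the dense 9B checkpoint.
print(shlex.join(vllm_command("deepreinforce-ai/Ornith-1.0-9B")))
```

The resulting list can be passed directly to `subprocess.run` or `subprocess.Popen` without going through a shell.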

Hugging Face Transformers

For quick local testing (or to script offline generation), load the model directly with Transformers. Make sure you have a recent release installed (see the Transformers installation guide); Ornith-1.0 requires transformers >= 5.8.1. The dense 9B checkpoint is the easiest to run locally.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepreinforce-ai/Ornith-1.0-9B"  # or -35B / -397B

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function is_prime(n). Keep it short."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
generated = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
output_ids = generated[0][inputs.input_ids.shape[1]:]

# The reply contains a <think> ... </think> reasoning block followed by the answer.
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)

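To split the reasoning trace from the final answer, parse at the </think> marker:

text = tokenizer.decode(output_ids, skip_special_tokens=True)
# Everything before the closing </think> tag is reasoning; the rest is the answer.
if "</think>" in text:
    reasoning, answer = text.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
    answer = answer.strip()
else:
    # No reasoning block was emitted, so treat the whole text as the answer.
    reasoning, answer = "", text.strip()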


Using Ornith-1.0 via the Chat Completions API

Once a vLLM or SGLang server is running, talk to it with any OpenAI-compatible client.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # any non-empty string works for a local server
)

response = client.chat.completions.create(
    model="Ornith-1.0",
    messages=[
        {"role": "user", "content": "Write a one-line Python lambda that squares a number."}
    ],
    temperature=0.6,
    top_p=0.95,
    max_tokens=1024,
)

message = response.choices[0].message
# reasoning_content holds the <think> trace; content holds the final answer.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)

You can also stream tokens, or hand the model tools. Ornith-1.0 emits well-formed function calls that the server parses into the standard tool_calls field:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="Ornith-1.0",
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",
    temperature=0.6,
    max_tokens=2048,
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
# -> get_weather {"city": "Paris"}
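
Streaming, mentioned above, can be sketched as follows. The example assumes the server streams the reasoning trace in a `reasoning_content` attribute on each delta, mirroring the non-streaming field shown earlier; the exact attribute may differ between runtime versions. The helper names `collect_stream` and `stream_chat` are illustrative.

```python
def collect_stream(chunks):
    """Accumulate streamed deltas into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some servers send chunks that carry no choices
        delta = chunk.choices[0].delta
        # Assumption: the reasoning trace arrives in `reasoning_content`.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def stream_chat(client, prompt):
    """Send one prompt with stream=True and collect the full result."""
    stream = client.chat.completions.create(
        model="Ornith-1.0",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.6,
        top_p=0.95,
        stream=True,
    )
    return collect_stream(stream)
```

Given the `client` created earlier, `reasoning, answer = stream_chat(client, "Explain list comprehensions briefly.")` would return both parts once the stream ends. To display tokens as they arrive, print inside the loop instead of collecting.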

You can point any OpenAI-compatible SDK (Python, Node.js, etc.) or curl at the same /v1/chat/completions endpoint.
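
For clients without an SDK, the request is plain JSON over HTTP. The sketch below builds it with the Python standard library only, assuming the local server address used throughout this page. The helper names `build_chat_request` and `send` are illustrative.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000/v1"):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": "Ornith-1.0",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer EMPTY",  # placeholder key for a local server
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `send(build_chat_request("Hello"))["choices"][0]["message"]` against a running server would return the same message structure the SDK exposes.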
Ornith-1.0 excels at tool calling and agentic coding.
Because Ornith-1.0 exposes OpenAI-compatible endpoints with tool calling, it works out of the box with standard agent frameworks. Below is a minimal example that hands Ornith-1.0 a shell tool of the kind an MCP server would expose.

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "EMPTY"),
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Run a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "The command to run"}
                },
                "required": ["command"],
            },
        },
    }
]

messages = [{"role": "user", "content": "List the Python files in the current directory."}]

response = client.chat.completions.create(
    model="Ornith-1.0",
    messages=messages,
    tools=tools,
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message)
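
The example above stops at the first model response. A full agent turn also executes the requested tool and feeds the result back. The following is a minimal sketch of that loop, not the way any particular framework implements it; `run_shell`, `agent_loop`, and the `handlers` mapping are hypothetical names. Note that `run_shell` executes whatever the model asks for without sandboxing, so it should only be used in a disposable environment.

```python
import json
import subprocess


def run_shell(command: str) -> str:
    """Run a shell command and return stdout plus stderr (no sandboxing)."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr


def agent_loop(client, messages, tools, handlers, max_turns=8):
    """Call the model until it replies without requesting a tool."""
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model="Ornith-1.0", messages=messages, tools=tools,
            temperature=0.6, top_p=0.95,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content
        # Record the assistant turn, then answer each requested call.
        messages.append({
            "role": "assistant",
            "content": message.content or "",
            "tool_calls": [
                {"id": c.id, "type": "function",
                 "function": {"name": c.function.name,
                              "arguments": c.function.arguments}}
                for c in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            output = handlers[call.function.name](**args)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": output}
            )
    raise RuntimeError("agent did not finish within max_turns")
```

With the `client`, `messages`, and `tools` defined above, `agent_loop(client, messages, tools, {"run_shell": run_shell})` would return the model's final text answer.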

Examples of using Ornith with agent harnesses:

# Hermes talks to any OpenAI-compatible endpoint — point it at your Ornith server.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="Ornith-1.0"


pip install openhands-ai

# OpenHands routes through LiteLLM; the "openai/" prefix selects the OpenAI-compatible path.
export LLM_MODEL="openai/Ornith-1.0"
export LLM_BASE_URL="http://localhost:8000/v1"
export LLM_API_KEY="EMPTY"

# Launch the CLI (or run the official OpenHands Docker image with the same env vars).
openhands


# Both runtimes load a GGUF build — available for the 9B and 35B checkpoints (swap -9B for -35B).

# llama.cpp — serve an OpenAI-compatible API on port 8000.
llama-server -hf deepreinforce-ai/Ornith-1.0-9B-GGUF --port 8000 -c 262144

# Ollama — pull and chat with the same GGUF straight from Hugging Face.
ollama run hf.co/deepreinforce-ai/Ornith-1.0-9B-GGUF


pip install unsloth

# Load Ornith for fast local inference or fine-tuning (Python):
#   from unsloth import FastLanguageModel
#   model, tokenizer = FastLanguageModel.from_pretrained(
#       "deepreinforce-ai/Ornith-1.0-9B",
#       max_seq_length=262144,
#       load_in_4bit=True,
#   )


# OpenClaw talks to any OpenAI-compatible endpoint — point it at your Ornith server.
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
export OPENAI_MODEL="Ornith-1.0"

Ornith-1.0 is optimized for terminal-based coding agents. Point any OpenAI-compatible coding CLI at your Ornith-1.0 endpoint (set OPENAI_BASE_URL and OPENAI_API_KEY) to understand larger codebases, automate difficult tasks, and ship faster.

# Register your local Ornith endpoint as a provider in ~/.config/opencode/opencode.json:
#
# {
#   "$schema": "https://opencode.ai/config.json",
#   "provider": {
#     "ornith": {
#       "npm": "@ai-sdk/openai-compatible",
#       "name": "Ornith (local)",
#       "options": { "baseURL": "http://localhost:8000/v1", "apiKey": "EMPTY" },
#       "models": { "Ornith-1.0": { "name": "Ornith-1.0" } }
#     }
#   }
# }

opencode

If you find our work helpful, feel free to cite us.

@misc{ornith-1.0,
    title = {{Ornith-1.0}: Agentic Coding, Open to All},
    url = {https://deep-reinforce.com/ornith_1_0.html},
    author = {{DeepReinforce Team}},
    year = {2026}
}




