Codex SQLite feedback logs can write ~640 TB/year and rapidly consume SSD endurance · Issue #28224 · openai/codex · GitHub

Issue

Codex is constantly writing large amounts of data to the local SQLite feedback log database:

  • ~/.codex/logs_2.sqlite
  • ~/.codex/logs_2.sqlite-wal
  • ~/.codex/logs_2.sqlite-shm

On my machine, after 21 days of uptime, the main SSD had written about 37 TB. Process/file-level investigation shows that Codex is the main persistent writer, through these SQLite log files.

This extrapolates to roughly 640 TB/year. On a 1 TB SSD, that is about 640 full-drive writes per year. Some consumer SSDs are rated around 600 TBW, so this can exhaust nearly the entire rated write endurance of the drive in less than a year.
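
The extrapolation is simple linear scaling, and it can be checked with a few lines of Rust. This is only a sketch of the arithmetic: the function names are made up for illustration, and it assumes the write rate observed over the 21 days stays constant for a whole year.

```rust
/// Extrapolates an observed write volume to a full year (365 days),
/// assuming the observed rate stays constant.
fn tb_per_year(tb_written: f64, days_observed: f64) -> f64 {
    tb_written / days_observed * 365.0
}

/// Number of full-drive writes per year for a drive of the given capacity.
fn drive_writes_per_year(tb_per_year: f64, capacity_tb: f64) -> f64 {
    tb_per_year / capacity_tb
}

/// Years until a rated TBW endurance figure is reached at a constant rate.
fn years_to_reach_tbw(rated_tbw: f64, tb_per_year: f64) -> f64 {
    rated_tbw / tb_per_year
}
```

With the figures from this report, `tb_per_year(37.0, 21.0)` comes out at about 643 TB/year, and `years_to_reach_tbw(600.0, 643.0)` at about 0.93 years.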

Proof

Currently retained rows in logs_2.sqlite:

| Metric | Value |
| --- | --- |
| Rows retained | 681,774 |
| Estimated retained log content | 1,035.6 MiB |

Level Distribution:

| Level | Estimated MiB | Byte % |
| --- | --- | --- |
| trace | 732.5 | 70.7% |
| info | 266.5 | 25.7% |
| debug | 30.6 | 3.0% |
| warn | 5.9 | 0.6% |

Largest target + level combinations:

| Target | Level | Estimated MiB |
| --- | --- | --- |
| codex_api::endpoint::responses_websocket | trace | 527.4 |
| codex_otel.log_only | info | 141.2 |
| codex_otel.trace_safe | info | 121.2 |
| log | trace | 97.4 |
| codex_client::transport | trace | 60.1 |
| codex_core::stream_events_utils | debug | 27.5 |
| codex_api::sse::responses | trace | 19.1 |

The top sources are mostly global TRACE logs, mirrored telemetry logs, and raw WebSocket/SSE payload logging. TRACE alone accounts for about 70.7% of retained bytes. codex_otel.log_only + codex_otel.trace_safe add another 25.3%. Filtering these categories should remove roughly 96% of the log bytes in this sample without completely disabling the feedback log.
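
For anyone who wants to check the 96% figure, here is a minimal sketch of the calculation using the MiB values reported above. The function names are illustrative only.

```rust
/// Percentage of `total_mib` taken up by `part_mib`.
fn share_percent(part_mib: f64, total_mib: f64) -> f64 {
    part_mib / total_mib * 100.0
}

/// Combined share of all TRACE rows plus the two mirrored codex_otel
/// INFO targets, using the figures reported in this issue.
fn filterable_share_percent() -> f64 {
    let total = 1035.6; // estimated retained log content, MiB
    let trace = 732.5; // all TRACE rows, MiB
    let otel_mirrors = 141.2 + 121.2; // codex_otel.log_only + codex_otel.trace_safe, MiB
    share_percent(trace + otel_mirrors, total)
}
```

The two parts are disjoint (one is TRACE, the other INFO), so their shares can simply be added.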

Sanitized examples from frequent TRACE sources: target=log

These are high-frequency sanitized samples. Raw WebSocket/SSE payload bodies are intentionally not included because they may contain private conversation content.

128,764x TRACE log: inotify event: ... mask: OPEN, name: Some("ld.so.cache")
 37,982x TRACE log: inotify event: ... mask: OPEN, name: Some("locale.alias")
 23,843x TRACE log: inotify event: ... mask: OPEN, name: Some("passwd")
  3,639x TRACE log: /src/compat.rs:131 AllowStd.with_context
  3,505x TRACE log: /src/lib.rs:245 WebSocketStream.with_context
  3,362x TRACE log: /src/compat.rs:154 Read.read
  3,356x TRACE log: /src/compat.rs:157 Read.with_context read -> poll_read
  3,230x TRACE log: /src/lib.rs:294 Stream.poll_next
  3,227x TRACE log: /src/lib.rs:304 Stream.with_context poll_next -> read()
  3,213x TRACE log: inotify event: ... mask: OPEN, name: Some("nsswitch.conf")
  2,001x TRACE log: WouldBlock
  1,217x TRACE log: Masked: false
  1,169x TRACE log: Opcode: Data(Text)
  1,169x TRACE log: First: 11000001

Sanitized examples from frequent INFO sources

The top INFO sources are mostly repeated OpenTelemetry mirror events. IDs have been redacted.

843x INFO codex_client::custom_ca:
  using system root certificates because no CA override environment variable was selected ...

334x INFO codex_otel.trace_safe:
  session_loop{thread_id=}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id= codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id= ...}

333x INFO codex_otel.log_only:
  session_loop{thread_id=}:submission_dispatch{otel.name="op.dispatch.user_input" submission.id= codex.op="user_input"}:turn{otel.name="session_task.turn" thread.id= ...}

332x INFO codex_otel.log_only:
  session_loop{thread_id=}:submission_dispatch{otel.name="op.dispatch.user_input_with_turn_context" submission.id= codex.op="user_input_with_turn_context"}:turn{otel.name="session_task.turn" thread.id= ...}

332x INFO codex_otel.trace_safe:
  session_loop{thread_id=}:submission_dispatch{otel.name="op.dispatch.user_input_with_turn_context" submission.id= codex.op="user_input_with_turn_context"}:turn{otel.name="session_task.turn" thread.id= ...}

Write amplification

The retained DB size hides the actual write volume. In a 15 second sample:

| Metric | Before | After |
| --- | --- | --- |
| Rows retained | 681,774 | 681,774 |
| Max row id | 5,003,347,015 | 5,003,383,226 |

That is about 36,211 rows inserted in 15 seconds (roughly 2,400 rows per second), while the retained row count remained constant. This suggests continuous insert-and-prune write amplification: rows are inserted, indexed, written to the WAL, then pruned.
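
The pattern is easy to reproduce in a toy model. The sketch below is not Codex's implementation. It is a hypothetical in-memory stand-in for a row-capped table, meant only to show why the retained row count says nothing about how much was written.

```rust
use std::collections::VecDeque;

/// Toy model of a row-capped log table: every insert gets a fresh,
/// monotonically increasing row id, and the oldest rows are pruned
/// once the cap is exceeded.
struct CappedLog {
    max_rows: usize,
    next_row_id: u64,
    rows: VecDeque<u64>, // retained row ids, oldest first
    rows_written: u64,   // inserts performed, i.e. actual write work
    rows_pruned: u64,    // deletes performed to stay under the cap
}

impl CappedLog {
    fn new(max_rows: usize) -> Self {
        CappedLog {
            max_rows,
            next_row_id: 1,
            rows: VecDeque::new(),
            rows_written: 0,
            rows_pruned: 0,
        }
    }

    /// Inserts one row, then prunes from the front until under the cap.
    fn insert(&mut self) {
        self.rows.push_back(self.next_row_id);
        self.next_row_id += 1;
        self.rows_written += 1;
        while self.rows.len() > self.max_rows {
            self.rows.pop_front();
            self.rows_pruned += 1;
        }
    }

    fn retained(&self) -> usize {
        self.rows.len()
    }

    fn max_row_id(&self) -> Option<u64> {
        self.rows.back().copied()
    }
}
```

Once the cap is reached, every further insert leaves `retained()` unchanged while `max_row_id()`, `rows_written`, and `rows_pruned` keep growing. That matches the before/after sample above: same row count, max row id up by 36,211.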

Possible cause

The SQLite feedback log sink is installed with a global TRACE default:

Targets::new().with_default(Level::TRACE)

This keeps all targets at TRACE level by default, including dependency/internal logs and large raw protocol payloads.
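
To make the effect of the default concrete, here is a std-only model of a "default level plus per-target overrides" filter. It is an illustrative sketch, not the `tracing_subscriber` implementation: `TargetFilter`, its methods, and the thresholds used in the usage note are all hypothetical. In `tracing_subscriber` itself, the corresponding builder calls are `Targets::with_default` and `Targets::with_target`.

```rust
/// Log levels ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Illustrative model of a default-plus-overrides target filter.
struct TargetFilter {
    default_max: Level,
    overrides: Vec<(String, Level)>, // (target prefix, most verbose level kept)
}

impl TargetFilter {
    fn new(default_max: Level) -> Self {
        TargetFilter { default_max, overrides: Vec::new() }
    }

    /// Adds a per-target threshold that applies to targets with this prefix.
    fn with_target(mut self, prefix: &str, max: Level) -> Self {
        self.overrides.push((prefix.to_string(), max));
        self
    }

    /// An event is kept when its level is no more verbose than the threshold
    /// of the longest matching prefix, or the default if no prefix matches.
    fn enabled(&self, target: &str, level: Level) -> bool {
        let threshold = self
            .overrides
            .iter()
            .filter(|(prefix, _)| target.starts_with(prefix.as_str()))
            .max_by_key(|(prefix, _)| prefix.len())
            .map(|(_, max)| *max)
            .unwrap_or(self.default_max);
        level <= threshold
    }
}
```

With `TargetFilter::new(Level::Trace)` and no overrides, every event from every target is kept, which is the situation described above. Something like `TargetFilter::new(Level::Info).with_target("log", Level::Warn)` would instead drop the TRACE rows from target=log and from the WebSocket/SSE targets while keeping INFO and above.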

Proposed solution

Keep the feedback log enabled, but limit what persists by default:

  1. Do not use global TRACE for the SQLite feedback log sink.
  2. Drop or raise thresholds specifically for low-value dependency noise: target=log, hyper_util, tokio-tungstenite internals, inotify spam, and low-level OpenTelemetry SDK logs.
  3. Avoid persisting full raw WebSocket/SSE payloads by default. Store a summary instead: event type, duration, success/error, token usage, and payload byte length.
  4. Avoid constantly mirroring codex_otel.log_only / codex_otel.trace_safe events unless they are clearly useful for feedback debugging.
  5. Add a global log DB size/write cap. Per-thread caps are not sufficient when multiple threads/processes exist.
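
As a rough illustration of point 3, a summary record could carry only metadata about a payload rather than its body. The struct, field names, and log-line format below are hypothetical and not taken from the Codex codebase.

```rust
use std::time::Duration;

/// Hypothetical summary record persisted in place of a raw payload body.
#[derive(Debug)]
struct PayloadSummary {
    event_type: String,
    duration_ms: u128,
    success: bool,
    total_tokens: Option<u64>,
    payload_bytes: usize,
}

impl PayloadSummary {
    /// Records only metadata about the payload; the body itself is not copied.
    fn from_payload(
        event_type: &str,
        payload: &[u8],
        duration: Duration,
        success: bool,
        total_tokens: Option<u64>,
    ) -> Self {
        PayloadSummary {
            event_type: event_type.to_string(),
            duration_ms: duration.as_millis(),
            success,
            total_tokens,
            payload_bytes: payload.len(),
        }
    }

    /// Single-line form suitable for a log row.
    fn to_log_line(&self) -> String {
        let tokens = self
            .total_tokens
            .map_or("n/a".to_string(), |t| t.to_string());
        format!(
            "event={} duration_ms={} success={} tokens={} payload_bytes={}",
            self.event_type, self.duration_ms, self.success, tokens, self.payload_bytes
        )
    }
}
```

A row built this way stays a few dozen bytes regardless of payload size, and it cannot leak conversation content because the body never reaches the log line.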

An optional escape hatch, e.g. sqlite_logs_enabled = false, would still be useful, but the main improvement should be better default filtering.
