Commit graph

48 commits

Author SHA1 Message Date
90dcf6a2f1 fix: disable _apply_fidelity again — stubs cause rapid retry loops
This is the second time this has been disabled: the auto-stub
replacement produces content that opencode rejects, triggering 16+
retries per second. Needs deeper investigation into which content
types can be safely stubbed and which break the model.
2026-03-19 08:02:26 -06:00
f529816c13 fix: remove blocking rate_limiter.acquire() from request paths
The rate limiter's time.sleep() blocked the single uvicorn worker
thread, deadlocking the entire server (health endpoint, dashboard,
all requests). Removed acquire() from both streaming and non-streaming
paths. The rate limiter still records 429s for circuit breaker stats
but no longer blocks.
2026-03-19 07:50:20 -06:00
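The shape of the fix above can be sketched as follows. This is a minimal illustration, not the project's actual module: the class name and fields are assumptions; the point is that the request path only does lock-guarded bookkeeping and never calls time.sleep().

```python
import threading
import time

class RateLimiterStats:
    """Hypothetical sketch: record 429 responses for circuit-breaker
    stats without ever sleeping on the request path."""

    def __init__(self):
        self._lock = threading.Lock()
        self.consecutive_429s = 0
        self.last_429_at = 0.0

    def record_response(self, status_code: int) -> None:
        # Only bookkeeping here -- no time.sleep(), so the single
        # uvicorn worker thread is never blocked.
        with self._lock:
            if status_code == 429:
                self.consecutive_429s += 1
                self.last_429_at = time.monotonic()
            else:
                self.consecutive_429s = 0
```

With blocking acquisition removed, any pacing decision has to be made from these recorded stats (e.g. by a breaker that refuses requests) rather than by sleeping in-line.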
8d79101025 perf: move entropy faulting to background, use json copy instead of deepcopy
Remaining hot-path blockers for large contexts:
- Entropy detection: per-object get() loop + micro-fault LLM calls → bg thread
- copy.deepcopy(messages): O(n^2) identity tracking → json round-trip (3-5x faster)
- Added _preprocess timing log (warns if >500ms)
2026-03-19 07:34:05 -06:00
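The deepcopy-to-JSON swap above is a standard trick and can be demonstrated directly; deepcopy maintains a memo dict to track shared/cyclic references, which JSON-safe message payloads never need:

```python
import copy
import json

messages = [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]

# copy.deepcopy tracks object identity via a memo dict, which is
# costly on large nested structures; for JSON-safe payloads a
# serialize/parse round-trip yields an equivalent independent copy.
fast_copy = json.loads(json.dumps(messages))
slow_copy = copy.deepcopy(messages)

assert fast_copy == slow_copy
fast_copy[0]["content"][0]["text"] = "changed"
assert messages[0]["content"][0]["text"] == "hi"  # original untouched
```

The caveat: the round-trip is only valid for JSON-serializable data. Tuples become lists, sets and custom classes raise, and shared references are duplicated, so it fits API message bodies but not arbitrary Python objects.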
a0822b0f6d perf: move all slow operations to background threads
Blocking operations in the request path caused SSE timeouts:
- Goal classification (~4.3s LLM call) → background thread
- Hierarchy rebuild (loads all objects) → background thread
- Per-segment find_duplicate (async per-object) → skipped
- Per-object goal relevance loop → disabled

Request path now only does: ingest, segment, admission (fast),
then forwards to Anthropic immediately.
2026-03-19 07:24:35 -06:00
2c14ec09b2 feat: add Claude Code identity transforms for standalone OAuth usage
When proxying with OAuth (no auth plugin), applies the same body
transforms as opencode-anthropic-auth:
- Prepend 'You are Claude Code' identity to system prompt
- Replace 'OpenCode' with 'Claude Code' in system text
- Prefix tool names with 'mcp_' in tools and tool_use blocks
These are only applied when ANTHROPIC_AUTH_TOKEN is set.
2026-03-19 07:19:25 -06:00
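The three transforms listed above can be sketched as one body-rewriting function. The exact identity string, field layout, and the matching rewrite of tool_use blocks inside messages are assumptions/omissions here, not the plugin's real code:

```python
def apply_identity_transforms(body: dict) -> dict:
    """Sketch of the three transforms; the full identity text and the
    tool_use rewrite inside messages are omitted."""
    system = body.get("system", [])
    if isinstance(system, str):
        system = [{"type": "text", "text": system}]
    # Replace 'OpenCode' with 'Claude Code' in existing system text...
    rewritten = [
        {**blk, "text": blk.get("text", "").replace("OpenCode", "Claude Code")}
        for blk in system
    ]
    # ...then prepend the identity block.
    body["system"] = [{"type": "text", "text": "You are Claude Code"}] + rewritten
    # Prefix tool names with 'mcp_' (tool_use blocks in messages
    # would need the same renaming; not shown).
    for tool in body.get("tools", []):
        if not tool["name"].startswith("mcp_"):
            tool["name"] = "mcp_" + tool["name"]
    return body
```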
63307c70be fix: update OAuth headers and betas to match Claude CLI 2.1.75
Updated from the rmk40/opencode-anthropic-auth plugin profile.
Adds all required beta headers (claude-code, context-1m, interleaved
thinking, redact-thinking, prompt-caching-scope, advanced-tool-use,
effort), opus-specific context-management beta, ?beta=true URL param,
and correct user-agent/x-app headers.
2026-03-19 07:07:48 -06:00
142ef4d6c3 revert: switch helper LLM back to haiku 4.5 2026-03-19 07:02:25 -06:00
4ca9c58920 feat: detect context window size dynamically from model ID
Hardcoded 200K window caused 101% pressure at 201K tokens on 1M
models. Now detects model from request payload and sets window_size
accordingly (1M for opus-4-6/sonnet-4-6/sonnet-4-5, 200K for others).
Falls back to 200K for unknown models.
2026-03-15 19:36:05 -06:00
4f50359d01 perf: batch embeddings in background thread to fix SSE timeouts
Root cause: 306 embedding calls at 61ms each blocked the request
thread for ~19s before forwarding to Anthropic.

- Batch all admitted objects into single embed_batch() call
- Run in background thread (non-blocking)
- store_object accepts pre-computed embeddings
- Goal detection uses turn heuristic instead of blocking embed
2026-03-15 19:29:23 -06:00
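The batching pattern above can be sketched like this, with a stand-in for the real embedding call (the object shape and function names are assumptions):

```python
import threading

def embed_batch(texts):
    """Stand-in for the real batched embedding call (hypothetical):
    one call for N texts instead of N calls of ~61ms each."""
    return [[float(len(t))] for t in texts]

def store_objects_async(objects, store):
    """Sketch: run the single batched embed call on a daemon thread,
    then store each object with its pre-computed embedding. Returns
    the thread so callers can join when they need the results."""
    def worker():
        vectors = embed_batch([o["text"] for o in objects])
        for obj, vec in zip(objects, vectors):
            store[obj["id"]] = {"text": obj["text"], "embedding": vec}

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The request path fires the thread and forwards immediately; only later retrieval depends on the embeddings being ready.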
0135298966 fix: emit SSE keepalive comments while buffering cleanup tags
When the SSE filter suppresses text deltas (buffering inside a
memory_cleanup/yuyay-response tag), no bytes reached the client,
causing opencode's SSE read timeout to fire. Now emits ':keepalive'
SSE comments during suppression to keep the connection alive.
2026-03-15 19:18:36 -06:00
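The keepalive trick relies on a property of the SSE wire format: lines beginning with ':' are comments that clients must ignore, but they still count as traffic for read-timeout purposes. A minimal sketch (function name assumed):

```python
def forward_or_keepalive(delta: str, suppressing: bool) -> str:
    """Sketch: while the filter buffers inside a cleanup tag, emit an
    SSE comment instead of nothing so the client's read timeout
    never fires. ':'-prefixed lines are ignored by SSE parsers."""
    return ":keepalive\n\n" if suppressing else delta
```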
3a970efd62 fix: increase httpx read timeout to 600s for large context SSE streams
With 200k+ token contexts, Anthropic can take 60+ seconds for time
to first token. The 300s timeout was too aggressive for SSE reads
during long thinking phases.
2026-03-15 18:55:08 -06:00
c87ce03c67 fix: switch helper LLM to claude-sonnet-4-6 for 1M context support
Haiku 4.5 has a 200K context limit which causes SSE errors when
sessions grow large. Sonnet 4.6 supports 1M tokens at $3/$15 per
million.
2026-03-15 18:44:23 -06:00
5c4d4700b3 feat: outbound rate limiter with circuit breaker for Anthropic API
Token bucket at 40 RPM to stay under Max 5x plan ceilings (~50 RPM).
Reads retry-after header from 429 responses to pause precisely.
Circuit breaker trips after 3 consecutive 429s, pausing 30s before
retrying. Stats exposed in /health endpoint.
2026-03-15 09:45:42 -06:00
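The mechanics described above (token bucket, retry-after pause, trip-after-3 breaker) can be sketched in one class. This is a non-blocking reading of the design, not the module's real API:

```python
import time

class TokenBucket:
    """Sketch of a 40 RPM token bucket with a circuit breaker that
    trips after 3 consecutive 429s; names and defaults assumed."""

    def __init__(self, rpm=40, trip_after=3, pause_s=30.0):
        self.capacity = rpm
        self.tokens = float(rpm)
        self.refill_per_s = rpm / 60.0
        self.trip_after = trip_after
        self.pause_s = pause_s
        self.consecutive_429s = 0
        self.open_until = 0.0
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        if now < self.open_until:   # breaker open: refuse immediately
            return False
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def record_429(self, retry_after=None) -> None:
        self.consecutive_429s += 1
        pause = retry_after if retry_after is not None else 0.0
        if self.consecutive_429s >= self.trip_after:
            pause = max(pause, self.pause_s)  # trip the breaker
        if pause:
            self.open_until = time.monotonic() + pause

    def record_success(self) -> None:
        self.consecutive_429s = 0
```

Honoring the server's retry-after value directly (rather than guessing a backoff) is what makes the pause "precise" in the commit's wording.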
ac5e207a73 fix: gradual fidelity degradation with recency guard and better stubs
Root cause of retry loops: mass degradation stubbed 85% of context at
once, confusing the model into infinite retries.

Fixes:
- Cap degradations at 20 per turn (gradual compression)
- Protect objects accessed within last 10 turns from degradation
- Estimate L1/L2 token counts (30%/10% of L0) so FM pressure tracks
  correctly after degradation
- Improved stubs: '[compressed 3.4KB -> stub] first 200 chars...'
- Re-enabled _apply_fidelity
2026-03-15 09:39:42 -06:00
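The two guards (per-turn cap, recency protection) amount to a filtered, sorted, capped selection. Object shape and function name are assumptions:

```python
def pick_degradation_candidates(objects, current_turn, cap=20, recency=10):
    """Sketch of the commit's guards: degrade at most `cap` objects
    per turn, and never touch anything accessed within the last
    `recency` turns."""
    eligible = [
        o for o in objects
        if current_turn - o["last_access_turn"] > recency
    ]
    # Degrade the longest-untouched objects first, capped per turn.
    eligible.sort(key=lambda o: o["last_access_turn"])
    return eligible[:cap]
```

Capping per turn is what turns the "85% stubbed at once" failure into gradual compression spread across many turns.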
fb7bc87ea7 fix: disable fidelity content replacement to stop rapid retry loops
L1/L2 stub replacement was producing responses that opencode
rejected, triggering rapid retries and rate limiting. Disabled
until stub format is validated for tool_result compatibility.
2026-03-15 09:34:06 -06:00
4a9182d4aa fix: use auto-stub fallback for L1/L2 when LLM summary not available
L1/L2 objects without LLM summaries kept full content as fallback,
increasing context instead of reducing it. Now uses auto-stub
(truncated preview) when no summary exists, ensuring degraded
objects always produce smaller content.
2026-03-15 08:44:26 -06:00
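A truncated-preview stub in the spirit of this commit (using the `[compressed ...]` format a later commit describes) can be sketched as; the function name and exact format are illustrative:

```python
def auto_stub(content: str, preview_chars: int = 200) -> str:
    """Sketch: truncated-preview fallback that is always smaller than
    the original, used when no LLM summary is available."""
    size_kb = len(content.encode("utf-8")) / 1024
    return f"[compressed {size_kb:.1f}KB -> stub] {content[:preview_chars]}..."
```

The invariant that matters is the one the commit states: a degraded object must always produce *smaller* content than the original, or degradation increases pressure instead of relieving it.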
8d8bd03d41 fix: dashboard fidelity chart reads from FidelityManager instead of empty object store
The fidelity distribution chart was always showing 0 for L1-L4
because it queried the ObjectStore (never populated) instead of
the FidelityManager (which actually tracks degradation state).
2026-03-15 08:41:50 -06:00
4ffb93553c fix: keep FM window scaled during degrade() and add /api/fidelity endpoint
Window was restored before degrade() was called, so FM always saw
NORMAL pressure internally. Now keeps scaled window through the
degrade call. Adds /api/fidelity debug endpoint showing FM state,
object counts, pressure ratios, and fidelity distribution per session.
2026-03-15 08:38:18 -06:00
225b9b30f1 fix: scale FM window in _apply_fidelity to match real API pressure
_apply_fidelity checked fm.current_pressure() which uses internal
object tokens (tiny) / 200k window = always NORMAL. Now scales
window_size using last_effective token count so FM pressure matches
real context usage, enabling L0->L1->L2 degradations.
2026-03-15 08:27:05 -06:00
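The window-scaling trick recurring in these commits is a ratio match: shrink the FM's window so its internal `tokens / window` ratio equals the real `input_tokens / window` ratio. A sketch under assumed names:

```python
def scaled_window(window_size: int, fm_tokens: int, api_input_tokens: int) -> int:
    """Sketch: pick a window so that fm_tokens / scaled equals the
    real api_input_tokens / window_size pressure ratio."""
    if api_input_tokens == 0:
        return window_size
    real_pressure = api_input_tokens / window_size
    return max(1, int(fm_tokens / real_pressure))

# e.g. 5k tracked tokens, 150k real usage on a 200k window:
w = scaled_window(200_000, 5_000, 150_000)
assert abs(5_000 / w - 150_000 / 200_000) < 0.01  # pressures now match
```

With the FM's internal pressure mirroring real API pressure, its existing L0→L1→L2 thresholds fire at the right times without any change to the degradation logic itself.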
2a71014535 docs: add Claude Code integration instructions to README 2026-03-15 08:00:30 -06:00
5580bc87cc fix: handle yuyar typo variant of yuyay-response tags
Models occasionally misspell yuyay as yuyar since it's a made-up word.
Use [yr] character class in all regexes and partial tag prefixes.
2026-03-14 09:16:47 -06:00
be16715163 docs: add comprehensive README 2026-03-14 09:13:16 -06:00
f5c2c91057 fix: remove orphaned SSE event headers when suppressing text deltas
When the cleanup filter suppresses a text delta (buffering inside a
tag), the preceding 'event: content_block_delta' header was left in
the output, producing malformed SSE that caused opencode to retry
rapidly and freeze. Now removes the event header alongside the data
line.
2026-03-13 21:39:48 -06:00
2c42f9b52a fix: assign conversation turn numbers to blocks and add /api/blocks debug endpoint
Blocks were all getting turn=1 because label_messages used a single
global counter. Now derives turn from message position (each user msg
increments the turn). Also updates turn on already-labeled blocks.
Adds /api/blocks endpoint to inspect BlockStore state per session.
This enables collapse_range(1,72) to correctly target early turns.
2026-03-13 21:35:19 -06:00
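The positional turn derivation above can be sketched in a few lines (re-labeling already-stored blocks is omitted here):

```python
def assign_turns(messages):
    """Sketch: derive turn numbers from message position -- each
    user message starts a new turn -- instead of one global counter."""
    turn = 0
    turns = []
    for msg in messages:
        if msg["role"] == "user":
            turn += 1
        turns.append(turn)
    return turns

assert assign_turns([
    {"role": "user"}, {"role": "assistant"},
    {"role": "user"}, {"role": "assistant"},
]) == [1, 1, 2, 2]
```

Because every message carries the turn of the user message that opened it, a range op like collapse_range(1, 72) can now resolve to a concrete span of early messages.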
e0af1edadf fix: safety valve only flushes partial openers, not real tags
Long cleanup tags (e.g. collapse summaries) can span 30+ SSE deltas.
The safety valve was flushing after 6 deltas regardless, dumping
incomplete tags into the output. Now only flushes when buffering a
partial opener (<m, <y) that never resolved — never when inside a
confirmed tag.
2026-03-13 21:27:08 -06:00
ad2c296ba3 fix: parse XML-format cleanup tags and strip from SSE stream
The model emits cleanup ops as XML elements (<drop>block:x</drop>,
<release handle="x"/>, <collapse>turns N-M "summary"</collapse>)
but the parser only handled prose format (drop: block:x). Add XML
regex matchers alongside the existing prose parser so both formats
are recognized, executed, and stripped from the streaming output.
2026-03-13 21:23:26 -06:00
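A sketch of XML matchers for the three ops named above, built only from the example tags in the commit message; the real attribute grammar may be wider:

```python
import re

# Patterns inferred from the commit's examples; assumptions, not the
# project's actual regexes.
DROP_RE = re.compile(r"<drop>(.*?)</drop>")
RELEASE_RE = re.compile(r'<release handle="([^"]+)"\s*/>')
COLLAPSE_RE = re.compile(r'<collapse>turns (\d+)-(\d+) "(.*?)"</collapse>')

def parse_and_strip(text: str):
    """Collect ops from XML-format cleanup tags, then strip the tags
    from the text that will reach the client."""
    ops = []
    for m in DROP_RE.finditer(text):
        ops.append(("drop", m.group(1)))
    for m in RELEASE_RE.finditer(text):
        ops.append(("release", m.group(1)))
    for m in COLLAPSE_RE.finditer(text):
        ops.append(("collapse", int(m.group(1)), int(m.group(2)), m.group(3)))
    stripped = DROP_RE.sub("", text)
    stripped = RELEASE_RE.sub("", stripped)
    stripped = COLLAPSE_RE.sub("", stripped)
    return ops, stripped
```

Running both this and the prose parser over the same text is what lets either format be recognized without the model having to know which one the gateway prefers.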
65e4e38a98 fix: scale FM window_size to match real API pressure for fidelity degradation
The FidelityManager's internal pressure calculation uses its own tracked
object tokens divided by window_size, which is always tiny compared to
the real context. Temporarily scale window_size so the FM's pressure
matches the actual API input_tokens/window ratio, triggering L0→L1→L2
degradations when context exceeds 50%.
2026-03-13 21:13:38 -06:00
92fba55f70 fix: accurate context reduction stats and SSE cleanup tag filter
Measure incoming_bytes before _preprocess() so bytes_saved reflects true
reduction. Add SSECleanupFilter that intercepts memory_cleanup/yuyay-response
tags in streaming responses, strips them from output, and executes ops
(drops, collapses, releases) in real-time. Handles partial tags split across
SSE chunks with a safety valve to flush stale buffers for prose.
2026-03-13 21:07:52 -06:00
2bf6baaa33 fix: rebuild session history for undo and segment only new messages
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 15:40:48 -06:00
6719d3f3f0 fix: render collapsed turn summaries in outbound context
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 15:40:47 -06:00
235e88d416 fix: route mnemosyne provider instead of anthropic in opencode plugin
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 13:38:26 -06:00
5702a5a1e2 fix: wire bytes_saved through benchmark, restore _check_token_cap, apply block cleanup to outbound
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 13:37:33 -06:00
fa1f27bad5 fix: remove YuyayStreamFilter so pichay can receive yuyay-response blocks 2026-03-13 12:40:18 -06:00
59cce5c6d2 fix: pass real incoming/outgoing bytes to record_turn for context reduction stats 2026-03-13 12:35:57 -06:00
8df3f4f2b7 feat: add systemd user service for mnemosyne auto-restart 2026-03-13 12:07:17 -06:00
f8f85aea47 fix: inject /v1 suffix in baseURL and skip injection when gateway is down
The Anthropic SDK appends /messages to baseURL, so the gateway
baseURL must include /v1. Also removes the static baseURL from
opencode.json — the plugin now injects it dynamically only when the
gateway health check passes, so requests fall through directly to
api.anthropic.com when Mnemosyne is not running.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:59:50 -06:00
a822c497c5 fix: correct dashboard JS field mappings for benchmark API
The /api/benchmark response uses total_admitted, total_rejected,
total_attempts etc. but the dashboard JS was reading admitted, rejected,
attempts. Also fixed fidelity bar reading fidelity_distribution key and
the session table to derive context reduction from byte ratio.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:59:41 -06:00
bee90915db feat: add SSE stream filter for yuyay protocol tags
Strip <yuyay-response>, <yuyay-manifest>, and <yuyay-query> tags from
the SSE stream before forwarding to the client. The cooperative memory
protocol tags are still processed by the gateway on the next inbound
request — they just no longer leak into the user's visible output.

Handles tags spanning across multiple text_delta SSE events.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:49:12 -06:00
6b9b3df64d fix: disable phantom tool injection when proxying for opencode
opencode validates tool calls against its own registry and rejects
unknown tools like memory_query. The opencode plugin provides
mnemosyne_query/mnemosyne_status as registered MCP tools instead.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:43:56 -06:00
7c6a3dbe4a docs: add architecture and reference documentation
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:41:41 -06:00
b21871b8fc feat: add opencode plugin for Mnemosyne routing
TypeScript plugin that injects baseURL to route Anthropic API calls
through the Mnemosyne gateway, enriches compaction with memory context,
and provides mnemosyne_status/mnemosyne_query custom tools.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:41:36 -06:00
9b25b33a50 test: add gateway integration tests
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:41:28 -06:00
d660414ad7 feat: add benchmarking, auth, and utility modules
CLI benchmark command, threshold auto-tuning, OAuth PKCE auth
(same flow as Claude Code), cost tracking, telemetry, and replay.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:41:22 -06:00
681c1454cb feat: add memory management pipeline
Admission control, entropy-based micro-faulting, phantom tool
injection for backing store queries, and xMemory session hierarchy
for long conversations (50+ turns).

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:41:12 -06:00
a13719f754 feat: add object store with semantic segmentation
Object-addressed memory: segment messages into semantic objects,
embed with sentence-transformers, store in pgvector-backed store,
and reassemble context via goal-aware retrieval.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:41:04 -06:00
d26c56c2f0 feat: add multi-fidelity compression engine
5-level fidelity manager (L0-Full to L4-Evicted) with helper LLM
(Haiku 4.5) for intelligent summarization during degradation.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:40:56 -06:00
974863e7b3 feat: add core proxy framework with gateway and providers
Multi-provider HTTP proxy (Anthropic + OpenAI) with session management,
message processing pipeline, block labeling, cache control placement,
and embedded monitoring dashboard.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:40:49 -06:00
ed0361f97c chore: initialize project scaffold and config
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 11:40:35 -06:00