Disabling for the second time: the auto-stub replacement produces content
that opencode rejects, triggering 16+ retries per second. Needs
deeper investigation into which content types can be safely stubbed
vs which ones break the model.
The rate limiter's time.sleep() blocked the single uvicorn worker
thread, deadlocking the entire server (health endpoint, dashboard,
all requests). Removed acquire() from both streaming and non-streaming
paths. The rate limiter still records 429s for circuit breaker stats
but no longer blocks.
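The change can be sketched as a record-only limiter; the class and method names here are illustrative, not the gateway's actual API:

```python
import time
from collections import deque

class RecordingRateLimiter:
    """Records 429s for circuit-breaker stats without ever sleeping
    on the request path (no acquire(), so the worker never blocks)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of observed 429s

    def record_429(self, now=None):
        now = time.monotonic() if now is None else now
        self._events.append(now)

    def recent_429s(self, now=None):
        """Count 429s inside the sliding window, pruning old ones."""
        now = time.monotonic() if now is None else now
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events)
```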
When proxying with OAuth (no auth plugin), the gateway applies the same
body transforms as opencode-anthropic-auth:
- Prepend 'You are Claude Code' identity to system prompt
- Replace 'OpenCode' with 'Claude Code' in system text
- Prefix tool names with 'mcp_' in tools and tool_use blocks
These are only applied when ANTHROPIC_AUTH_TOKEN is set.
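Assuming a standard Anthropic Messages payload shape (string `system`, `tools` list, `tool_use` content blocks), the three transforms might be sketched like this; the exact identity wording is an assumption:

```python
import copy

IDENTITY = "You are Claude Code"  # assumed wording of the identity line

def apply_oauth_transforms(body):
    """Sketch of the three body transforms applied before forwarding."""
    body = copy.deepcopy(body)
    # 1. Prepend the identity line to the system prompt.
    system = body.get("system", "")
    if isinstance(system, str):
        system = f"{IDENTITY}\n\n{system}" if system else IDENTITY
        # 2. Replace 'OpenCode' with 'Claude Code' in system text.
        body["system"] = system.replace("OpenCode", "Claude Code")
    # 3. Prefix tool names with 'mcp_' in tools and tool_use blocks.
    for tool in body.get("tools", []):
        if not tool["name"].startswith("mcp_"):
            tool["name"] = "mcp_" + tool["name"]
    for msg in body.get("messages", []):
        if isinstance(msg.get("content"), list):
            for block in msg["content"]:
                if block.get("type") == "tool_use" and not block["name"].startswith("mcp_"):
                    block["name"] = "mcp_" + block["name"]
    return body
```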
Hardcoded 200K window caused 101% pressure at 201K tokens on 1M
models. Now detects model from request payload and sets window_size
accordingly (1M for opus-4-6/sonnet-4-6/sonnet-4-5, 200K for others).
Falls back to 200K for unknown models.
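A sketch of the detection, with the substring list as an illustrative stand-in for the real model mapping:

```python
# Hypothetical substrings; the real mapping lives in the gateway.
MILLION_TOKEN_MODELS = ("opus-4-6", "sonnet-4-6", "sonnet-4-5")
DEFAULT_WINDOW = 200_000

def detect_window_size(payload):
    """Pick window_size from the request's model field, falling
    back to 200K for unknown models."""
    model = payload.get("model", "")
    if any(tag in model for tag in MILLION_TOKEN_MODELS):
        return 1_000_000
    return DEFAULT_WINDOW
```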
Root cause: 306 embedding calls at 61ms each blocked the request
thread for ~19s before forwarding to Anthropic.
- Batch all admitted objects into single embed_batch() call
- Run in background thread (non-blocking)
- store_object accepts pre-computed embeddings
- Goal detection uses turn heuristic instead of blocking embed
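The batch-and-background approach can be sketched as follows; `embed_batch` and `store_object` are stand-ins for the real APIs, passed in as callables:

```python
import threading

def embed_and_store_async(objects, embed_batch, store_object):
    """Embed all admitted objects in one batched call on a background
    thread, then store each with its precomputed embedding. The request
    path starts the thread and moves on without waiting."""
    def work():
        embeddings = embed_batch([obj["text"] for obj in objects])
        for obj, emb in zip(objects, embeddings):
            store_object(obj, embedding=emb)
    t = threading.Thread(target=work, daemon=True)
    t.start()
    return t  # tests may join(); the request path does not
```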
When the SSE filter suppresses text deltas (buffering inside a
memory_cleanup/yuyay-response tag), no bytes reached the client,
causing opencode's SSE read timeout to fire. Now emits ':keepalive'
SSE comments during suppression to keep the connection alive.
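Per the SSE spec, lines starting with ':' are comments that clients ignore, so the substitution can be sketched as (with `suppressed` standing in for the real buffering check):

```python
KEEPALIVE = b":keepalive\n\n"  # SSE comment, ignored by clients

def filter_with_keepalive(events, suppressed):
    """Yield events, substituting an SSE comment whenever a delta is
    suppressed so the client's read timeout never fires."""
    for event in events:
        if suppressed(event):
            yield KEEPALIVE
        else:
            yield event
```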
With 200K+ token contexts, Anthropic can take 60+ seconds to reach
the first token. The 300s timeout was too aggressive for SSE reads
during long thinking phases.
Token bucket at 40 RPM to stay under Max 5x plan ceilings (~50 RPM).
Reads retry-after header from 429 responses to pause precisely.
Circuit breaker trips after 3 consecutive 429s, pausing 30s before
retrying. Stats exposed in /health endpoint.
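A minimal sketch of the trip/cooldown logic under the stated thresholds (class name and fields are assumptions, not the gateway's actual implementation):

```python
import time

class CircuitBreaker:
    """Trips after 3 consecutive 429s, then pauses 30s before
    allowing traffic again."""

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.consecutive_429s = 0
        self.tripped_at = None

    def on_response(self, status, now=None):
        now = time.monotonic() if now is None else now
        if status == 429:
            self.consecutive_429s += 1
            if self.consecutive_429s >= self.threshold:
                self.tripped_at = now
        else:
            self.consecutive_429s = 0
            self.tripped_at = None

    def allow(self, now=None):
        if self.tripped_at is None:
            return True
        now = time.monotonic() if now is None else now
        if now - self.tripped_at >= self.cooldown:
            self.tripped_at = None
            self.consecutive_429s = 0
            return True
        return False
```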
Root cause of retry loops: mass degradation stubbed 85% of context at
once, confusing the model into infinite retries.
Fixes:
- Cap degradations at 20 per turn (gradual compression)
- Protect objects accessed within last 10 turns from degradation
- Estimate L1/L2 token counts (30%/10% of L0) so FM pressure tracks
correctly after degradation
- Improved stubs: '[compressed 3.4KB -> stub] first 200 chars...'
- Re-enabled _apply_fidelity
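The cap and recency protection might be sketched as follows (field names are illustrative):

```python
MAX_DEGRADATIONS_PER_TURN = 20
PROTECT_RECENT_TURNS = 10

def select_degradation_candidates(objects, current_turn):
    """Pick at most 20 objects per turn, skipping anything accessed
    within the last 10 turns, coldest first."""
    eligible = [
        obj for obj in objects
        if current_turn - obj["last_access_turn"] > PROTECT_RECENT_TURNS
    ]
    eligible.sort(key=lambda obj: obj["last_access_turn"])
    return eligible[:MAX_DEGRADATIONS_PER_TURN]
```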
L1/L2 stub replacement was producing responses that opencode
rejected, triggering rapid retries and rate limiting. Disabled
until stub format is validated for tool_result compatibility.
L1/L2 objects without LLM summaries kept full content as fallback,
increasing context instead of reducing it. Now uses auto-stub
(truncated preview) when no summary exists, ensuring degraded
objects always produce smaller content.
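A sketch of the fallback, using the stub format described above (object fields are assumptions):

```python
def degraded_content(obj, preview_chars=200):
    """Return the LLM summary when present, else a truncated-preview
    auto-stub, so degraded objects always shrink."""
    content = obj["content"]
    if obj.get("summary"):
        return obj["summary"]
    size_kb = len(content.encode()) / 1024
    return f"[compressed {size_kb:.1f}KB -> stub] {content[:preview_chars]}..."
```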
The fidelity distribution chart was always showing 0 for L1-L4
because it queried the ObjectStore (never populated) instead of
the FidelityManager (which actually tracks degradation state).
Window was restored before degrade() was called, so FM always saw
NORMAL pressure internally. Now keeps scaled window through the
degrade call. Adds /api/fidelity debug endpoint showing FM state,
object counts, pressure ratios, and fidelity distribution per session.
When the cleanup filter suppresses a text delta (buffering inside a
tag), the preceding 'event: content_block_delta' header was left in
the output, producing malformed SSE that caused opencode to retry
rapidly and freeze. Now removes the event header alongside the data
line.
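Over SSE's line-based framing, the fix can be sketched as holding the `event:` header until its `data:` line survives the filter (`suppress` stands in for the real buffering check):

```python
def strip_suppressed_delta(lines, suppress):
    """Drop the 'event:' header together with its 'data:' line when
    a delta is suppressed, keeping the stream well-formed SSE."""
    out = []
    pending_event = None
    for line in lines:
        if line.startswith("event:"):
            pending_event = line  # hold until the data line is kept
        elif line.startswith("data:"):
            if suppress(line):
                pending_event = None  # remove header with the data
            else:
                if pending_event is not None:
                    out.append(pending_event)
                    pending_event = None
                out.append(line)
        else:
            out.append(line)
    return out
```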
Blocks were all getting turn=1 because label_messages used a single
global counter. Now derives turn from message position (each user msg
increments the turn). Also updates turn on already-labeled blocks.
Adds /api/blocks endpoint to inspect BlockStore state per session.
This enables collapse_range(1,72) to correctly target early turns.
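The position-based labeling can be sketched as:

```python
def derive_turns(messages):
    """Return a turn number per message: each user message starts a
    new turn, and following assistant messages share that turn."""
    turns = []
    turn = 0
    for msg in messages:
        if msg["role"] == "user":
            turn += 1
        turns.append(turn)
    return turns
```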
Long cleanup tags (e.g. collapse summaries) can span 30+ SSE deltas.
The safety valve was flushing after 6 deltas regardless, dumping
incomplete tags into the output. Now only flushes when buffering a
partial opener (<m, <y) that never resolved — never when inside a
confirmed tag.
The model emits cleanup ops as XML elements (<drop>block:x</drop>,
<release handle="x"/>, <collapse>turns N-M "summary"</collapse>)
but the parser only handled prose format (drop: block:x). Add XML
regex matchers alongside the existing prose parser so both formats
are recognized, executed, and stripped from the streaming output.
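Hypothetical matchers for the three XML op forms might look like this (the real parser keeps its prose matchers alongside these):

```python
import re

XML_OPS = [
    ("drop", re.compile(r"<drop>(.*?)</drop>", re.S)),
    ("release", re.compile(r'<release\s+handle="([^"]+)"\s*/>')),
    ("collapse", re.compile(r"<collapse>(.*?)</collapse>", re.S)),
]

def parse_xml_ops(text):
    """Collect (op, argument) pairs and strip the ops from the text."""
    ops = []
    for name, pattern in XML_OPS:
        for match in pattern.finditer(text):
            ops.append((name, match.group(1).strip()))
        text = pattern.sub("", text)
    return ops, text
```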
The FidelityManager's internal pressure calculation uses its own tracked
object tokens divided by window_size, which is always tiny compared to
the real context. Temporarily scale window_size so the FM's pressure
matches the actual API input_tokens/window ratio, triggering L0→L1→L2
degradations when context exceeds 50%.
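The scaling reduces to one ratio: choose a window such that tracked_tokens divided by it equals the real input_tokens/window ratio. A sketch, with hypothetical parameter names:

```python
def scaled_window(window_size, tracked_tokens, api_input_tokens):
    """Scale window_size so FM pressure (tracked_tokens / window)
    matches the real ratio (api_input_tokens / window_size)."""
    if api_input_tokens <= 0:
        return window_size
    real_pressure = api_input_tokens / window_size
    return max(1, int(tracked_tokens / real_pressure))
```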
Measure incoming_bytes before _preprocess() so bytes_saved reflects true
reduction. Add SSECleanupFilter that intercepts memory_cleanup/yuyay-response
tags in streaming responses, strips them from output, and executes ops
(drops, collapses, releases) in real time. Handles partial tags split
across SSE chunks, with a safety valve that flushes stale buffers when
the buffered text turns out to be plain prose.
The Anthropic SDK appends /messages to baseURL, so the gateway
baseURL must include /v1. Also removes the static baseURL from
opencode.json — the plugin now injects it dynamically only when the
gateway health check passes, so requests fall through directly to
api.anthropic.com when Mnemosyne is not running.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The /api/benchmark response uses total_admitted, total_rejected,
total_attempts etc. but the dashboard JS was reading admitted, rejected,
attempts. Also fixed the fidelity bar to read the fidelity_distribution
key, and the session table to derive context reduction from the byte ratio.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Strip <yuyay-response>, <yuyay-manifest>, and <yuyay-query> tags from
the SSE stream before forwarding to the client. The cooperative memory
protocol tags are still processed by the gateway on the next inbound
request — they just no longer leak into the user's visible output.
Handles tags spanning across multiple text_delta SSE events.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
opencode validates tool calls against its own registry and rejects
unknown tools like memory_query. The opencode plugin provides
mnemosyne_query/mnemosyne_status as registered MCP tools instead.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
TypeScript plugin that injects baseURL to route Anthropic API calls
through the Mnemosyne gateway, enriches compaction with memory context,
and provides mnemosyne_status/mnemosyne_query custom tools.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Admission control, entropy-based micro-faulting, phantom tool
injection for backing store queries, and xMemory session hierarchy
for long conversations (50+ turns).
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Object-addressed memory: segment messages into semantic objects,
embed with sentence-transformers, store in pgvector-backed store,
and reassemble context via goal-aware retrieval.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
5-level fidelity manager (L0-Full to L4-Evicted) with helper LLM
(Haiku 4.5) for intelligent summarization during degradation.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>