# Mnemosyne
Mnemosyne is a transparent HTTP proxy designed to manage LLM context memory. It sits between LLM clients and providers like Anthropic or OpenAI, providing semantic segmentation, multi-fidelity compression, and live monitoring to reduce context costs and improve agent performance.
## Overview
Mnemosyne acts as a drop-in proxy. It intercepts API requests, segments content into labeled blocks, and applies multi-fidelity compression to manage the context window. As context grows, Mnemosyne automatically degrades content fidelity to keep the most relevant information available while minimizing token usage.
## Architecture

```text
Client → Mnemosyne Proxy → Provider API
                |
         +------+------+
         |             |
    BlockStore     PageStore
         |             |
  FidelityManager      |
         |             |
  SSECleanupFilter     |
         |             |
    ObjectStore <------+
```
## Features
- Transparent Proxy: Operates as a drop-in replacement for LLM API endpoints. No client-side changes are required.
- Semantic Segmentation: Automatically labels conversation content with tensor/block IDs, allowing the model to reference specific segments.
- Multi-Fidelity Compression: Supports five levels of fidelity (L0-L4):
  - L0: Full content (no compression)
  - L1: Detailed summary (~30% size)
  - L2: Compact summary (~5% size)
  - L3: Metadata stub (~50-100 tokens)
  - L4: Evicted (not in context)
- Pressure-Based Degradation: Automatically downgrades object fidelity based on context window pressure zones (Normal, Caution, Warning, Critical, Emergency).
- SSE Cleanup Filter: Intercepts streaming responses to strip internal memory management tags (e.g., `<memory_cleanup>`, `<yuyay-response>`) and executes cleanup operations in real time.
- Live Monitoring: Includes an embedded dashboard at `/dashboard` providing real-time metrics on context reduction, fidelity distribution, admission control, and latency.
- REST API: Exposes endpoints for session management, benchmarking, and memory state inspection.
- OpenCode Integration: Includes a plugin for OpenCode to enable memory-aware context injection and status reporting.
## Installation

Mnemosyne is a Python project managed with `uv`.

1. Clone the repository:

   ```shell
   git clone https://github.com/jyapayne/mnemosyne
   cd mnemosyne
   ```

2. Install dependencies:

   ```shell
   uv sync
   ```
## Configuration

Mnemosyne is configured via environment variables:

- `ANTHROPIC_API_KEY`: API key for the provider.
- `MNEMOSYNE_PORT`: Port for the proxy server (default: `8080`).
- `MNEMOSYNE_HOST`: Host for the proxy server (default: `127.0.0.1`).
## Usage

Run the proxy server:

```shell
uv run mnemosyne
```

Point your LLM client (e.g., Claude Code) at the proxy URL (e.g., `http://127.0.0.1:8080/v1`).
## Claude Code Integration

Mnemosyne works with Claude Code out of the box. Claude Code uses the Anthropic SDK internally, which respects the `ANTHROPIC_BASE_URL` environment variable.

### Quick Start

```shell
ANTHROPIC_BASE_URL=http://127.0.0.1:8080 claude
```
### Permanent Setup

Add to your shell profile (`~/.zshrc`, `~/.bashrc`, etc.):

```shell
export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
```

All Claude Code sessions will then route through Mnemosyne automatically.
### Authentication

Claude Code normally authenticates via Anthropic OAuth. When routing through Mnemosyne, there are two options:

1. OAuth passthrough (recommended): Mnemosyne forwards OAuth credentials to Anthropic transparently. No extra configuration is needed; just run `claude` as usual.

2. API key: Set `ANTHROPIC_API_KEY` directly and Mnemosyne will forward it:

   ```shell
   export ANTHROPIC_API_KEY=sk-ant-...
   export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
   ```
### Systemd Service

To ensure Mnemosyne is always running when Claude Code starts, install the systemd user service:

```shell
cp mnemosyne.service ~/.config/systemd/user/
systemctl --user enable mnemosyne.service
systemctl --user start mnemosyne.service
```
### Verifying the Connection

While Claude Code is running, check the dashboard at `http://127.0.0.1:8080/dashboard` or query the API:

```shell
curl -s http://127.0.0.1:8080/api/sessions | python3 -m json.tool
```

You should see an active session with incoming/outgoing byte counts.
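You can also check programmatically by parsing the JSON from `/api/sessions`. The payload shape below (a list of sessions with `id`, `bytes_in`, and `bytes_out` fields) is an assumption for illustration; inspect an actual response for the real field names:

```python
import json

def summarize_sessions(payload: str) -> str:
    """Render a one-line summary per session from a /api/sessions JSON payload."""
    sessions = json.loads(payload)
    return "\n".join(
        f"{s['id']}: {s['bytes_in']} B in / {s['bytes_out']} B out"
        for s in sessions
    )

# Hypothetical payload; real field names may differ.
sample = '[{"id": "abc123", "bytes_in": 48210, "bytes_out": 9120}]'
print(summarize_sessions(sample))  # → abc123: 48210 B in / 9120 B out
```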
## API Reference

- `/api/sessions`: List active conversation sessions.
- `/api/benchmark`: Retrieve performance and token usage metrics.
- `/api/memory`: Inspect the current state of the object store.
- `/api/blocks`: View tracked conversation blocks.
## How It Works

### Request Lifecycle

1. Inbound:
   - `label_messages`: Injects `[block:xxxx]` markers into message content.
   - `apply_to_messages`: Drops or collapses blocks based on cleanup operations.
   - `compact_messages`: Compresses remaining content.
   - `apply_fidelity`: Adjusts content based on current fidelity levels.
2. Outbound:
   - `SSECleanupFilter`: Strips cleanup tags from streaming responses, executes the embedded operations, and forwards the clean stream.
3. Post-Response:
   - Updates fidelity pressure, schedules summaries, and records benchmarks.
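The outbound filtering step can be sketched as a tag stripper. The tag names come from the feature list above; the regex approach is illustrative, and a real SSE filter must also buffer tags that span chunk boundaries, which this sketch ignores by operating on a complete chunk:

```python
import re

# Tags named in the README; the stripping mechanism itself is a sketch.
CLEANUP_TAGS = ("memory_cleanup", "yuyay-response")
_TAG_RE = re.compile(r"<(" + "|".join(CLEANUP_TAGS) + r")>.*?</\1>", re.DOTALL)

def strip_cleanup_tags(chunk: str) -> tuple[str, list[str]]:
    """Remove cleanup tag blocks from a streamed chunk.

    Returns the cleaned text plus the captured tag blocks, which the proxy
    would execute as cleanup operations before forwarding the stream.
    """
    ops = [m.group(0) for m in _TAG_RE.finditer(chunk)]
    return _TAG_RE.sub("", chunk), ops
```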
## License
MIT