# Mnemosyne

Mnemosyne is a transparent HTTP proxy designed to manage LLM context memory. It sits between LLM clients and providers like Anthropic or OpenAI, providing semantic segmentation, multi-fidelity compression, and live monitoring to reduce context costs and improve agent performance.

## Overview

Mnemosyne acts as a drop-in proxy. It intercepts API requests, segments content into labeled blocks, and applies multi-fidelity compression to manage the context window. As context grows, Mnemosyne automatically degrades content fidelity to keep the most relevant information available while minimizing token usage.

## Architecture

```text
Client → Mnemosyne Proxy → Provider API
                |
         +------+------+
         |             |
    BlockStore     PageStore
         |             |
  FidelityManager      |
         |             |
 SSECleanupFilter      |
         |             |
    ObjectStore  <-----+
```

## Features

* **Transparent Proxy**: Operates as a drop-in replacement for LLM API endpoints. No client-side changes are required.
* **Semantic Segmentation**: Automatically labels conversation content with tensor/block IDs, allowing the model to reference specific segments.
* **Multi-Fidelity Compression**: Supports five levels of fidelity (L0-L4):
  * L0: Full content (no compression)
  * L1: Detailed summary (~30% size)
  * L2: Compact summary (~5% size)
  * L3: Metadata stub (~50-100 tokens)
  * L4: Evicted (not in context)
* **Pressure-Based Degradation**: Automatically downgrades object fidelity based on context window pressure zones (Normal, Caution, Warning, Critical, Emergency).
* **SSE Cleanup Filter**: Intercepts streaming responses, strips internal memory-management tags, and executes the corresponding cleanup operations in real time.
* **Live Monitoring**: Includes an embedded dashboard at `/dashboard` providing real-time metrics on context reduction, fidelity distribution, admission control, and latency.
* **REST API**: Exposes endpoints for session management, benchmarking, and memory state inspection.
* **OpenCode Integration**: Includes a plugin for OpenCode to enable memory-aware context injection and status reporting.

## Installation

Mnemosyne is a Python project managed with `uv`.

1. Clone the repository:
   ```bash
   git clone https://github.com/jyapayne/mnemosyne
   cd mnemosyne
   ```
2. Install dependencies:
   ```bash
   uv sync
   ```

## Configuration

Mnemosyne is configured via environment variables:

* `ANTHROPIC_API_KEY`: API key for the provider.
* `MNEMOSYNE_PORT`: Port for the proxy server (default: 8080).
* `MNEMOSYNE_HOST`: Host for the proxy server (default: 127.0.0.1).

## Usage

Run the proxy server:

```bash
uv run mnemosyne
```

Point your LLM client (e.g., Claude Code) to the proxy URL (e.g., `http://127.0.0.1:8080/v1`).

## Claude Code Integration

Mnemosyne works with [Claude Code](https://docs.anthropic.com/en/docs/claude-code) out of the box. Claude Code uses the Anthropic SDK internally, which respects the `ANTHROPIC_BASE_URL` environment variable.

### Quick Start

```bash
ANTHROPIC_BASE_URL=http://127.0.0.1:8080 claude
```

### Permanent Setup

Add to your shell profile (`~/.zshrc`, `~/.bashrc`, etc.):

```bash
export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
```

All Claude Code sessions will then route through Mnemosyne automatically.

### Authentication

Claude Code normally authenticates via Anthropic OAuth. When routing through Mnemosyne, there are two options:

1. **OAuth passthrough** (recommended): Mnemosyne forwards OAuth credentials to Anthropic transparently. No extra configuration is needed; just run `claude` as usual.
2. **API key**: Set `ANTHROPIC_API_KEY` directly and Mnemosyne will forward it:
   ```bash
   export ANTHROPIC_API_KEY=sk-ant-...
   export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
   ```

### Systemd Service

To ensure Mnemosyne is always running when Claude Code starts, install the systemd user service:

```bash
cp mnemosyne.service ~/.config/systemd/user/
systemctl --user enable mnemosyne.service
systemctl --user start mnemosyne.service
```

### Verifying the Connection

While Claude Code is running, check the dashboard at `http://127.0.0.1:8080/dashboard` or query the API:

```bash
curl -s http://127.0.0.1:8080/api/sessions | python3 -m json.tool
```

You should see an active session with incoming/outgoing byte counts.

## API Reference

* `/api/sessions`: List active conversation sessions.
* `/api/benchmark`: Retrieve performance and token usage metrics.
* `/api/memory`: Inspect the current state of the object store.
* `/api/blocks`: View tracked conversation blocks.

## How It Works

### Request Lifecycle

1. **Inbound**:
   * `label_messages`: Injects `[block:xxxx]` markers into message content.
   * `apply_to_messages`: Drops or collapses blocks based on cleanup operations.
   * `compact_messages`: Compresses remaining content.
   * `apply_fidelity`: Adjusts content based on current fidelity levels.
2. **Outbound**:
   * `SSECleanupFilter`: Strips cleanup tags from streaming responses, executes operations, and forwards the clean stream.
3. **Post-Response**:
   * Updates fidelity pressure, schedules summaries, and records benchmarks.

## License

MIT
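
## Appendix: Fidelity Sketch

The fidelity levels and pressure zones listed under Features can be sketched roughly as follows. This is an illustration only, not Mnemosyne's implementation: the names `Fidelity`, `estimated_tokens`, `pressure_zone`, and `degrade`, the zone thresholds, and the one-level-per-step policy are all assumptions made for the example. Only the level names, the approximate size ratios, and the zone names come from this README.

```python
from enum import IntEnum


class Fidelity(IntEnum):
    """Fidelity levels as listed under Features (L0 = full, L4 = evicted)."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4


# Zone names come from the README; the utilization cutoffs are hypothetical.
ZONES = [
    (0.50, "Normal"),
    (0.70, "Caution"),
    (0.85, "Warning"),
    (0.95, "Critical"),
]


def estimated_tokens(full_tokens: int, level: Fidelity) -> int:
    """Rough token cost of one object at the given fidelity level."""
    if level == Fidelity.L0:
        return full_tokens                 # full content
    if level == Fidelity.L1:
        return round(full_tokens * 0.30)   # detailed summary, ~30%
    if level == Fidelity.L2:
        return round(full_tokens * 0.05)   # compact summary, ~5%
    if level == Fidelity.L3:
        return min(full_tokens, 75)        # stub; 75 is the midpoint of ~50-100
    return 0                               # L4: evicted, not in context


def pressure_zone(used_tokens: int, window_tokens: int) -> str:
    """Map context-window utilization to a zone name (assumed cutoffs)."""
    utilization = used_tokens / window_tokens
    for upper_bound, name in ZONES:
        if utilization < upper_bound:
            return name
    return "Emergency"


def degrade(level: Fidelity, zone: str) -> Fidelity:
    """Hypothetical policy: drop one level whenever the zone is above Normal."""
    if zone == "Normal" or level == Fidelity.L4:
        return level
    return Fidelity(level + 1)
```

Under these ratios, a 10,000-token object would cost about 3,000 tokens at L1 and about 500 at L2. A real policy would likely weigh relevance and recency when choosing which objects to degrade, which this sketch leaves out.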