Mnemosyne

Mnemosyne is a transparent HTTP proxy designed to manage LLM context memory. It sits between LLM clients and providers like Anthropic or OpenAI, providing semantic segmentation, multi-fidelity compression, and live monitoring to reduce context costs and improve agent performance.

Overview

Mnemosyne acts as a drop-in proxy. It intercepts API requests, segments content into labeled blocks, and applies multi-fidelity compression to manage the context window. As context grows, Mnemosyne automatically degrades content fidelity to keep the most relevant information available while minimizing token usage.

Architecture

Client → Mnemosyne Proxy → Provider API
             |
      +------+------+
      |             |
  BlockStore    PageStore
      |             |
  FidelityManager   |
      |             |
 SSECleanupFilter   |
      |             |
  ObjectStore <-----+

Features

  • Transparent Proxy: Operates as a drop-in replacement for LLM API endpoints. No client-side changes are required.
  • Semantic Segmentation: Automatically labels conversation content with block IDs, allowing the model to reference specific segments.
  • Multi-Fidelity Compression: Supports five fidelity levels (L0-L4), sketched in code after this list:
    • L0: Full content (no compression)
    • L1: Detailed summary (~30% size)
    • L2: Compact summary (~5% size)
    • L3: Metadata stub (~50-100 tokens)
    • L4: Evicted (not in context)
  • Pressure-Based Degradation: Automatically downgrades object fidelity based on context window pressure zones (Normal, Caution, Warning, Critical, Emergency).
  • SSE Cleanup Filter: Intercepts streaming responses to strip internal memory management tags (e.g., <memory_cleanup>, <yuyay-response>) and executes cleanup operations in real-time.
  • Live Monitoring: Includes an embedded dashboard at /dashboard providing real-time metrics on context reduction, fidelity distribution, admission control, and latency.
  • REST API: Exposes endpoints for session management, benchmarking, and memory state inspection.
  • OpenCode Integration: Includes a plugin for OpenCode to enable memory-aware context injection and status reporting.
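
The fidelity ladder above maps naturally onto an ordered enum. A minimal Python sketch using hypothetical names (Fidelity, degrade); Mnemosyne's actual identifiers live in src/mnemosyne and may differ:

from enum import IntEnum

class Fidelity(IntEnum):
    # Hypothetical names for the five levels described above.
    L0_FULL = 0      # full content, no compression
    L1_DETAILED = 1  # detailed summary, ~30% of original size
    L2_COMPACT = 2   # compact summary, ~5% of original size
    L3_STUB = 3      # metadata stub, ~50-100 tokens
    L4_EVICTED = 4   # evicted, no longer in context

def degrade(level: Fidelity) -> Fidelity:
    # Pressure-based degradation steps down one level at a time,
    # bottoming out at eviction.
    return Fidelity(min(level + 1, Fidelity.L4_EVICTED))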

Installation

Mnemosyne is a Python project managed with uv.

  1. Clone the repository:
    git clone https://github.com/jyapayne/mnemosyne
    cd mnemosyne
    
  2. Install dependencies:
    uv sync
    

Configuration

Mnemosyne is configured via environment variables:

  • ANTHROPIC_API_KEY: API key for the provider.
  • MNEMOSYNE_PORT: Port for the proxy server (default: 8080).
  • MNEMOSYNE_HOST: Host for the proxy server (default: 127.0.0.1).
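
For illustration, a minimal sketch of how these variables might be resolved, assuming the defaults documented above (the actual loading code lives in src/mnemosyne):

import os

# Resolve proxy settings from the documented environment variables,
# falling back to the documented defaults.
host = os.environ.get("MNEMOSYNE_HOST", "127.0.0.1")
port = int(os.environ.get("MNEMOSYNE_PORT", "8080"))
api_key = os.environ.get("ANTHROPIC_API_KEY")  # required when forwarding an API key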

Usage

Run the proxy server:

uv run mnemosyne

Point your LLM client (e.g., Claude Code) to the proxy URL (e.g., http://127.0.0.1:8080/v1).
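
Clients built on the Anthropic Python SDK can be pointed at the proxy programmatically via the SDK's base_url option. A sketch, assuming the default host/port and ANTHROPIC_API_KEY set in the environment (the model ID is only an example):

from anthropic import Anthropic

# Routing through the proxy needs only a base_url override;
# the SDK picks up ANTHROPIC_API_KEY from the environment.
client = Anthropic(base_url="http://127.0.0.1:8080")
resp = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example; use any model your provider serves
    max_tokens=128,
    messages=[{"role": "user", "content": "Hello through Mnemosyne"}],
)
print(resp.content[0].text)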

Claude Code Integration

Mnemosyne works with Claude Code out of the box. Claude Code uses the Anthropic SDK internally, which respects the ANTHROPIC_BASE_URL environment variable.

Quick Start

ANTHROPIC_BASE_URL=http://127.0.0.1:8080 claude

Permanent Setup

Add to your shell profile (~/.zshrc, ~/.bashrc, etc.):

export ANTHROPIC_BASE_URL=http://127.0.0.1:8080

All Claude Code sessions will then route through Mnemosyne automatically.

Authentication

Claude Code normally authenticates via Anthropic OAuth. When routing through Mnemosyne, there are two options:

  1. OAuth passthrough (recommended): Mnemosyne forwards OAuth credentials to Anthropic transparently. No extra configuration needed — just run claude as usual.

  2. API key: Set ANTHROPIC_API_KEY directly and Mnemosyne will forward it:

    export ANTHROPIC_API_KEY=sk-ant-...
    export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
    

Systemd Service

To ensure Mnemosyne is always running when Claude Code starts, install the systemd user service:

cp mnemosyne.service ~/.config/systemd/user/
systemctl --user enable mnemosyne.service
systemctl --user start mnemosyne.service

Verifying the Connection

While Claude Code is running, check the dashboard at http://127.0.0.1:8080/dashboard or query the API:

curl -s http://127.0.0.1:8080/api/sessions | python3 -m json.tool

You should see an active session with incoming/outgoing byte counts.

API Reference

  • /api/sessions: List active conversation sessions.
  • /api/benchmark: Retrieve performance and token usage metrics.
  • /api/memory: Inspect the current state of the object store.
  • /api/blocks: View tracked conversation blocks.
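
For example, a small standard-library Python script can inspect any of these endpoints (assuming the default host/port):

import json
import urllib.request

BASE = "http://127.0.0.1:8080"

def get(path: str) -> object:
    # Fetch and decode a JSON payload from one of the endpoints above.
    with urllib.request.urlopen(BASE + path) as resp:
        return json.load(resp)

print(json.dumps(get("/api/sessions"), indent=2))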

How It Works

Request Lifecycle

  1. Inbound (see the sketch after this list):
    • label_messages: Injects [block:xxxx] markers into message content.
    • apply_to_messages: Drops or collapses blocks based on cleanup operations.
    • compact_messages: Compresses remaining content.
    • apply_fidelity: Adjusts content based on current fidelity levels.
  2. Outbound:
    • SSECleanupFilter: Strips cleanup tags from streaming responses, executes operations, and forwards the clean stream.
  3. Post-Response:
    • Updates fidelity pressure, schedules summaries, and records benchmarks.
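
Schematically, the inbound stage is a pipeline of message transforms applied in the order listed above. A hypothetical sketch; the real signatures in src/mnemosyne may differ:

from typing import Callable

Message = dict  # e.g. {"role": "user", "content": "..."}
Transform = Callable[[list[Message]], list[Message]]

def process_inbound(messages: list[Message], steps: list[Transform]) -> list[Message]:
    # Apply each transform in order: label -> cleanup -> compact -> fidelity.
    for step in steps:
        messages = step(messages)
    return messages

# Stand-in for label_messages, included only to make the sketch runnable:
def label_messages(messages: list[Message]) -> list[Message]:
    return [dict(m, content=f"[block:{i:04x}] " + m["content"])
            for i, m in enumerate(messages)]

print(process_inbound([{"role": "user", "content": "hi"}], [label_messages]))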

License

MIT