Mnemosyne

Mnemosyne is a transparent HTTP proxy designed to manage LLM context memory. It sits between LLM clients and providers like Anthropic or OpenAI, providing semantic segmentation, multi-fidelity compression, and live monitoring to reduce context costs and improve agent performance.

Overview

Mnemosyne acts as a drop-in proxy. It intercepts API requests, segments content into labeled blocks, and applies multi-fidelity compression to manage the context window. As context grows, Mnemosyne automatically degrades content fidelity to keep the most relevant information available while minimizing token usage.

Architecture

Client → Mnemosyne Proxy → Provider API
             |
      +------+------+
      |             |
  BlockStore    PageStore
      |             |
  FidelityManager   |
      |             |
 SSECleanupFilter   |
      |             |
  ObjectStore <-----+

Features

  • Transparent Proxy: Operates as a drop-in replacement for LLM API endpoints. No client-side changes are required.
  • Semantic Segmentation: Automatically labels conversation content with block IDs, allowing the model to reference specific segments.
  • Multi-Fidelity Compression: Supports five fidelity levels (L0-L4), sketched in code after this list:
    • L0: Full content (no compression)
    • L1: Detailed summary (~30% size)
    • L2: Compact summary (~5% size)
    • L3: Metadata stub (~50-100 tokens)
    • L4: Evicted (not in context)
  • Pressure-Based Degradation: Automatically downgrades object fidelity based on context window pressure zones (Normal, Caution, Warning, Critical, Emergency).
  • SSE Cleanup Filter: Intercepts streaming responses to strip internal memory management tags (e.g., <memory_cleanup>, <yuyay-response>) and executes cleanup operations in real-time.
  • Live Monitoring: Includes an embedded dashboard at /dashboard providing real-time metrics on context reduction, fidelity distribution, admission control, and latency.
  • REST API: Exposes endpoints for session management, benchmarking, and memory state inspection.
  • OpenCode Integration: Includes a plugin for OpenCode to enable memory-aware context injection and status reporting.
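
The fidelity ladder above maps naturally onto an ordered enum. A minimal Python sketch using hypothetical names (Fidelity, degrade); Mnemosyne's actual identifiers live in src/mnemosyne and may differ:

from enum import IntEnum

class Fidelity(IntEnum):
    # Hypothetical names for the five levels described above.
    L0_FULL = 0      # full content, no compression
    L1_DETAILED = 1  # detailed summary, ~30% of original size
    L2_COMPACT = 2   # compact summary, ~5% of original size
    L3_STUB = 3      # metadata stub, ~50-100 tokens
    L4_EVICTED = 4   # evicted, no longer in context

def degrade(level: Fidelity) -> Fidelity:
    # Pressure-based degradation steps down one level at a time,
    # bottoming out at eviction.
    return Fidelity(min(level + 1, Fidelity.L4_EVICTED))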

Installation

Mnemosyne is a Python project managed with uv.

  1. Clone the repository:
    git clone https://github.com/jyapayne/mnemosyne
    cd mnemosyne
    
  2. Install dependencies:
    uv sync
    

Configuration

Mnemosyne is configured via environment variables:

  • ANTHROPIC_API_KEY: API key for the provider.
  • MNEMOSYNE_PORT: Port for the proxy server (default: 8080).
  • MNEMOSYNE_HOST: Host for the proxy server (default: 127.0.0.1).
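
For illustration, a minimal sketch of how these variables might be resolved, assuming the defaults documented above (the actual loading code lives in src/mnemosyne):

import os

# Resolve proxy settings from the documented environment variables,
# falling back to the documented defaults.
host = os.environ.get("MNEMOSYNE_HOST", "127.0.0.1")
port = int(os.environ.get("MNEMOSYNE_PORT", "8080"))
api_key = os.environ.get("ANTHROPIC_API_KEY")  # required when forwarding an API key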

Usage

Run the proxy server:

uv run mnemosyne

Point your LLM client (e.g., Claude Code) to the proxy URL (e.g., http://127.0.0.1:8080/v1).
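
Clients built on the Anthropic Python SDK can be pointed at the proxy programmatically via the SDK's base_url option. A sketch, assuming the default host/port and ANTHROPIC_API_KEY set in the environment (the model ID is only an example):

from anthropic import Anthropic

# Routing through the proxy needs only a base_url override;
# the SDK picks up ANTHROPIC_API_KEY from the environment.
client = Anthropic(base_url="http://127.0.0.1:8080")
resp = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example; use any model your provider serves
    max_tokens=128,
    messages=[{"role": "user", "content": "Hello through Mnemosyne"}],
)
print(resp.content[0].text)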

Claude Code Integration

Mnemosyne works with Claude Code out of the box. Claude Code uses the Anthropic SDK internally, which respects the ANTHROPIC_BASE_URL environment variable.

Quick Start

ANTHROPIC_BASE_URL=http://127.0.0.1:8080 claude

Permanent Setup

Add to your shell profile (~/.zshrc, ~/.bashrc, etc.):

export ANTHROPIC_BASE_URL=http://127.0.0.1:8080

All Claude Code sessions will then route through Mnemosyne automatically.

Authentication

Claude Code normally authenticates via Anthropic OAuth. When routing through Mnemosyne, there are two options:

  1. OAuth passthrough (recommended): Mnemosyne forwards OAuth credentials to Anthropic transparently. No extra configuration needed — just run claude as usual.

  2. API key: Set ANTHROPIC_API_KEY directly and Mnemosyne will forward it:

    export ANTHROPIC_API_KEY=sk-ant-...
    export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
    

Systemd Service

To ensure Mnemosyne is always running when Claude Code starts, install the systemd user service:

cp mnemosyne.service ~/.config/systemd/user/
systemctl --user enable mnemosyne.service
systemctl --user start mnemosyne.service

Verifying the Connection

While Claude Code is running, check the dashboard at http://127.0.0.1:8080/dashboard or query the API:

curl -s http://127.0.0.1:8080/api/sessions | python3 -m json.tool

You should see an active session with incoming/outgoing byte counts.

API Reference

  • /api/sessions: List active conversation sessions.
  • /api/benchmark: Retrieve performance and token usage metrics.
  • /api/memory: Inspect the current state of the object store.
  • /api/blocks: View tracked conversation blocks.
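
For example, a small standard-library Python script can inspect any of these endpoints (assuming the default host/port):

import json
import urllib.request

BASE = "http://127.0.0.1:8080"

def get(path: str) -> object:
    # Fetch and decode a JSON payload from one of the endpoints above.
    with urllib.request.urlopen(BASE + path) as resp:
        return json.load(resp)

print(json.dumps(get("/api/sessions"), indent=2))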

How It Works

Request Lifecycle

  1. Inbound (see the sketch after this list):
    • label_messages: Injects [block:xxxx] markers into message content.
    • apply_to_messages: Drops or collapses blocks based on cleanup operations.
    • compact_messages: Compresses remaining content.
    • apply_fidelity: Adjusts content based on current fidelity levels.
  2. Outbound:
    • SSECleanupFilter: Strips cleanup tags from streaming responses, executes operations, and forwards the clean stream.
  3. Post-Response:
    • Updates fidelity pressure, schedules summaries, and records benchmarks.
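
Schematically, the inbound stage is a pipeline of message transforms applied in the order listed above. A hypothetical sketch; the real signatures in src/mnemosyne may differ:

from typing import Callable

Message = dict  # e.g. {"role": "user", "content": "..."}
Transform = Callable[[list[Message]], list[Message]]

def process_inbound(messages: list[Message], steps: list[Transform]) -> list[Message]:
    # Apply each transform in order: label -> cleanup -> compact -> fidelity.
    for step in steps:
        messages = step(messages)
    return messages

# Stand-in for label_messages, included only to make the sketch runnable:
def label_messages(messages: list[Message]) -> list[Message]:
    return [dict(m, content=f"[block:{i:04x}] " + m["content"])
            for i, m in enumerate(messages)]

print(process_inbound([{"role": "user", "content": "hi"}], [label_messages]))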

License

MIT