# Mnemosyne
Mnemosyne is a transparent HTTP proxy designed to manage LLM context memory. It sits between LLM clients and providers like Anthropic or OpenAI, providing semantic segmentation, multi-fidelity compression, and live monitoring to reduce context costs and improve agent performance.
## Overview
Mnemosyne acts as a drop-in proxy. It intercepts API requests, segments content into labeled blocks, and applies multi-fidelity compression to manage the context window. As context grows, Mnemosyne automatically degrades content fidelity to keep the most relevant information available while minimizing token usage.
## Architecture

```
Client → Mnemosyne Proxy → Provider API
                 |
          +------+------+
          |             |
     BlockStore     PageStore
          |             |
  FidelityManager       |
          |             |
  SSECleanupFilter      |
          |             |
     ObjectStore <------+
```
## Features
- Transparent Proxy: Operates as a drop-in replacement for LLM API endpoints. No client-side changes are required.
- Semantic Segmentation: Automatically labels conversation content with tensor/block IDs, allowing the model to reference specific segments.
- Multi-Fidelity Compression: Supports five levels of fidelity (L0-L4):
  - L0: Full content (no compression)
  - L1: Detailed summary (~30% size)
  - L2: Compact summary (~5% size)
  - L3: Metadata stub (~50-100 tokens)
  - L4: Evicted (not in context)
- Pressure-Based Degradation: Automatically downgrades object fidelity based on context window pressure zones (Normal, Caution, Warning, Critical, Emergency).
- SSE Cleanup Filter: Intercepts streaming responses to strip internal memory-management tags (e.g., `<memory_cleanup>`, `<yuyay-response>`) and executes cleanup operations in real time.
- Live Monitoring: Includes an embedded dashboard at `/dashboard` providing real-time metrics on context reduction, fidelity distribution, admission control, and latency.
- REST API: Exposes endpoints for session management, benchmarking, and memory state inspection.
- OpenCode Integration: Includes a plugin for OpenCode to enable memory-aware context injection and status reporting.
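The fidelity ladder and pressure zones above can be sketched as follows. The level names and size targets come from this README, but the zone thresholds and the one-step-per-zone degradation policy are illustrative assumptions, not Mnemosyne's actual implementation:

```python
from enum import IntEnum

class Fidelity(IntEnum):
    L0 = 0  # full content (no compression)
    L1 = 1  # detailed summary (~30% size)
    L2 = 2  # compact summary (~5% size)
    L3 = 3  # metadata stub (~50-100 tokens)
    L4 = 4  # evicted (not in context)

# Hypothetical pressure zones as fractions of the context window.
ZONES = [
    (0.50, "Normal"),
    (0.70, "Caution"),
    (0.85, "Warning"),
    (0.95, "Critical"),
    (1.00, "Emergency"),
]

def pressure_zone(used_tokens: int, window: int) -> str:
    """Map context-window utilization to a pressure zone."""
    ratio = used_tokens / window
    for threshold, zone in ZONES:
        if ratio <= threshold:
            return zone
    return "Emergency"

def degrade(current: Fidelity, zone: str) -> Fidelity:
    """Illustrative policy: drop fidelity one step per zone above Normal."""
    steps = [name for _, name in ZONES].index(zone)
    return Fidelity(min(int(Fidelity.L4), current + steps))
```

Under this sketch, an object held at L0 would fall to L2 once the session enters the Warning zone, freeing roughly 95% of its tokens while keeping a compact summary in context.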
## Installation
Mnemosyne is a Python project managed with uv.
1. Clone the repository:

   ```shell
   git clone https://github.com/jyapayne/mnemosyne
   cd mnemosyne
   ```

2. Install dependencies:

   ```shell
   uv sync
   ```
## Configuration
Mnemosyne is configured via environment variables:
- `ANTHROPIC_API_KEY`: API key for the provider.
- `MNEMOSYNE_PORT`: Port for the proxy server (default: 8080).
- `MNEMOSYNE_HOST`: Host for the proxy server (default: 127.0.0.1).
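For example, to run with the defaults made explicit (the key value is a placeholder):

```shell
export ANTHROPIC_API_KEY="your-key-here"  # replace with your provider key
export MNEMOSYNE_PORT=8080                # default
export MNEMOSYNE_HOST=127.0.0.1           # default
```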
## Usage
Run the proxy server:

```shell
uv run mnemosyne
```
Point your LLM client (e.g., Claude Code) to the proxy URL (e.g., `http://127.0.0.1:8080/v1`).
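Because the proxy is transparent, any client that lets you override its base URL will work. A minimal stdlib sketch of building an Anthropic-style request routed through the local proxy (the model name and API key here are placeholders, and the request body follows the Anthropic Messages API shape):

```python
import json
import urllib.request

# Default proxy address from the Usage section above.
PROXY_BASE = "http://127.0.0.1:8080/v1"

payload = {
    "model": "claude-example",  # placeholder model name
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}

# Build the request; Mnemosyne forwards it to the real provider.
req = urllib.request.Request(
    f"{PROXY_BASE}/messages",
    data=json.dumps(payload).encode(),
    headers={"content-type": "application/json", "x-api-key": "your-key-here"},
)
# urllib.request.urlopen(req) would send it, assuming the proxy is running.
```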
## API Reference
- `/api/sessions`: List active conversation sessions.
- `/api/benchmark`: Retrieve performance and token usage metrics.
- `/api/memory`: Inspect the current state of the object store.
- `/api/blocks`: View tracked conversation blocks.
## How It Works

### Request Lifecycle
- Inbound:
  - `label_messages`: Injects `[block:xxxx]` markers into message content.
  - `apply_to_messages`: Drops or collapses blocks based on cleanup operations.
  - `compact_messages`: Compresses remaining content.
  - `apply_fidelity`: Adjusts content based on current fidelity levels.
- Outbound:
  - `SSECleanupFilter`: Strips cleanup tags from streaming responses, executes operations, and forwards the clean stream.
- Post-Response: Updates fidelity pressure, schedules summaries, and records benchmarks.
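The inbound labeling and outbound tag-stripping stages can be sketched as follows. The `[block:xxxx]` marker format and the `<memory_cleanup>` tag name come from this README; the block-ID scheme and the filtering logic are illustrative assumptions, not Mnemosyne's actual code:

```python
import re

def label_messages(messages: list[dict]) -> list[dict]:
    """Inject [block:xxxx] markers so the model can reference segments."""
    labeled = []
    for i, msg in enumerate(messages):
        block_id = f"{i:04x}"  # hypothetical sequential hex ID
        labeled.append({**msg, "content": f"[block:{block_id}] {msg['content']}"})
    return labeled

# Strip memory-management tags from a streamed chunk before forwarding it.
CLEANUP_TAG = re.compile(r"<memory_cleanup>.*?</memory_cleanup>", re.DOTALL)

def strip_cleanup_tags(chunk: str) -> str:
    """Remove cleanup tags; a real filter would also execute the operations."""
    return CLEANUP_TAG.sub("", chunk)
```

Note that a production SSE filter must also buffer tags that are split across stream chunks, which this single-chunk sketch does not handle.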
## License
MIT