
Research References

All papers, repositories, and prior art that informed this design.


Core Papers

Pichay — Demand Paging for LLM Context Windows (PRIMARY)

MemGPT / Letta — Virtual Memory for LLMs

  • Paper: MemGPT: Towards LLMs as Operating Systems
  • Authors: Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez (UC Berkeley)
  • Date: October 2023 (revised February 2024)
  • Repo: https://github.com/letta-ai/letta (SHA: 4cb2f21c)
  • Key findings: Three-tier memory hierarchy (core/recall/archival). Agent-initiated paging via tool calls. PostgreSQL + pgvector for archival storage. Partial-evict summarization of the oldest 30% of messages (a minimal sketch follows this entry). LLM-driven retrieval is surprisingly effective.
  • Used in: Object Store design (SCHEMA.md), multi-fidelity concept, backing store architecture (Phase 3)
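
A minimal sketch of the partial-evict loop described above, assuming a plain message list and injected summarize/archive callables (the names and the crude token estimate are illustrative, not Letta's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str
    content: str
    tokens: int

@dataclass
class CoreMemory:
    budget: int                                    # token budget for in-context messages
    messages: list[Message] = field(default_factory=list)

    def used(self) -> int:
        return sum(m.tokens for m in self.messages)

    def append(self, msg: Message, summarize, archive) -> None:
        self.messages.append(msg)
        if self.used() > self.budget:
            # Evict the oldest ~30% of messages, as MemGPT does.
            cut = max(1, int(len(self.messages) * 0.3))
            evicted, self.messages = self.messages[:cut], self.messages[cut:]
            archive(evicted)                       # originals go to archival storage (pgvector)
            summary = summarize(evicted)           # one LLM call produces the recall summary
            # Rough 4-chars-per-token estimate for the injected summary message.
            self.messages.insert(0, Message("system", summary, len(summary) // 4))
```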

xMemory — Hierarchical Structured Retrieval

  • Paper: Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation
  • Venue: ICML 2026
  • Key findings: Standard RAG over agent memory fails because entries are highly correlated. Hierarchical retrieval (messages -> episodes -> semantics -> themes) prevents redundant retrieval. Sparsity-semantics objective for segmentation. Top-down retrieval reduces retrieved tokens while improving relevance (see the sketch after this entry).
  • Used in: Phase 6 (xMemory hierarchy), Phase 4a (segmentation concept)
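
A minimal sketch of top-down retrieval over such a hierarchy, assuming a generic Node type and any embedding-based similarity function (all names here are illustrative, not the paper's API):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str                                   # theme/semantic/episode summary, or raw message
    children: list["Node"] = field(default_factory=list)

def top_down_retrieve(query, roots, similarity, branch=2, leaf_k=5):
    # Descend level by level, expanding only the best-scoring nodes, so
    # correlated siblings are pruned instead of being retrieved redundantly
    # (the failure mode of flat RAG the paper identifies).
    frontier = list(roots)
    while any(n.children for n in frontier):
        frontier.sort(key=lambda n: similarity(query, n.summary), reverse=True)
        frontier = [c for n in frontier[:branch] for c in (n.children or [n])]
    frontier.sort(key=lambda n: similarity(query, n.summary), reverse=True)
    return frontier[:leaf_k]
```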

L-RAG — Entropy-Based Lazy Context Loading

A-MAC — Adaptive Memory Admission Control


Supporting Papers

Factory — Anchored Iterative Summarization

  • Source: Factory's evaluation across 36,000 engineering sessions
  • Key findings: Anchored summarization (a persistent state object with intent/changes/decisions/next_steps fields) outperforms rolling reconstruction (see the sketch below). Scores on accuracy/completeness/continuity: Factory 4.04 vs Anthropic 3.74 vs OpenAI 3.43.
  • Used in: Phase 2 (multi-fidelity compression design)
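
A minimal sketch of the anchored-state idea, assuming a hypothetical `extract` LLM call that returns per-anchor deltas (field names follow the anchors above; nothing here is Factory's published API):

```python
from dataclasses import dataclass, field

@dataclass
class AnchoredSummary:
    intent: str = ""
    changes: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def update(self, extract, new_messages) -> None:
        # `extract` returns only the deltas for each anchor, so earlier
        # state is merged in place rather than reconstructed from scratch.
        delta = extract(self, new_messages)
        self.intent = delta.get("intent", self.intent)
        self.changes += delta.get("changes", [])
        self.decisions += delta.get("decisions", [])
        self.next_steps = delta.get("next_steps", self.next_steps)
```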

SWE-Pruner — Neural Context Pruning for Coding

  • Authors: Wang et al., 2026
  • Key findings: 0.6B-parameter neural skimmer for task-aware pruning. 23-54% token reduction on SWE-bench. Maintains solve rates.
  • Referenced for: Alternative approach to context reduction (learned pruning vs semantic objects)

ACON — Failure-Driven Compression Optimization

  • Paper: arXiv, October 2025
  • Key findings: Unified history + observation compression. 26-54% peak context reduction. Gradient-free, so it works with API-only models. Iteratively refines the compression prompt based on failure cases (sketched below).
  • Referenced for: Compression strategy comparison
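
A hypothetical sketch of the failure-driven refinement loop: the compression prompt is the optimization variable, revised whenever compression causes a failure that full context avoids (all callables and task attributes here are assumptions, not ACON's interface):

```python
def refine_compression_prompt(prompt, tasks, compress, run_task, revise_prompt):
    for task in tasks:
        compressed = compress(task.context, prompt)
        if run_task(task, compressed):
            continue                               # compression preserved what mattered
        if run_task(task, task.context):
            # Failure is attributable to compression: ask an LLM to amend
            # the prompt so the dropped information survives next time.
            prompt = revise_prompt(prompt, task, compressed)
    return prompt
```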

Neural Paging — Learned Page Controller

CMV — DAG-Based Session History Trimming

  • Author: Santoni, 2026
  • Key findings: DAG-based session history structure. Structurally lossless trimming (see the sketch below). Up to 86% reduction for tool-heavy sessions.
  • Referenced for: Alternative structural approach
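
A sketch in the spirit of DAG-based trimming (all names are assumptions): each history node records its dependencies, so anything not transitively reachable from the live set can be dropped without breaking what the remaining messages depend on:

```python
from dataclasses import dataclass, field

@dataclass
class HistoryNode:
    content: str
    refs: set[int] = field(default_factory=set)    # ids of nodes this one depends on

def trim(nodes: dict[int, HistoryNode], live: set[int]) -> dict[int, HistoryNode]:
    # Keep the live set plus everything transitively reachable from it.
    keep, stack = set(), list(live)
    while stack:
        nid = stack.pop()
        if nid in keep:
            continue
        keep.add(nid)
        stack.extend(nodes[nid].refs)
    return {nid: n for nid, n in nodes.items() if nid in keep}
```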

MemOS — Memory Operating System for AGI

  • Authors: Li et al., 2025
  • Key findings: Full "Memory OS" with lifecycle control and persistent representations.
  • Referenced for: Long-term architecture vision

SideQuest — KV Cache Eviction via Parallel Reasoning

  • Authors: Kariyappa & Suh, 2026
  • Key findings: Fine-tuned parallel reasoning thread for KV cache eviction. 56-65% peak memory reduction. Irreversible eviction.
  • Referenced for: KV-cache-level optimization (complementary to our message-level approach)

Quest — Query-Aware KV Cache Sparsity

SpeContext — Speculative Context Sparsity

  • Paper: SpeContext: Enabling Efficient Long-context Reasoning
  • Authors: SJTU / Infinigence-AI, November 2025
  • Key findings: A small draft model predicts which KV cache tokens matter before the main model runs. Analogous to speculative decoding, but for context selection (an illustrative analogy at the chunk level follows).
  • Referenced for: Helper model concept (similar philosophy at different layer)
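
An illustrative analogy at the message/chunk layer rather than the KV cache (this is not the paper's method): a cheap draft scorer ranks context chunks, and only the top fraction is passed to the main model, in original order:

```python
def speculative_select(chunks, query, draft_score, keep_frac=0.25):
    # Rank chunks with the cheap draft scorer, keep the top fraction, and
    # restore original order so the main model sees a coherent context.
    k = max(1, int(len(chunks) * keep_frac))
    ranked = sorted(range(len(chunks)),
                    key=lambda i: draft_score(query, chunks[i]), reverse=True)
    keep = set(ranked[:k])
    return [c for i, c in enumerate(chunks) if i in keep]
```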

SoK: Agentic RAG

  • Paper: SoK: Agentic RAG: Taxonomy, Architectures, Evaluation
  • Date: March 2026
  • Key findings: Definitive 2026 survey. Taxonomy of planning, retrieval, memory, and tool coordination patterns. Identifies risks: compounding hallucination, memory poisoning, retrieval misalignment.
  • Referenced for: Taxonomy and risk awareness

Mem0 — Fact Extraction + Merge Pipeline


Key Repositories

Direct Dependencies

| Repo | What We Use | Phase |
|------|-------------|-------|
| fsgeek/pichay | Fork as starting point for proxy | Phase 1 |
| pgvector/pgvector | PostgreSQL vector similarity | Phase 3+ |
| sentence-transformers | all-MiniLM-L6-v2 embeddings | Phase 3+ |
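
A minimal sketch of the embedding/storage path the two storage rows imply: all-MiniLM-L6-v2 produces 384-dim vectors, stored and queried with pgvector. The table and column names are assumptions, not the project schema in SCHEMA.md:

```python
from sentence_transformers import SentenceTransformer
import psycopg2

model = SentenceTransformer("all-MiniLM-L6-v2")    # produces 384-dim vectors

conn = psycopg2.connect("dbname=mnemosyne")        # connection string is an assumption
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS archival "
        "(id serial PRIMARY KEY, content text, embedding vector(384))"
    )
    text = "decided to use a write-through cache for the object store"
    vec = str(model.encode(text).tolist())
    cur.execute(
        "INSERT INTO archival (content, embedding) VALUES (%s, %s::vector)",
        (text, vec),
    )
    # Nearest-neighbour search by cosine distance (pgvector's <=> operator).
    qvec = str(model.encode("what caching strategy did we pick?").tolist())
    cur.execute(
        "SELECT content FROM archival ORDER BY embedding <=> %s::vector LIMIT 5",
        (qvec,),
    )
    print(cur.fetchall())
```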

Reference Implementations

| Repo | What We Learn From | Stars |
|------|--------------------|-------|
| letta-ai/letta | 3-tier memory architecture, archival search | 15k+ |
| mem0ai/mem0 | Fact extraction pipeline, multi-backend vector store | 49k+ |
| alibaizhanov/mengram | 3-memory-type system (semantic/episodic/procedural) | 86 |
| PavanVkAlapati/memory_orchestration | Layered memory with Qdrant + Redis + MongoDB | - |
| GuilinDev/Adaptive_Memory_Admission_Control_LLM_Agents | A-MAC admission scoring | - |
| vivek-tiwari-vt/agmem | Git-like version control for agent memories | - |
| lm-sys/RouteLLM | BERT classifier router for model selection | - |

MCP Servers (reference for Phase 5)

| Repo | What It Does |
|------|--------------|
| adamrdrew/agent-memory-mcp | Hybrid BM25 + vector search, local embeddings, 12 memory categories |
| Parswanadh/memory-mcp-server | 3-tier hierarchical memory (working/short-term/long-term) |
| vbcherepanov/claude-total-memory | 4-tier search, 20 tools, ChromaDB + SQLite |
| van-reflect/Reflect-Memory | Cross-agent memory, vendor-neutral |
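
A sketch of the hybrid BM25 + vector scoring the first server advertises, using reciprocal rank fusion to combine the two rankings (RRF is a common, simple choice; whether that server uses it specifically is an assumption):

```python
from rank_bm25 import BM25Okapi

def hybrid_search(query, docs, embed, k=5, c=60):
    # Lexical ranking via BM25 over whitespace-tokenized documents.
    bm25 = BM25Okapi([d.split() for d in docs])
    lex = bm25.get_scores(query.split())
    # Semantic ranking via dot product against query embedding.
    qv = embed(query)
    sem = [sum(a * b for a, b in zip(qv, embed(d))) for d in docs]

    def ranks(scores):
        order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        return {i: r for r, i in enumerate(order)}

    rl, rs = ranks(lex), ranks(sem)
    # Reciprocal rank fusion: robust to the two scorers' different scales.
    fused = {i: 1 / (c + rl[i]) + 1 / (c + rs[i]) for i in range(len(docs))}
    top = sorted(fused, key=fused.get, reverse=True)[:k]
    return [docs[i] for i in top]
```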

OpenCode / Oh-My-OpenCode Integration Points

OpenCode Plugin Hooks (from sst/opencode)

| Hook | Location | Purpose for Mnemosyne |
|------|----------|-----------------------|
| experimental.chat.messages.transform | packages/opencode/src/session/prompt.ts:652 | Modify message array before LLM call (context assembly) |
| experimental.session.compacting | packages/opencode/src/session/compaction.ts:169 | Custom compaction prompt/context |
| experimental.chat.system.transform | packages/opencode/src/session/llm.ts:84 | Modify system prompt (inject memory instructions) |
| tool.execute.before | packages/plugin/src/index.ts:184 | Intercept tool args before execution |
| tool.execute.after | packages/plugin/src/index.ts:192 | Process tool results for object creation |
| chat.params | packages/opencode/src/session/llm.ts:114 | Modify temperature, options |

Oh-My-OpenCode Hooks (from omc-sh/oh-my-opencode)

| Hook | Purpose for Mnemosyne |
|------|-----------------------|
| context-window-monitor | Existing hook; can extend or replace |
| preemptive-compaction | Existing hook; integrate with our pressure system |
| tool-output-truncator | Existing hook; our fidelity system supersedes this |
| compaction-context-injector | Inject our memory state into compaction prompt |

Benchmark Datasets

For evaluating memory quality:

| Dataset | What It Tests | URL |
|---------|---------------|-----|
| LoCoMo | Long-conversation memory (QA over multi-session chat) | https://github.com/letta-ai/letta/tree/main/tests |
| PerLTQA | Personalized long-term QA | Referenced in xMemory paper |
| SWE-bench | Coding task completion (for measuring quality impact) | https://github.com/princeton-nlp/SWE-bench |
| Terminal-Bench | CLI agent task completion | Referenced in Letta Code evaluation |

Key Metrics from Literature

| System | Context Reduction | Quality Impact | Cost |
|--------|-------------------|----------------|------|
| Pichay (baseline eviction) | 37% token, up to 93% extreme | 0.0254% fault rate | Zero (proxy only) |
| SWE-Pruner | 23-54% | Maintains solve rates | Training cost for 0.6B model |
| ACON | 26-54% peak | 95%+ task accuracy preserved | Multiple LLM calls for training |
| Factory summarization | High | 4.04/5 accuracy score | 1 LLM call per eviction |
| Cursor lazy MCP loading | 46.9% | No degradation | Zero (lazy loading) |
| Cline file deduplication | Variable | None (lossless) | Zero (dedup only) |
| Simple observation masking | ~50% | Matches LLM summarization | Zero |
| L-RAG entropy gating | 26% retrieval reduction | Marginal impact | Logprob monitoring |
| RouteLLM model routing | 85% cost reduction | 95% quality maintained | <10ms per route |