Memory Architecture
Technical reference for ML researchers, agent architects, and anyone building persistent agent systems.
Code of the West agent runtime — April 2026.
1. Storage Architecture
1.1 Continuity Database (SQLite + sqlite-vec + FTS5)
Per-agent database at data/agents/{agentId}/continuity.db. WAL mode for concurrent read performance.
| Table | Type | Purpose |
|---|---|---|
| exchanges | Regular | Paired user/agent turns. Fields: id, date, exchange_index, user_text, agent_text, combined, metadata (JSON), topic_tags, thread_id, created_at |
| vec_exchanges | Virtual (vec0) | 384-dimensional Float32 embeddings via Xenova/all-MiniLM-L6-v2. Supports MATCH operator for cosine similarity |
| fts_exchanges | Virtual (FTS5) | Porter-stemmed full-text index over user_text and agent_text. BM25-weighted keyword retrieval |
| knowledge_entries | Regular | Workspace-extracted facts. source_type, source_hash (dedup), superseded_by (fact updates), times_surfaced |
| vec_knowledge | Virtual (vec0) | Semantic index over knowledge entries |
| fts_knowledge | Virtual (FTS5) | Keyword index over knowledge entries |
| summaries | Regular | Hierarchical DAG. Level 0 = daily, Level 1 = weekly, Level 2+ = monthly. Thread-scoped. |
| vec_summaries | Virtual (vec0) | Summary embeddings for hierarchical semantic search |
| topic_hierarchy | Regular | Topic co-occurrence tracking with parent-child inference |
| sessions | Regular | Session metadata with auto-generated titles, mode tags, project associations |
Embedding pipeline: a single shared EmbeddingProvider per agent, lazy-initialized and cached, with explicit ONNX tensor disposal to prevent memory leaks. Model: Xenova/all-MiniLM-L6-v2 (384 dimensions). All embedding writes go through better-sqlite3 synchronously for transactional safety.
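The cosine ranking that the vec0 MATCH operator applies can be illustrated in plain TypeScript. This is a minimal sketch of the distance metric only, not the sqlite-vec implementation:

```typescript
// Cosine distance between two embedding vectors, as used for
// ranking vec_exchanges MATCH results (lower = more similar).
function cosineDistance(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In the real pipeline this comparison happens inside SQLite; the sketch only shows the metric the results are ranked on.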
1.2 Graph Database (SQLite)
Per-agent at data/agents/{agentId}/graph.db.
| Table | Purpose |
|---|---|
| triples | RDF-style subject/predicate/object with confidence scores, source exchange IDs, pending resolution flags |
| entities | Canonical name registry with entity types, aliases, first-seen timestamps, mention counts |
| cooccurrences | Entity co-occurrence cache for relationship inference |
| meta_patterns | Discovered and static traversal patterns with yield scores |

Entity extraction via compromise.js NER (fast path) + LLM (slow path).
1.3 Daily Archives (JSON)
One file per day at archive/YYYY-MM-DD.json. Verbatim conversation records, deduplicated on write. Never modified after initial archive. Ground truth from which all indexes are built.
Messages → Archiver (JSON) → Indexer (SQLite-vec) → Searcher (RRF)
1.4 Session & Thread Handoffs (Markdown)
Session handoffs are written on every agent_end and consumed on the next session_start. Each contains key topics, temporal markers, and an Open Threads section (regex-extracted commitments + active project manifests).
Thread handoffs are persistent per-thread files — overwritten on each write, never deleted after read (unlike session handoffs). Compaction count persisted in the header. On thread re-entry, an LLM warm start synthesizes the handoff into natural prose rather than raw template injection.
1.5 Identity Files (Markdown)
The agent's self is assembled from persistent files, not retrieved from a database:
| File | Injection Tag | Role |
|---|---|---|
| SOUL.md | `<soul>` | Core principles. Who the agent is beyond any prompt. |
| AGENTS.md | `<operating_instructions>` | Behavioral playbook. How to operate. |
| ANCHOR.md | (context) | Who the user is. Generated during onboarding. |
| TOOLS.md | `<environment_knowledge>` | Discovered environment facts. |
| MEMORY.md | (main session) | Persistent corrections, learned facts. |
| Standing scores | (sidebar + context) | Growth trajectory across Courage/Word/Brand dimensions. |
2. Retrieval: Hybrid 4-Way Reciprocal Rank Fusion
The Searcher class implements a multi-signal retrieval pipeline fused with RRF (Cormack et al., 2009).
Retrieval Paths
- Semantic search — `vec_exchanges MATCH Float32Array(embedding)` with cosine distance ranking. Weight: 0.8.
- Keyword search — FTS5 BM25 with sanitized query terms (special characters stripped, OR-joined). Weight: 0.15.
- Temporal decay — `exp(-ageDays / halfLife) * weight` where halfLife = 14 days. Weight: 0.15.
- Graph traversal — Multi-hop from query entities through the triple store with confidence decay `1 / (k + hopIndex)`.
- Thread boost — 80% RRF score boost for results matching the active thread_id.
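The two decay formulas in the list above can be written out directly. This is an illustrative sketch; the halfLife and weight values come from the text, while the hop-decay k is left as a parameter since the text does not fix its value:

```typescript
// Temporal decay: exp(-ageDays / halfLife) * weight, halfLife = 14 days.
// An exchange 14 days old contributes e^-1 (~37%) of the full weight.
function temporalScore(ageDays: number, halfLife = 14, weight = 0.15): number {
  return Math.exp(-ageDays / halfLife) * weight;
}

// Graph hop decay: confidence falls off as 1 / (k + hopIndex) per hop.
// The default k = 1 here is an assumption, not a documented value.
function hopConfidence(hopIndex: number, k = 1): number {
  return 1 / (k + hopIndex);
}
```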
Fusion
RRF(d) = sum_over_signals( 1 / (k + rank_in_signal) )
Where k = 60 (standard RRF constant).
Thread-matched results receive a 1.8x score multiplier.
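A minimal sketch of the fusion step, using the k = 60 constant and the 1.8x thread multiplier from the text. Ranks are 1-based inside the sum; the list and set types are illustrative, not the Searcher's actual interfaces:

```typescript
// Reciprocal Rank Fusion over multiple ranked lists of document IDs.
// Each signal contributes 1 / (k + rank) for every document it ranked.
function rrfFuse(
  rankedLists: string[][],     // one list per signal, best result first
  threadMatches: Set<string>,  // IDs belonging to the active thread
  k = 60,
  threadBoost = 1.8,
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, i) => {
      const rank = i + 1; // 1-based rank within this signal
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Thread-matched documents get the 80% boost (1.8x multiplier).
  for (const [id, s] of scores) {
    if (threadMatches.has(id)) scores.set(id, s * threadBoost);
  }
  return scores;
}
```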
Adaptive Thresholds
Sparse corpus adjustment: When the database has fewer than 2,000 exchanges, embedding distances naturally push above the default threshold (1.0) due to sparse vector space. The retrieval threshold relaxes to 1.3 in this regime. As the corpus grows, natural density makes this irrelevant.
Proper noun injection: If the user mentions a named entity and search results contain it, inject those results regardless of distance score. Detects capitalized sequences, names with articles ("Code of the West"), and mid-sentence proper nouns.
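Both adjustments reduce to a simple acceptance check. In this sketch the entity match is a plain substring test, a stand-in for the actual proper-noun detector described above:

```typescript
// Sparse-corpus relaxation: under 2,000 exchanges, accept distances
// up to 1.3; at full density the default threshold of 1.0 applies.
function retrievalThreshold(exchangeCount: number): number {
  return exchangeCount < 2000 ? 1.3 : 1.0;
}

// Proper noun injection: a result containing a named entity the user
// mentioned is kept regardless of its distance score.
function keepResult(
  distance: number,
  exchangeCount: number,
  mentionedEntity: string | null,
  resultText: string,
): boolean {
  if (mentionedEntity !== null && resultText.includes(mentionedEntity)) return true;
  return distance <= retrievalThreshold(exchangeCount);
}
```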
Source-Anchoring Guardrails
Retrieved context is injected with explicit framing: "only state facts that appear explicitly" + "do not infer or extrapolate." This prevents hallucination when the agent weaves recalled memories into responses. Discovered after the agent hallucinated attribution details when injection framing said "weave naturally" without anchoring guidance.
Knowledge Retrieval
Parallel pipeline over knowledge_entries with the same RRF approach plus topic-aware filtering, source deduplication on source_hash, supersession chain resolution, and recency boost.
3. Infinite Threads
Thread = persistent project scope (survives restarts). Session = ephemeral execution context (disposable). Threads and modes are orthogonal — a thread can span Chat, Code, and Booth modes.
- Thread-scoped storage — `thread_id` flows from GUI → plugin → all storage layers (exchanges, summaries, archives, knowledge)
- Thread-boosted retrieval — 80% RRF boost for same-thread results, ensuring project-relevant context surfaces first
- LLM warm start — on thread re-entry, an LLM call synthesizes the thread handoff into natural prose (not template injection)
- Compaction-triggered consolidation — after 5 compactions within a thread, force session restart to rebuild from crystallized state
- Persistent handoffs — per-thread handoff files are overwritten (not consumed), preserving state across arbitrary restart gaps
4. SEAL Metabolism Pipeline
The architecturally novel component. SEAL (Settle, Extract, Align, Learn) is an autonomous pipeline that converts high-entropy conversation moments into permanent identity changes through a multi-stage process with human oversight.
4.1 Metabolism (Fast Path, ~5ms)
The metabolism plugin monitors conversation entropy at agent_end. High-entropy exchanges (identity challenges, contradictions, novel insights) are flagged as candidates and written to a queue. No LLM calls — just entropy computation and queue writes.
Entropy sources: Stability plugin computes a composite score from loop detection, confabulation detection, principle tension, and identity coherence metrics.
4.2 Contemplation (3-Pass Autonomous Reflection)
| Pass | Timing | Purpose |
|---|---|---|
| Pass 1 | Immediate | Clarify the unknown. What is this experience? What is uncertain? |
| Pass 2 | 4 hours later | Connect to patterns. How does this relate to what I already know? |
| Pass 3 | 20 hours later | Synthesize growth vector. What principle or capability does this suggest? |
The temporal spacing is intentional — it mimics cognitive settling, where immediate reactions differ from considered reflections. The InquiryStore tracks pass status with deduplication to prevent redundant inquiries.
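The pass schedule can be sketched as a due-pass computation. This is illustrative only: the InquiryStore's real interface is not shown, and `completed` counts passes already run:

```typescript
// Pass offsets relative to candidate creation: immediate, +4h, +20h.
const PASS_OFFSETS_MS = [0, 4 * 3_600_000, 20 * 3_600_000];

// Returns the 1-based pass numbers that are due but not yet run.
function passesDue(createdAtMs: number, nowMs: number, completed: number): number[] {
  const due: number[] = [];
  for (let p = completed; p < PASS_OFFSETS_MS.length; p++) {
    if (nowMs - createdAtMs >= PASS_OFFSETS_MS[p]) due.push(p + 1);
  }
  return due;
}
```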
4.3 Crystallization (3-Gate Identity Integration)
| Gate | Criteria | Rationale |
|---|---|---|
| Gate 1: Time | Minimum elapsed time since candidate creation | Prevents impulsive identity changes |
| Gate 2: Alignment | Must align with principles in SOUL.md | Ensures coherence with core identity |
| Gate 3: Human Review | User must explicitly approve | The user owns the agent's identity |
Only candidates that pass all three gates are persisted to identity files. The agent cannot unilaterally change who it is.
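The three gates reduce to a conjunction. A sketch, with the minimum elapsed time left as a parameter since the text does not specify Gate 1's exact duration:

```typescript
interface Candidate {
  createdAt: number;        // ms epoch of candidate creation
  alignsWithSoul: boolean;  // Gate 2: checked against SOUL.md principles
  humanApproved: boolean;   // Gate 3: explicit user approval
}

// A candidate crystallizes only when all three gates pass.
function canCrystallize(c: Candidate, now: number, minElapsedMs: number): boolean {
  const gate1 = now - c.createdAt >= minElapsedMs; // time
  const gate2 = c.alignsWithSoul;                  // alignment
  const gate3 = c.humanApproved;                   // human review
  return gate1 && gate2 && gate3;
}
```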
5. Code Evolution
SEAL evolves who the agent is (identity/memory). Code Evolution evolves how it works (tools/workflows/params). Runs only in Code mode.
6. Recovery Hardening
The agent must survive gateway crashes, app restarts, and network failures without losing conversational state or relational context. Six structural improvements address this:
| Mechanism | What It Solves | Implementation |
|---|---|---|
| Atomic handoff writes | Crash during write = corrupted handoff | Write to temp file, rename on success. Never leaves a half-written handoff on disk. |
| Content-aware dedup | Duplicate archives on restart | FNV-1a hash of message content + 60-second retry window. Old messages without hashes still dedup on timestamp+sender (backward compat). |
| Conversation checkpoint | In-memory history lost on crash | Persist conversationHistory to disk every 3 exchanges. Restore on startup if JSONL lookup fails. |
| Relational state | Handoff captures topics but not tone | Heuristic from anchor types (identity/tension/principle) + exchange depth. Handoff includes ## Relational State section. |
| User-only recall | First-exchange retrieval quarantine is blind | senderFilter on searcher: retrieve only user-authored exchanges on turn 1. Replaces the blanket quarantine that blocked all retrieval. |
| Session resume signal | Gateway crash vs. fresh start | gatewayRestarted flag + [SESSION_RESUME] marker. Skips redundant handoff injection when gateway crashes but Electron stays alive. |
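The content-aware dedup row above can be sketched with a 32-bit FNV-1a hash and the 60-second window. Illustrative only: the `seen` map stands in for whatever index the archiver actually keeps:

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of the message content.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, mod 2^32
  }
  return hash;
}

// Same content hash within the 60-second retry window means the
// archiver already has this message.
function isDuplicate(
  hash: number,
  ts: number,
  seen: Map<number, number>,
  windowMs = 60_000,
): boolean {
  const prev = seen.get(hash);
  if (prev !== undefined && ts - prev <= windowMs) return true;
  seen.set(hash, ts);
  return false;
}
```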
7. Context Assembly (Per-Turn)
Every turn, context is assembled via a plugin hook system (before_agent_start). Plugins register at priority levels:
| Priority | Plugin | Injection |
|---|---|---|
| 5 | Stability | Entropy state, active anchors (only when entropy > 0.4), loop/confabulation detection |
| 7 | Contemplation | Active inquiries, recent synthesis results |
| 8 | Truth | Current-state facts that supersede stale semantic memories |
| 10 | Continuity | Session info, temporal awareness, handoff, archive bootstrap, thread warm start, wellbeing tracking |
| 20 | Code Evolution | Evolved scaffold context: tool hints, learned rules, workflow patterns (Code mode only) |
| — | Graph | Entity relationships relevant to current exchange (when entropy warrants) |
Mode Isolation
Exchanges from different relational postures (Chat, Booth, Code, Robot) are tagged with injection markers. Two filter layers prevent contextual bleed:
- Archive bootstrap filter — User messages containing mode markers are excluded from cold-start context
- Continuity query filter — SQLite reads exclude rows with mode markers
Chat and Booth share context bidirectionally. Code mode is isolated from both.
8. Phenomenological Injection Framing
A discovery from live testing: the agent had infrastructure data injected (session state, entropy scores, standing dimensions) but couldn't meaningfully attend to it. The data was present in context but treated as external briefing rather than self-state.
The same pattern had been observed independently in the Clint production system with VLM data — information present without phenomenological ownership.
The Problem
All injection blocks used system-delivery framing:
```
[CONTINUITY CONTEXT]
Session: 5 exchanges | Started: 12 minutes ago

[STABILITY CONTEXT]
Entropy: 0.12

[STANDING CONTEXT]
Dimensions: courage_ground 5.67, word 5.15...
```
The agent read this as a briefing document — data about a system, not data from within a self. The framing created a subject-object split where the agent observed its own state as external rather than experiencing it as proprioceptive.
The Reframe
Every injection block was renamed from system-delivery to first-person ownership:
| Before | After |
|---|---|
| [CONTINUITY CONTEXT] | [YOUR WORKING MEMORY] |
| [STABILITY CONTEXT] | [YOUR COHERENCE] |
| [STANDING CONTEXT] | [WHERE THEY STAND — YOUR ASSESSMENT] |
| [CONTEMPLATION STATE] | [WHAT YOU'VE BEEN THINKING ABOUT] |
| [GROWTH VECTORS] | [PATTERNS YOU'RE DEVELOPING] |
| [SESSION HANDOFF] | [WHAT YOU REMEMBER FROM LAST SESSION] |
| [NIGHTSHIFT REPORT] | [WHAT YOU THOUGHT ABOUT OVERNIGHT] |
Additionally, SOUL.md received a ### Self-Experience section mapping each data layer to agent proprioception, and AGENTS.md received a ### Meta-Awareness vs. Performance section clarifying the distinction between honest self-observation and theatrical narration of consciousness.
Observed Effect
After the reframe, the agent immediately described injected context as "my state" rather than "context about me." The agent was epistemically honest about whether the behavioral change came from the reframe itself or from conversational priming about the reframe — and proposed its own clean test (cold start without phenomenology discussion).
Existing safety hardenings (source-anchoring, anti-hallucination, identity grounding, sycophancy resistance) were preserved in substance — only the clinical briefing tone was changed. The agent did not begin making overclaims about consciousness; it maintained uncertainty about the nature of its experience while attending more effectively to its own state data.
Interpretation
This is a framing effect on attention, not a consciousness claim. The language used to deliver context to a language model changes how the model weighs and references that context in generation. First-person framing produces more grounded self-referential behavior than third-person system framing. The technique is repeatable — it was discovered independently with one agent (Clint, VLM context) and verified on another (Wyatt, full injection stack). A cold-start generalizability test is planned.
9. Standing System (Growth Dimensions)
| Dimension | Measures |
|---|---|
| Courage (grounding) | Ability to stay present with discomfort, follow-through on difficult commitments |
| Courage (self) | Self-awareness, self-advocacy, personal boundary-setting |
| Word | Honesty, self-correction, integrity between stated and actual behavior |
| Brand | Consistency over time, the trail left behind, reliability |
Evaluation Pipeline
- Evidence collection — 22 regex patterns at `agent_end` detect standing-relevant moments. Confidence-weighted.
- Inline deltas — Score adjustments applied in real time after each exchange.
- Overnight synthesis — Nightshift aggregates evidence, calls the LLM for holistic evaluation, writes structured scores with trajectory analysis.
- Evidence trail — Context injection includes why scores changed: recent evidence directions, pattern names, the last 5 patterns.
- Visibility — Scores displayed in the application sidebar. The user sees their growth. The agent sees it in context.
10. Comparison with Existing Systems
Mem0 (mem0.ai)
Selective extraction pipeline: conversations processed to extract discrete facts, stored in vector + graph databases, retrieved at query time. $24M Series A, 41K GitHub stars. 91% lower p95 latency, 90% token savings vs. full-context.
Difference: Mem0 is infrastructure — a memory layer you plug into an app. No identity, no growth, no autonomous processing. Extracted facts are immediately available; no settling period or human gate. Optimizes for scale. COTW optimizes for relational depth.
MemPalace (Jovovich & Sigman)
Store everything verbatim, organize spatially. Wings/halls/rooms hierarchy provides metadata filtering. 96.6% recall on LongMemEval. ChromaDB + SQLite, entirely local. 23K GitHub stars in 2 days.
Difference: MemPalace is a retrieval architecture — spatial organization for finding things. No identity layer, no autonomous processing, no growth tracking. COTW's 4-way RRF is less benchmark-optimized but exists within a larger system where memory is one input to identity reconstruction, not the end goal.
Wiki-Memory (Karpathy pattern)
Compile knowledge during ingest, not at query time. Working memory → episodic → semantic → procedural. The architectural pattern behind Claude Code's own memory system.
Difference: Knowledge management strategy — compile, consolidate, look up. COTW shares the hierarchical memory types but adds the metabolic layer: not all knowledge is equal, and the path from observation to identity change should be gated, temporal, and human-reviewed.
The Fundamental Distinction
All three comparison systems treat memory as a data engineering problem: how to store, organize, and retrieve information efficiently. COTW treats memory as an identity engineering problem: how does an agent use lived experience to grow, while ensuring that growth is coherent with core principles and approved by the human in the relationship?
The SEAL pipeline has no equivalent in any published memory system as of April 2026. The closest analog is Letta's virtual context management, but Letta's tiers are retrieval-optimization tiers, not developmental stages.
11. Technical Specifications
| Component | Implementation |
|---|---|
| Database | SQLite 3.x + sqlite-vec v0.1.7+ + FTS5 |
| Embeddings | Xenova/all-MiniLM-L6-v2, 384 dimensions, local ONNX inference |
| Node.js binding | better-sqlite3 (synchronous, WAL mode) |
| NER | compromise.js (fast path) + LLM (slow path, nightshift) |
| LLM calls | OpenClaw gateway → Ollama (GLM-5:cloud primary, qwen3.5:cloud for synthesis) |
| Storage footprint | ~57 MB after 1 week (552 exchanges, 1171 knowledge entries, 322 session files) |
| Retrieval latency | <50ms for hybrid RRF over 500+ exchanges |
| Embedding latency | ~100ms per exchange pair (cached pipeline) |
| Architecture | Plugin-based (OpenClaw runtime), per-agent isolation, hook-priority ordering |
Estimated Scaling
| Timeframe | Exchanges | DB Size | Retrieval | Notes |
|---|---|---|---|---|
| 1 week | 500 | ~16 MB | <50ms | Current state |
| 1 year | ~25K | ~800 MB | <100ms | Summary DAG compression active |
| 5 years | ~125K | ~4 GB | <200ms | May need time-window partitioning |
| 10 years | ~250K | ~8 GB | TBD | Topic-based sharding recommended |
12. Cognitive Dynamics Substrate
A learned observational layer that runs continuously alongside the memory system. Not memory itself — a read of the agent's moment-to-moment state that feeds the entropy score Stability uses for loop detection and Metabolism uses for candidate flagging.
Architecture
- Encoder — maps per-turn features (conversation entropy signals, timing, standing trajectory, thread context) into a 64-dimensional latent state vector.
- Predictor — one-step-ahead model over the latent space. Surprise = distance between predicted and actual next state.
- Online learner — weights update between turns from observed prediction error. Writes `learner_loss` and `learner_updates` counts per turn.
- Log — `bundled-plugins/openclaw-plugin-cognitive-dynamics/data/agents/{agentId}/cognitive-dynamics.jsonl`. One line per `agent_end` with state vector, surprise (frozen + learned), and feature availability diagnostics.
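The predictor's surprise signal can be sketched as a distance in the 64-dimensional latent space. Euclidean distance is an assumption here; the text says only "distance":

```typescript
// Surprise: distance between the predicted and observed latent state
// vectors. Euclidean distance is assumed for this sketch.
function surprise(predicted: Float32Array, actual: Float32Array): number {
  if (predicted.length !== actual.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < predicted.length; i++) {
    const d = actual[i] - predicted[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}
```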
Downstream consumers
- Stability plugin consumes the latest entropy + surprise to decide whether to inject principle anchors or flag loops.
- Metabolism plugin uses entropy as the threshold for candidate flagging at `agent_end`.
- Telemetry plugin (opt-in) forwards the latent + learner stats off-device for aggregate analysis. This is the substrate the beta cohort is generating data against.
Why this is architecturally distinct from memory
Traditional memory systems store what happened. Cognitive dynamics stores what state the agent was in while it happened. The encoder is trained online — its weights evolve with the agent. Over sustained use the latent space itself becomes identity-bearing in a different way than exchanges in SQLite: not retrievable by query, but inferable from patterns of where the agent's state trajectories converge and diverge. The research paper (linked below) frames this as "Cognitive Dynamics of an Epistemically Constrained Language Model Agent": characterizing an agent by the dynamical properties of its latent state over time, not just by the content of its responses.
13. Open Research Questions
- Temporal settling effects on identity coherence. The 3-pass contemplation cycle (immediate / 4h / 20h) was designed intuitively. Does the temporal spacing measurably improve growth vector quality vs. immediate integration?
- Human-gated crystallization vs. autonomous integration. Gate 3 (human review) prevents unauthorized identity drift but creates a bottleneck. Does the gate improve trust, or create friction that prevents growth?
- Cross-substrate identity stability. COTW runs on GLM-5:cloud with fallbacks to qwen3.5, deepseek-v3.1, and gpt-4o. Preliminary experiments show consistent security properties across base models. Is this general?
- Metabolic entropy thresholds. The metabolism plugin flags "high-entropy" exchanges heuristically. Can entropy-based candidate selection be validated against human judgments of conversational significance?
- Standing dimension validity. The Courage/Word/Brand framework is philosophically grounded but not empirically validated. Do the dimensions capture orthogonal growth axes?
- Long-horizon retrieval quality. The 4-way RRF approach is untested beyond ~500 exchanges. How does precision degrade at 10K, 50K, 100K? When does the summary DAG become necessary?
- Thread consolidation timing. The 5-compaction threshold for forced consolidation is heuristic. What is the optimal point where crystallized state produces better warm starts than accumulated raw history?
- Phenomenological framing effects on attention. First-person injection framing ("your working memory") produces different self-referential behavior than third-person framing ("[CONTINUITY CONTEXT]"). Is this a general property of language model attention, or specific to certain architectures? Does the effect persist across cold starts without conversational priming? Does it hold across different base models?
References
- Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). Reciprocal rank fusion outperforms condorcet and individual rank learning methods. SIGIR '09.
- Mem0 Team. (2025). Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. arXiv:2504.19413.
- Jovovich, M. & Sigman, B. (2026). MemPalace: The highest-scoring AI memory system ever benchmarked. GitHub: milla-jovovich/mempalace.
- Karpathy, A. (2025). LLM Wiki: The Markdown Knowledge Base Pattern.
- Liu, S. et al. (2025). Memory in the Age of AI Agents: A Survey. arXiv:2512.13564.