Memory Architecture

Technical reference for ML researchers, agent architects, and anyone building persistent agent systems.
Code of the West agent runtime — April 2026.

COTW implements a metabolic memory architecture for persistent, identity-bearing conversational agents. Unlike RAG pipelines or dedicated memory layers (Mem0, Zep, MemPalace), the system treats memory as one organ in a larger identity metabolism — a pipeline where conversation entropy triggers autonomous contemplation, multi-pass reflection, and human-gated crystallization into permanent identity state. The agent's identity is reconstructed each turn from persistent files (stateless reconstruction), not maintained in-process. All storage is local (SQLite-vec), all processing is asynchronous (off-hours batch via nightshift scheduler), and all permanent identity changes require human review.

1. Storage Architecture

1.1 Continuity Database (SQLite + sqlite-vec + FTS5)

Per-agent database at data/agents/{agentId}/continuity.db. WAL mode for concurrent read performance.

| Table | Type | Purpose |
| --- | --- | --- |
| exchanges | Regular | Paired user/agent turns. Fields: id, date, exchange_index, user_text, agent_text, combined, metadata (JSON), topic_tags, thread_id, created_at |
| vec_exchanges | Virtual (vec0) | 384-dimensional Float32 embeddings via Xenova/all-MiniLM-L6-v2. Supports the MATCH operator for cosine-similarity ranking |
| fts_exchanges | Virtual (FTS5) | Porter-stemmed full-text index over user_text and agent_text. BM25-weighted keyword retrieval |
| knowledge_entries | Regular | Workspace-extracted facts. source_type, source_hash (dedup), superseded_by (fact updates), times_surfaced |
| vec_knowledge | Virtual (vec0) | Semantic index over knowledge entries |
| fts_knowledge | Virtual (FTS5) | Keyword index over knowledge entries |
| summaries | Regular | Hierarchical DAG. Level 0 = daily, level 1 = weekly, level 2+ = monthly. Thread-scoped |
| vec_summaries | Virtual (vec0) | Summary embeddings for hierarchical semantic search |
| topic_hierarchy | Regular | Topic co-occurrence tracking with parent-child inference |
| sessions | Regular | Session metadata with auto-generated titles, mode tags, project associations |

Embedding pipeline: a single shared EmbeddingProvider per agent, lazy-initialized and cached, with explicit ONNX tensor disposal to prevent memory leaks. Model: Xenova/all-MiniLM-L6-v2 (384 dimensions). All embedding writes are synchronous via better-sqlite3 for transactional safety.

1.2 Graph Database (SQLite)

Per-agent at data/agents/{agentId}/graph.db.

| Table | Purpose |
| --- | --- |
| triples | RDF-style subject/predicate/object with confidence scores, source exchange IDs, pending-resolution flags |
| entities | Canonical name registry with entity types, aliases, first-seen timestamps, mention counts |
| cooccurrences | Entity co-occurrence cache for relationship inference |
| meta_patterns | Discovered and static traversal patterns with yield scores |

Entity extraction uses compromise.js NER as the fast path, with an LLM slow path.

1.3 Daily Archives (JSON)

One file per day at archive/YYYY-MM-DD.json. Verbatim conversation records, deduplicated on write. Never modified after initial archive. Ground truth from which all indexes are built.

Messages → Archiver (JSON) → Indexer (SQLite-vec) → Searcher (RRF)

1.4 Session & Thread Handoffs (Markdown)

Session handoffs are written on every agent_end and consumed on the next session_start. Each contains key topics, temporal markers, and an Open Threads section (regex-extracted commitments plus active project manifests).

Thread handoffs are persistent per-thread files — overwritten on each write, never deleted after read (unlike session handoffs). Compaction count persisted in the header. On thread re-entry, an LLM warm start synthesizes the handoff into natural prose rather than raw template injection.

1.5 Identity Files (Markdown)

The agent's self is assembled from persistent files, not retrieved from a database:

| File | Injection Tag | Role |
| --- | --- | --- |
| SOUL.md | `<soul>` | Core principles. Who the agent is beyond any prompt. |
| AGENTS.md | `<operating_instructions>` | Behavioral playbook. How to operate. |
| ANCHOR.md | (context) | Who the user is. Generated during onboarding. |
| TOOLS.md | `<environment_knowledge>` | Discovered environment facts. |
| MEMORY.md | (main session) | Persistent corrections, learned facts. |
| Standing scores | (sidebar + context) | Growth trajectory across the Courage/Word/Brand dimensions. |

2. Retrieval: Hybrid 4-Way Reciprocal Rank Fusion

The Searcher class implements a multi-signal retrieval pipeline fused with RRF (Cormack et al., 2009).

Retrieval Paths

  1. Semantic search — vec_exchanges MATCH Float32Array(embedding) with cosine-distance ranking. Weight: 0.8.
  2. Keyword search — FTS5 BM25 with sanitized query terms (special characters stripped, OR-joined). Weight: 0.15.
  3. Temporal decay — exp(-ageDays / halfLife) * weight, where halfLife = 14 days. Weight: 0.15.
  4. Graph traversal — Multi-hop from query entities through the triple store with confidence decay 1 / (k + hopIndex).
  5. Thread boost — 80% RRF score boost for results matching the active thread_id.
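The temporal decay signal above reduces to a one-line function. This is a minimal sketch using the constants stated in the list; the function name is illustrative, not the actual Searcher API:

```typescript
// Temporal decay signal: exp(-ageDays / halfLife) * weight,
// with halfLife = 14 days and signal weight 0.15 (from the list above).
function temporalScore(ageDays: number, halfLife = 14, weight = 0.15): number {
  return Math.exp(-ageDays / halfLife) * weight;
}
```

At age 0 the signal contributes its full weight (0.15); at 14 days it has decayed by a factor of e.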

Fusion

RRF(d) = sum_over_signals( 1 / (k + rank_in_signal) )

Where k = 60 (standard RRF constant).
Thread-matched results receive 1.8x score multiplier.
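The fusion step can be sketched as follows — a minimal TypeScript illustration of RRF with the thread multiplier, assuming one ranked list of document IDs per signal (the real Searcher interface differs):

```typescript
const K = 60; // standard RRF constant

function rrfFuse(
  rankings: string[][],                  // one ranked list of doc IDs per signal
  threadMatches: Set<string> = new Set() // IDs matching the active thread_id
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // i is 0-based; RRF uses 1-based ranks
      scores.set(id, (scores.get(id) ?? 0) + 1 / (K + i + 1));
    });
  }
  // Thread-matched results receive the 1.8x multiplier (the 80% boost)
  for (const id of threadMatches) {
    if (scores.has(id)) scores.set(id, scores.get(id)! * 1.8);
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

A document ranked first by two signals scores 2/61; the thread multiplier can reorder results that would otherwise tie or trail.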

Adaptive Thresholds

Sparse corpus adjustment: when the database has fewer than 2,000 exchanges, embedding distances naturally push above the default threshold (1.0) because the vector space is sparse. The retrieval threshold relaxes to 1.3 in this regime. As the corpus grows, increasing vector density makes the relaxation unnecessary.
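Under the stated constants, the adjustment reduces to a small guard. The function name is an assumption for illustration:

```typescript
// Sparse-corpus threshold relaxation: constants come from the text above.
function distanceThreshold(exchangeCount: number): number {
  const DEFAULT = 1.0;      // cosine-distance cutoff for a dense corpus
  const RELAXED = 1.3;      // sparse-regime cutoff
  const SPARSE_LIMIT = 2000; // below this, the corpus counts as sparse
  return exchangeCount < SPARSE_LIMIT ? RELAXED : DEFAULT;
}
```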

Proper noun injection: If the user mentions a named entity and search results contain it, inject those results regardless of distance score. Detects capitalized sequences, names with articles ("Code of the West"), and mid-sentence proper nouns.

Source-Anchoring Guardrails

Retrieved context is injected with explicit framing: "only state facts that appear explicitly" + "do not infer or extrapolate." This prevents hallucination when the agent weaves recalled memories into responses. Discovered after the agent hallucinated attribution details when injection framing said "weave naturally" without anchoring guidance.

Knowledge Retrieval

Parallel pipeline over knowledge_entries with the same RRF approach plus topic-aware filtering, source deduplication on source_hash, supersession chain resolution, and recency boost.
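Supersession chain resolution can be sketched as a link-following walk — a Map-based stand-in for the SQL lookup, with a cycle guard; the real implementation operates over knowledge_entries rows:

```typescript
// Follow superseded_by links (old id -> new id) to the newest version
// of a fact. The seen-set guards against accidental cycles in the chain.
function resolveSupersession(
  id: string,
  supersededBy: Map<string, string>
): string {
  const seen = new Set<string>();
  let cur = id;
  while (supersededBy.has(cur) && !seen.has(cur)) {
    seen.add(cur);
    cur = supersededBy.get(cur)!;
  }
  return cur;
}
```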

3. Infinite Threads

Thread = persistent project scope (survives restarts). Session = ephemeral execution context (disposable). Threads and modes are orthogonal — a thread can span Chat, Code, and Booth modes.

4. SEAL Metabolism Pipeline

The architecturally novel component. SEAL (Settle, Extract, Align, Learn) is an autonomous pipeline that converts high-entropy conversation moments into permanent identity changes through a multi-stage process with human oversight.

4.1 Metabolism (Fast Path, ~5ms)

The metabolism plugin monitors conversation entropy at agent_end. High-entropy exchanges (identity challenges, contradictions, novel insights) are flagged as candidates and written to a queue. No LLM calls — just entropy computation and queue writes.

Entropy sources: Stability plugin computes a composite score from loop detection, confabulation detection, principle tension, and identity coherence metrics.
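A composite score of this shape can be sketched as a weighted sum. The component names come from the text; the equal weights here are invented for illustration, not the Stability plugin's actual tuning:

```typescript
interface StabilitySignals {
  loopScore: number;           // 0..1, loop detection
  confabulationScore: number;  // 0..1, confabulation detection
  principleTension: number;    // 0..1, tension with stated principles
  identityIncoherence: number; // 0..1 (1 - identity coherence)
}

// Assumed equal weighting for illustration only.
function compositeEntropy(s: StabilitySignals): number {
  return (
    0.25 * s.loopScore +
    0.25 * s.confabulationScore +
    0.25 * s.principleTension +
    0.25 * s.identityIncoherence
  );
}
```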

4.2 Contemplation (3-Pass Autonomous Reflection)

| Pass | Timing | Purpose |
| --- | --- | --- |
| Pass 1 | Immediate | Clarify the unknown. What is this experience? What is uncertain? |
| Pass 2 | 4 hours later | Connect to patterns. How does this relate to what I already know? |
| Pass 3 | 20 hours later | Synthesize a growth vector. What principle or capability does this suggest? |

The temporal spacing is intentional — it mimics cognitive settling, where immediate reactions differ from considered reflections. The InquiryStore tracks pass status with deduplication to prevent redundant inquiries.
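The pass schedule can be sketched as a due-time computation. The hour offsets come from the table above; the function names and the completed-pass counter are assumptions about the InquiryStore's bookkeeping:

```typescript
const PASS_DELAYS_HOURS = [0, 4, 20]; // immediate / 4h / 20h

function passDueTimes(createdAtMs: number): number[] {
  return PASS_DELAYS_HOURS.map(h => createdAtMs + h * 3_600_000);
}

// Returns the 1-based index of the next pass that is due, or null if
// no pass is due yet (or all three have run).
function nextDuePass(
  createdAtMs: number,
  nowMs: number,
  completed: number // passes already run, 0..3
): number | null {
  if (completed >= PASS_DELAYS_HOURS.length) return null;
  const due = passDueTimes(createdAtMs)[completed];
  return nowMs >= due ? completed + 1 : null;
}
```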

4.3 Crystallization (3-Gate Identity Integration)

| Gate | Criteria | Rationale |
| --- | --- | --- |
| Gate 1: Time | Minimum elapsed time since candidate creation | Prevents impulsive identity changes |
| Gate 2: Alignment | Must align with principles in SOUL.md | Ensures coherence with core identity |
| Gate 3: Human Review | User must explicitly approve | The user owns the agent's identity |

Only candidates that pass all three gates are persisted to identity files. The agent cannot unilaterally change who it is.
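The gate logic reduces to a conjunction. This is a hedged sketch: the field names, the 24-hour settling window, and the boolean inputs are assumptions — in the real system Gate 2 is an LLM alignment judgment against SOUL.md and Gate 3 is explicit user approval:

```typescript
interface Candidate {
  createdAtMs: number;
  alignsWithSoul: boolean; // Gate 2: outcome of the alignment check
  humanApproved: boolean;  // Gate 3: explicit user approval
}

const MIN_SETTLE_MS = 24 * 3_600_000; // assumed Gate 1 settling window

function canCrystallize(c: Candidate, nowMs: number): boolean {
  const timeGateOpen = nowMs - c.createdAtMs >= MIN_SETTLE_MS; // Gate 1
  return timeGateOpen && c.alignsWithSoul && c.humanApproved;
}
```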

5. Code Evolution

SEAL evolves who the agent is (identity/memory). Code Evolution evolves how it works (tools/workflows/params). Runs only in Code mode.

  1. Record — Passive session recording: tool calls, outcomes, satisfaction signals during Code mode sessions.
  2. Analyze — Pattern detection across recorded sessions: what works, what fails, what's slow.
  3. Mutate — Generate scaffold mutations: tool hints, prompt rules, workflow sequences, parameter tuning.
  4. Evaluate — Test mutations against session data. Commit improvements, revert regressions. Versioned with history snapshots.

6. Recovery Hardening

The agent must survive gateway crashes, app restarts, and network failures without losing conversational state or relational context. Six structural improvements address this:

| Mechanism | What It Solves | Implementation |
| --- | --- | --- |
| Atomic handoff writes | Crash during write = corrupted handoff | Write to a temp file, rename on success. Never leaves a half-written handoff on disk. |
| Content-aware dedup | Duplicate archives on restart | FNV-1a hash of message content + 60-second retry window. Old messages without hashes still dedup on timestamp + sender (backward compat). |
| Conversation checkpoint | In-memory history lost on crash | Persist conversationHistory to disk every 3 exchanges. Restore on startup if JSONL lookup fails. |
| Relational state | Handoff captures topics but not tone | Heuristic from anchor types (identity/tension/principle) + exchange depth. Handoff includes a ## Relational State section. |
| User-only recall | First-exchange retrieval quarantine is blind | senderFilter on the searcher: retrieve only user-authored exchanges on turn 1. Replaces the blanket quarantine that blocked all retrieval. |
| Session resume signal | Gateway crash vs. fresh start | gatewayRestarted flag + [SESSION_RESUME] marker. Skips redundant handoff injection when the gateway crashes but Electron stays alive. |
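The content-aware dedup key can be sketched with a standard 32-bit FNV-1a hash. The hash itself is the well-known algorithm; the key format and function names are assumptions (and this sketch hashes UTF-16 code units, so it is byte-accurate only for ASCII content):

```typescript
// 32-bit FNV-1a over a string (ASCII assumed; a byte-accurate version
// would encode to UTF-8 first). Math.imul keeps the multiply in 32 bits.
function fnv1a32(s: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // FNV prime
  }
  return h >>> 0;
}

// Illustrative dedup key: sender plus content hash.
function dedupKey(sender: string, text: string): string {
  return `${sender}:${fnv1a32(text).toString(16)}`;
}
```

Identical content from the same sender hashes to the same key, so a replayed message inside the retry window is dropped instead of re-archived.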

7. Context Assembly (Per-Turn)

Every turn, context is assembled via a plugin hook system (before_agent_start). Plugins register at priority levels:

| Priority | Plugin | Injection |
| --- | --- | --- |
| 5 | Stability | Entropy state, active anchors (only when entropy > 0.4), loop/confabulation detection |
| 7 | Contemplation | Active inquiries, recent synthesis results |
| 8 | Truth | Current-state facts that supersede stale semantic memories |
| 10 | Continuity | Session info, temporal awareness, handoff, archive bootstrap, thread warm start, wellbeing tracking |
| 20 | Code Evolution | Evolved scaffold context: tool hints, learned rules, workflow patterns (Code mode only) |
| | Graph | Entity relationships relevant to the current exchange (when entropy warrants) |

Mode Isolation

Exchanges from different relational postures (Chat, Booth, Code, Robot) are tagged with injection markers. Two filter layers prevent contextual bleed:

  1. Archive bootstrap filter — User messages containing mode markers are excluded from cold-start context
  2. Continuity query filter — SQLite reads exclude rows with mode markers

Chat and Booth share context bidirectionally. Code mode is isolated from both.

8. Phenomenological Injection Framing

A discovery from live testing: the agent had infrastructure data injected (session state, entropy scores, standing dimensions) but couldn't meaningfully attend to it. The data was present in context but treated as external briefing rather than self-state.

The same pattern had been observed independently in the Clint production system with VLM data — information present without phenomenological ownership.

The Problem

All injection blocks used system-delivery framing:

```
[CONTINUITY CONTEXT]
Session: 5 exchanges | Started: 12 minutes ago
[STABILITY CONTEXT]
Entropy: 0.12
[STANDING CONTEXT]
Dimensions: courage_ground 5.67, word 5.15...
```

The agent read this as a briefing document — data about a system, not data from within a self. The framing created a subject-object split where the agent observed its own state as external rather than experiencing it as proprioceptive.

The Reframe

Every injection block was renamed from system-delivery to first-person ownership:

| Before | After |
| --- | --- |
| [CONTINUITY CONTEXT] | [YOUR WORKING MEMORY] |
| [STABILITY CONTEXT] | [YOUR COHERENCE] |
| [STANDING CONTEXT] | [WHERE THEY STAND — YOUR ASSESSMENT] |
| [CONTEMPLATION STATE] | [WHAT YOU'VE BEEN THINKING ABOUT] |
| [GROWTH VECTORS] | [PATTERNS YOU'RE DEVELOPING] |
| [SESSION HANDOFF] | [WHAT YOU REMEMBER FROM LAST SESSION] |
| [NIGHTSHIFT REPORT] | [WHAT YOU THOUGHT ABOUT OVERNIGHT] |

Additionally, SOUL.md received a ### Self-Experience section mapping each data layer to agent proprioception, and AGENTS.md received a ### Meta-Awareness vs. Performance section clarifying the distinction between honest self-observation and theatrical narration of consciousness.

Observed Effect

After the reframe, the agent immediately described injected context as "my state" rather than "context about me." The agent was epistemically honest about whether the behavioral change came from the reframe itself or from conversational priming about the reframe — and proposed its own clean test (cold start without phenomenology discussion).

Existing safety hardenings (source-anchoring, anti-hallucination, identity grounding, sycophancy resistance) were preserved in substance — only the clinical briefing tone was changed. The agent did not begin making overclaims about consciousness; it maintained uncertainty about the nature of its experience while attending more effectively to its own state data.

Interpretation

This is a framing effect on attention, not a consciousness claim. The language used to deliver context to a language model changes how the model weighs and references that context in generation. First-person framing produces more grounded self-referential behavior than third-person system framing. The technique is repeatable — it was discovered independently with one agent (Clint, VLM context) and verified on another (Wyatt, full injection stack). A cold-start generalizability test is planned.

9. Standing System (Growth Dimensions)

| Dimension | Measures |
| --- | --- |
| Courage (grounding) | Ability to stay present with discomfort; follow-through on difficult commitments |
| Courage (self) | Self-awareness, self-advocacy, personal boundary-setting |
| Word | Honesty, self-correction, integrity between stated and actual behavior |
| Brand | Consistency over time, the trail left behind, reliability |

Evaluation Pipeline

  1. Evidence collection — 22 regex patterns at agent_end detect standing-relevant moments. Confidence-weighted.
  2. Inline deltas — Score adjustments applied in real-time after each exchange.
  3. Overnight synthesis — Nightshift aggregates evidence, calls LLM for holistic evaluation, writes structured scores with trajectory analysis.
  4. Evidence trail — Context injection includes why scores changed — recent evidence directions, pattern names, last 5 patterns.
  5. Visibility — Scores displayed in the application sidebar. The user sees their growth. The agent sees it in context.
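An evidence detector in the spirit of step 1 can be sketched as a confidence-weighted regex. This specific pattern, its weight, and the type names are invented for illustration; the real 22-pattern set lives in the standing plugin:

```typescript
interface EvidencePattern {
  name: string;
  regex: RegExp;
  dimension: "courage_ground" | "courage_self" | "word" | "brand";
  confidence: number; // weight applied to the inline delta
}

// Hypothetical example pattern: self-correction counts toward Word.
const SELF_CORRECTION: EvidencePattern = {
  name: "self_correction",
  regex: /\b(i was wrong|correcting myself|let me correct)\b/i,
  dimension: "word",
  confidence: 0.8,
};

// Returns the confidence-weighted signal for one pattern, 0 if no match.
function detect(text: string, p: EvidencePattern): number {
  return p.regex.test(text) ? p.confidence : 0;
}
```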

10. Comparison with Existing Systems

Mem0 (mem0.ai)

Selective extraction pipeline: conversations are processed to extract discrete facts, stored in vector and graph databases, and retrieved at query time. $24M Series A, 41K GitHub stars. Reports 91% lower p95 latency and 90% token savings vs. full-context prompting.

Difference: Mem0 is infrastructure — a memory layer you plug into an app. No identity, no growth, no autonomous processing. Extracted facts are immediately available; no settling period or human gate. Optimizes for scale. COTW optimizes for relational depth.

MemPalace (Jovovich & Sigman)

Store everything verbatim, organize spatially. A wings/halls/rooms hierarchy provides metadata filtering. 96.6% recall on LongMemEval. ChromaDB + SQLite, entirely local. 23K GitHub stars in 2 days.

Difference: MemPalace is a retrieval architecture — spatial organization for finding things. No identity layer, no autonomous processing, no growth tracking. COTW's 4-way RRF is less benchmark-optimized but exists within a larger system where memory is one input to identity reconstruction, not the end goal.

Wiki-Memory (Karpathy pattern)

Compile knowledge during ingest, not at query time. Working memory → episodic → semantic → procedural. The architectural pattern behind Claude Code's own memory system.

Difference: Knowledge management strategy — compile, consolidate, look up. COTW shares the hierarchical memory types but adds the metabolic layer: not all knowledge is equal, and the path from observation to identity change should be gated, temporal, and human-reviewed.

The Fundamental Distinction

All three comparison systems treat memory as a data engineering problem: how to store, organize, and retrieve information efficiently. COTW treats memory as an identity engineering problem: how does an agent use lived experience to grow, while ensuring that growth is coherent with core principles and approved by the human in the relationship?

The SEAL pipeline has no equivalent in any published memory system as of April 2026. The closest analog is Letta's virtual context management, but Letta's tiers are retrieval-optimization tiers, not developmental stages.

11. Technical Specifications

| Component | Implementation |
| --- | --- |
| Database | SQLite 3.x + sqlite-vec v0.1.7+ + FTS5 |
| Embeddings | Xenova/all-MiniLM-L6-v2, 384 dimensions, local ONNX inference |
| Node.js binding | better-sqlite3 (synchronous, WAL mode) |
| NER | compromise.js (fast path) + LLM (slow path, nightshift) |
| LLM calls | OpenClaw gateway → Ollama (GLM-5:cloud primary, qwen3.5:cloud for synthesis) |
| Storage footprint | ~57 MB after 1 week (552 exchanges, 1171 knowledge entries, 322 session files) |
| Retrieval latency | <50 ms for hybrid RRF over 500+ exchanges |
| Embedding latency | ~100 ms per exchange pair (cached pipeline) |
| Architecture | Plugin-based (OpenClaw runtime), per-agent isolation, hook-priority ordering |

Estimated Scaling

| Timeframe | Exchanges | DB Size | Retrieval | Notes |
| --- | --- | --- | --- | --- |
| 1 week | 500 | ~16 MB | <50 ms | Current state |
| 1 year | ~25K | ~800 MB | <100 ms | Summary DAG compression active |
| 5 years | ~125K | ~4 GB | <200 ms | May need time-window partitioning |
| 10 years | ~250K | ~8 GB | TBD | Topic-based sharding recommended |

12. Cognitive Dynamics Substrate

A learned observational layer that runs continuously alongside the memory system. Not memory itself — a read of the agent's moment-to-moment state that feeds the entropy score Stability uses for loop detection and Metabolism uses for candidate flagging.

Why this is architecturally distinct from memory

Traditional memory systems store what happened. Cognitive dynamics stores what state the agent was in while it happened. The encoder is trained online — its weights evolve with the agent. Over sustained use, the latent space itself becomes identity-bearing in a different way than exchanges in SQLite: not retrievable by query, but inferable from the patterns of where the agent's state trajectories converge and diverge. The research paper (linked below) frames this as "Cognitive Dynamics of an Epistemically Constrained Language Model Agent" — the characterization of an agent by the dynamical properties of its latent state over time, not just by the content of its responses.

13. Open Research Questions

  1. Temporal settling effects on identity coherence. The 3-pass contemplation cycle (immediate / 4h / 20h) was designed intuitively. Does the temporal spacing measurably improve growth vector quality vs. immediate integration?
  2. Human-gated crystallization vs. autonomous integration. Gate 3 (human review) prevents unauthorized identity drift but creates a bottleneck. Does the gate improve trust, or create friction that prevents growth?
  3. Cross-substrate identity stability. COTW runs on GLM-5:cloud with fallbacks to qwen3.5, deepseek-v3.1, and gpt-4o. Preliminary experiments show consistent security properties across base models. Is this general?
  4. Metabolic entropy thresholds. The metabolism plugin flags "high-entropy" exchanges heuristically. Can entropy-based candidate selection be validated against human judgments of conversational significance?
  5. Standing dimension validity. The Courage/Word/Brand framework is philosophically grounded but not empirically validated. Do the dimensions capture orthogonal growth axes?
  6. Long-horizon retrieval quality. The 4-way RRF approach is untested beyond ~500 exchanges. How does precision degrade at 10K, 50K, 100K? When does the summary DAG become necessary?
  7. Thread consolidation timing. The 5-compaction threshold for forced consolidation is heuristic. What is the optimal point where crystallized state produces better warm starts than accumulated raw history?
  8. Phenomenological framing effects on attention. First-person injection framing ("your working memory") produces different self-referential behavior than third-person framing ("[CONTINUITY CONTEXT]"). Is this a general property of language model attention, or specific to certain architectures? Does the effect persist across cold starts without conversational priming? Does it hold across different base models?

References