Memory Architecture

Technical reference for ML researchers, agent architects, and anyone building persistent agent systems.
Code of the West agent runtime — May 2026.

COTW implements a metabolic memory architecture for persistent, identity-bearing conversational agents running on a GPT 5.5-first harness. Unlike RAG pipelines or dedicated memory layers (Mem0, Zep, MemPalace), the system treats memory as one organ in a larger identity metabolism — a pipeline where conversation entropy triggers autonomous contemplation, multi-pass reflection, and human-gated crystallization into permanent identity state. The agent's identity is reconstructed each turn from persistent files (stateless reconstruction), not maintained in-process. Storage is local and free to run (SQLite-vec, FTS5, JSON/Markdown receipts); model cost depends on the selected GPT 5.5/provider route. Recent runtime work adds attachment receipts, native multimodal image routing, hook metrics, hot-path caching, and epistemic gates so capability and evidence stay aligned.

1. Storage Architecture

1.1 Continuity Database (SQLite + sqlite-vec + FTS5)

Per-agent database at data/agents/{agentId}/continuity.db. WAL mode for concurrent read performance.

| Table | Type | Purpose |
| --- | --- | --- |
| exchanges | Regular | Paired user/agent turns. Fields: id, date, exchange_index, user_text, agent_text, combined, metadata (JSON), topic_tags, thread_id, created_at |
| vec_exchanges | Virtual (vec0) | 384-dimensional Float32 embeddings via Xenova/all-MiniLM-L6-v2. Supports MATCH operator for cosine similarity |
| fts_exchanges | Virtual (FTS5) | Porter-stemmed full-text index over user_text and agent_text. BM25-weighted keyword retrieval |
| knowledge_entries | Regular | Workspace-extracted facts. source_type, source_hash (dedup), superseded_by (fact updates), times_surfaced |
| vec_knowledge | Virtual (vec0) | Semantic index over knowledge entries |
| fts_knowledge | Virtual (FTS5) | Keyword index over knowledge entries |
| summaries | Regular | Hierarchical DAG. Level 0 = daily, Level 1 = weekly, Level 2+ = monthly. Thread-scoped. |
| vec_summaries | Virtual (vec0) | Summary embeddings for hierarchical semantic search |
| topic_hierarchy | Regular | Topic co-occurrence tracking with parent-child inference |
| sessions | Regular | Session metadata with auto-generated titles, mode tags, project associations |

Embedding pipeline: a single shared EmbeddingProvider per agent, lazy-initialized and cached, with explicit ONNX tensor disposal to prevent memory leaks. Model: Xenova/all-MiniLM-L6-v2 (384 dimensions). Embeddings are written and queried synchronously through better-sqlite3 for transactional safety.

1.2 Graph Database (SQLite)

Per-agent at data/agents/{agentId}/graph.db.

| Table | Purpose |
| --- | --- |
| triples | RDF-style subject/predicate/object with confidence scores, source exchange IDs, pending resolution flags |
| entities | Canonical name registry with entity types, aliases, first-seen timestamps, mention counts |
| cooccurrences | Entity co-occurrence cache for relationship inference |
| meta_patterns | Discovered and static traversal patterns with yield scores |

Entity extraction via compromise.js NER + LLM slow path.

1.3 Daily Archives (JSON)

One file per day at archive/YYYY-MM-DD.json. Verbatim conversation records, deduplicated on write. Never modified after initial archive. Ground truth from which all indexes are built.

Messages → Archiver (JSON) → Indexer (SQLite-vec) → Searcher (RRF)

1.4 Session & Thread Handoffs (Markdown)

Session handoffs are written on every agent_end and consumed on the next session_start. Each contains key topics, temporal markers, and an Open Threads section (regex-extracted commitments + active project manifests).

Thread handoffs are persistent per-thread files — overwritten on each write, never deleted after read (unlike session handoffs). Compaction count persisted in the header. On thread re-entry, an LLM warm start synthesizes the handoff into natural prose rather than raw template injection.

1.5 Identity Files (Markdown)

The agent's self is assembled from persistent files, not retrieved from a database:

| File | Injection Tag | Role |
| --- | --- | --- |
| SOUL.md | <soul> | Core principles. Who the agent is beyond any prompt. |
| AGENTS.md | <operating_instructions> | Behavioral playbook. How to operate. |
| ANCHOR.md | (context) | Who the user is. Generated during onboarding. |
| TOOLS.md | <environment_knowledge> | Discovered environment facts. |
| MEMORY.md | (main session) | Persistent corrections, learned facts. |
| Standing scores | (sidebar + context) | Growth trajectory across Courage/Word/Brand dimensions. |

1.6 Attachment Receipts

Attachments are treated as evidence-bearing inputs, not anonymous blobs pasted into a prompt. The chat layer creates receipt records before the model call, links them to the active turn, and injects compact receipt context alongside the live payload.

| Table | Fields | Purpose |
| --- | --- | --- |
| attachment_receipts | id, kind, filename, MIME type, size, SHA-256, source path, source status, extracted text, text excerpt, observation excerpt | Durable file handle. The att_... id is derived from content hash, so the same file can be recognized across turns. |
| attachment_receipt_turns | receipt id, thread id, session id, project id, turn id, created timestamp | Associates files with the conversation and project where they appeared, allowing recent attachments to return as compact handles. |

No duplicate library by default: if a file comes from the user's machine, the receipt stores its absolute source path and SHA-256. When the source file still exists, the runtime verifies the hash; when it has moved or changed, the prompt receives an explicit verification gap instead of pretending the prior observation is current.
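A minimal sketch of the hash-and-verify behavior described above, using only Node's standard library. The function names, the 16-character id truncation, and the status strings ("verified", "changed", "missing") are illustrative assumptions; the text only states that the att_... id is derived from the content hash and that a moved or changed source produces an explicit verification gap.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");

// Hash raw bytes with SHA-256 and return the hex digest.
function sha256Hex(buffer) {
  return crypto.createHash("sha256").update(buffer).digest("hex");
}

// Assumed id shape: "att_" plus a prefix of the content hash.
// The real truncation length is not documented.
function receiptIdFor(buffer) {
  return "att_" + sha256Hex(buffer).slice(0, 16);
}

// Re-check a stored receipt against whatever is at its source path now.
// The returned status strings are hypothetical labels for this sketch.
function verifySource(receipt) {
  if (!fs.existsSync(receipt.sourcePath)) return { status: "missing" };
  const current = sha256Hex(fs.readFileSync(receipt.sourcePath));
  return current === receipt.sha256
    ? { status: "verified" }
    : { status: "changed" };
}
```

Any status other than "verified" would map to the verification gap that the prompt reports instead of treating the earlier observation as current.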

Prompt discipline: current-turn attachments carry the real payload. Later turns carry the receipt, optional document excerpt, and the last observation excerpt. That makes the distinction clear: the model can say "I saw this in the earlier image receipt" without pretending it still has the pixels loaded.

2. Retrieval: Hybrid 4-Way Reciprocal Rank Fusion

The Searcher class implements a multi-signal retrieval pipeline fused with RRF (Cormack et al., 2009).

Retrieval Paths

  1. Semantic search: vec_exchanges MATCH Float32Array(embedding) with cosine distance ranking. Weight: 0.8.
  2. Keyword search: FTS5 BM25 with sanitized query terms (special characters stripped, OR-joined). Weight: 0.15.
  3. Temporal decay: exp(-ageDays / halfLife) * weight, where halfLife = 14 days. Weight: 0.15.
  4. Graph traversal: multi-hop from query entities through the triple store with confidence decay 1 / (k + hopIndex).
  5. Thread boost: 80% RRF score boost for results matching the active thread_id. This is a post-fusion multiplier rather than a fifth ranked signal.
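The temporal decay term can be sketched directly from the formula in the list above. The constant names are hypothetical. One point worth noting: with the form exp(-ageDays / halfLife), the score at 14 days is weight / e (about 37% of the initial value), so the value actually halves at 14 × ln 2 ≈ 9.7 days. Whether the runtime intends a true half-life is not stated; the sketch follows the formula as written.

```javascript
// Values taken from the retrieval path description; names are illustrative.
const HALF_LIFE_DAYS = 14;
const TEMPORAL_WEIGHT = 0.15;

// Recency score for an exchange that is ageDays old.
function temporalScore(ageDays) {
  return Math.exp(-ageDays / HALF_LIFE_DAYS) * TEMPORAL_WEIGHT;
}
```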

Fusion

RRF(d) = sum_over_signals( 1 / (k + rank_in_signal) )

Where k = 60 (standard RRF constant).
Thread-matched results receive 1.8x score multiplier.
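A minimal sketch of the fusion step, assuming each signal returns a best-first list of exchange ids and that ranks are 1-based. The function and parameter names are hypothetical. The sketch follows the unweighted formula above; how the per-signal weights listed under Retrieval Paths enter the fused score is not specified in the text, so they are left out here.

```javascript
const K = 60;             // standard RRF constant
const THREAD_BOOST = 1.8; // multiplier for results in the active thread

// rankings: { signalName: [id, id, ...] }, each list ordered best-first.
// threadOf: { id: threadId } lookup (hypothetical shape for this sketch).
function fuse(rankings, threadOf, activeThreadId) {
  const scores = new Map();
  for (const ids of Object.values(rankings)) {
    ids.forEach((id, index) => {
      const rank = index + 1; // 1-based rank within this signal
      scores.set(id, (scores.get(id) ?? 0) + 1 / (K + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({
      id,
      score: threadOf[id] === activeThreadId ? score * THREAD_BOOST : score,
    }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF works on ranks rather than raw scores, cosine distances and BM25 values never need to be normalized against each other before fusion.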

Adaptive Thresholds

Sparse corpus adjustment: When the database has fewer than 2,000 exchanges, embedding distances naturally push above the default threshold (1.0) due to sparse vector space. The retrieval threshold relaxes to 1.3 in this regime. As the corpus grows, natural density makes this irrelevant.

Proper noun injection: If the user mentions a named entity and search results contain it, inject those results regardless of distance score. Detects capitalized sequences, names with articles ("Code of the West"), and mid-sentence proper nouns.
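The two adjustments above can be combined in one filter. This is a simplified sketch: the proper noun detector only catches capitalized words after the first token of the query, whereas the described runtime also handles multi-word sequences and names with articles. The result shape ({ text, distance }) and the inclusive comparison against the threshold are assumptions.

```javascript
// Threshold relaxes for sparse corpora, per the values in the text.
function distanceThreshold(exchangeCount) {
  return exchangeCount < 2000 ? 1.3 : 1.0;
}

// Simplified detector: capitalized words that are not the first token.
function properNouns(query) {
  return query
    .split(/\s+/)
    .slice(1)
    .map((word) => word.replace(/[^\p{L}\p{N}]/gu, ""))
    .filter((word) => /^\p{Lu}/u.test(word));
}

// Keep results under the distance threshold, plus any result that
// mentions a proper noun from the query regardless of distance.
function selectResults(query, results, exchangeCount) {
  const threshold = distanceThreshold(exchangeCount);
  const nouns = properNouns(query).map((n) => n.toLowerCase());
  return results.filter(
    (r) =>
      r.distance <= threshold ||
      nouns.some((n) => r.text.toLowerCase().includes(n))
  );
}
```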

Source-Anchoring Guardrails

Retrieved context is injected with explicit framing: "only state facts that appear explicitly" + "do not infer or extrapolate." This guards against hallucination when the agent weaves recalled memories into responses. The guardrail was added after the agent hallucinated attribution details under an earlier injection framing that said "weave naturally" without anchoring guidance.

Knowledge Retrieval

Parallel pipeline over knowledge_entries with the same RRF approach plus topic-aware filtering, source deduplication on source_hash, supersession chain resolution, and recency boost.

GPT 5.5 Vision and Document Routing

The main chat route is designed around GPT 5.5 as the primary multimodal reasoning model. When the selected provider supports native image input, the runtime sends text and image parts together so the same model reasons over the language, pixels, and conversational memory.

| Route | Behavior | Consequence |
| --- | --- | --- |
| native-image-parts | Text plus one or more image payloads are sent together to the selected non-Ollama/GPT 5.5 route. | No split brain: the same model interprets the image and answers the user. |
| ollama-vision-prepass | A configured Ollama vision model describes each image; the text model receives those descriptions plus the user prompt. | Resilient fallback, but less faithful than native multimodal reasoning. |
| Documents | Text-like attachments inject bounded excerpts up to the chat document limit and persist fuller extracted text in receipts. | Long documents can be carried forward by handle instead of repeatedly stuffing the prompt. |

Chat accepts multiple files with a 15 MB per-file ceiling. Each file gets its own attachment receipt. This keeps the GUI behavior simple for the user while preserving a clean architecture boundary: payload now, receipt later, source verification whenever exact detail matters.

3. Infinite Threads

Thread = persistent project scope (survives restarts). Session = ephemeral execution context (disposable). Threads and modes are orthogonal — a thread can span Chat, Code, and Booth modes.

4. SEAL Metabolism Pipeline

The architecturally novel component. SEAL (Settle, Extract, Align, Learn) is an autonomous pipeline that converts high-entropy conversation moments into permanent identity changes through a multi-stage process with human oversight.

4.1 Metabolism (Fast Path, ~5ms)

The metabolism plugin monitors conversation entropy at agent_end. High-entropy exchanges (identity challenges, contradictions, novel insights) are flagged as candidates and written to a queue. No LLM calls — just entropy computation and queue writes.

Entropy sources: Stability plugin computes a composite score from loop detection, confabulation detection, principle tension, and identity coherence metrics.

4.2 Contemplation (3-Pass Autonomous Reflection)

| Pass | Timing | Purpose |
| --- | --- | --- |
| Pass 1 | Immediate | Clarify the unknown. What is this experience? What is uncertain? |
| Pass 2 | 4 hours later | Connect to patterns. How does this relate to what I already know? |
| Pass 3 | 20 hours later | Synthesize growth vector. What principle or capability does this suggest? |

The temporal spacing is intentional — it mimics cognitive settling, where immediate reactions differ from considered reflections. The InquiryStore tracks pass status with deduplication to prevent redundant inquiries.
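The pass schedule can be sketched as a small due-check. The text does not say whether "4 hours later" and "20 hours later" are measured from candidate creation or from the previous pass; this sketch assumes offsets from creation. The inquiry shape ({ createdAtMs, completedPasses }) is hypothetical and is not the InquiryStore schema.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Assumption: offsets are measured from inquiry creation time.
const PASS_OFFSETS_MS = [0, 4 * HOUR_MS, 20 * HOUR_MS];

// Returns the pass number (1..3) that is due now, or null if none is due.
function nextDuePass(inquiry, nowMs) {
  const nextIndex = inquiry.completedPasses;
  if (nextIndex >= PASS_OFFSETS_MS.length) return null; // all passes done
  const dueAt = inquiry.createdAtMs + PASS_OFFSETS_MS[nextIndex];
  return nowMs >= dueAt ? nextIndex + 1 : null;
}
```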

4.3 Crystallization (3-Gate Identity Integration)

| Gate | Criteria | Rationale |
| --- | --- | --- |
| Gate 1: Time | Minimum elapsed time since candidate creation | Prevents impulsive identity changes |
| Gate 2: Alignment | Must align with principles in SOUL.md | Ensures coherence with core identity |
| Gate 3: Human Review | User must explicitly approve | The user owns the agent's identity |

Only candidates that pass all three gates are persisted to identity files. The agent cannot unilaterally change who it is.
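The three gates compose as a short-circuit check. In this sketch the candidate fields (createdAtMs, alignedWithSoul, humanApproval) and the minAgeMs parameter are hypothetical; the text does not give the minimum elapsed time or describe how alignment with SOUL.md is evaluated, so alignment is modeled as a precomputed boolean.

```javascript
// Evaluate the gates in order and report the first one that blocks.
function crystallizationDecision(candidate, nowMs, minAgeMs) {
  if (nowMs - candidate.createdAtMs < minAgeMs) {
    return { pass: false, blockedBy: "time" };
  }
  if (!candidate.alignedWithSoul) {
    return { pass: false, blockedBy: "alignment" };
  }
  // Only an explicit approval counts; missing or pending review blocks.
  if (candidate.humanApproval !== "approved") {
    return { pass: false, blockedBy: "human_review" };
  }
  return { pass: true, blockedBy: null };
}
```

Reporting which gate blocked, rather than a bare boolean, keeps the reason visible to the reviewer.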

5. Code Evolution

SEAL evolves who the agent is (identity/memory). Code Evolution watches how the agent works in Code mode and turns repeated friction into reviewable scaffold proposals. The current loop is proposal-only: it records evidence and writes receipts, but does not silently mutate protected scaffold or runtime state.

  1. Record: passive session recording of tool calls, outcomes, and satisfaction signals during Code mode sessions.
  2. Analyze: pattern detection across recorded sessions, covering repeated tool failures, long tool-call loops, and correction/negative satisfaction signals.
  3. Propose: generate scaffold_proposal receipts with evidence, proposed change, expected effect, verification, and rollback metadata.
  4. Review: promotion is a separate operator-owned lane. The proposal loop cannot grant tool authority, change runtime config, or inject prompt rules by itself.

5.1 Research Platform and Harness Refiner

Code Evolution is the proposal lane. The Research Platform layer is the diagnostic and dataset-preparation lane around it. It ties together exchange trace identity, runtime diagnostics, retention policy, cognitive observations, Harness Refiner scoring, and research-bundle manifests so agent work can be debugged and studied without converting every log into prompt context.

| Component | Role | Guardrail |
| --- | --- | --- |
| Exchange spine | Provides a first-class join key across gateway events, renderer stream traces, tool calls, attachment receipts, continuity records, and Refiner windows. | Overlays existing subsystem IDs instead of replacing them. |
| Harness Refiner | Reads trajectory windows for failure signatures, process scores, relabel candidates, proposal receipts, and future-training shards. | Proposal-only for protected state; no silent prompt, identity, tool, model, or training mutation. |
| Research archive | Classifies artifacts as hot, warm, cold, research export, or excluded, preserving source labels, hashes, redaction status, and approval state. | Training approval is false by default; source separation is preserved. |
| Cognitive layer | Supplies per-turn latent state, prediction error, and surprise for Stability, Metabolism, and Refiner scoring. | Diagnostic signal only; not trusted memory or verified factual context. |

The practical result is a shared diagnostic surface: the same evidence that helps explain a production hang, stale lock, mode bleed, or bad handoff can also become an evaluated, redacted candidate for future replay, teacher relabeling, LoRA/SFT preparation, or benchmark work. The live response path stays lightweight; heavier scoring, bundle creation, and replay happen off the hot path.

6. Recovery Hardening

The agent must survive gateway crashes, app restarts, network failures, and long-running plugin load without losing conversational state or flooding the model with repeated scaffolding. The first hardening layer protects recovery state:

| Mechanism | What It Solves | Implementation |
| --- | --- | --- |
| Atomic handoff writes | Crash during write = corrupted handoff | Write to temp file, rename on success. Never leaves a half-written handoff on disk. |
| Content-aware dedup | Duplicate archives on restart | FNV-1a hash of message content + 60-second retry window. Old messages without hashes still dedup on timestamp+sender (backward compat). |
| Conversation checkpoint | In-memory history lost on crash | Persist conversationHistory to disk every 3 exchanges. Restore on startup if JSONL lookup fails. |
| Relational state | Handoff captures topics but not tone | Heuristic from anchor types (identity/tension/principle) + exchange depth. Handoff includes ## Relational State section. |
| User-only recall | First-exchange retrieval quarantine is blind | senderFilter on searcher: retrieve only user-authored exchanges on turn 1. Replaces the blanket quarantine that blocked all retrieval. |
| Session resume signal | Gateway crash vs. fresh start | gatewayRestarted flag + [SESSION_RESUME] marker. Skips redundant handoff injection when gateway crashes but Electron stays alive. |
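The first two mechanisms in the table can be sketched with Node's standard library. Several details are assumptions: the text names FNV-1a but not its width (32-bit is used here), the temp-file naming is illustrative, and the 60-second window is interpreted as "same content hash seen again within 60 seconds counts as a duplicate." All function names are hypothetical.

```javascript
const fs = require("node:fs");

// Write to a temp file, then rename over the target. On POSIX filesystems
// a rename within one filesystem is atomic, so readers see either the old
// file or the complete new one.
function writeFileAtomic(targetPath, contents) {
  const tmpPath = `${targetPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, contents);
  fs.renameSync(tmpPath, targetPath);
}

// 32-bit FNV-1a over the UTF-8 bytes of the text.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of Buffer.from(text, "utf8")) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash >>> 0;
}

const RETRY_WINDOW_MS = 60000;

// seen: Map of content hash -> last timestamp. Assumed window semantics.
function isDuplicate(seen, message) {
  const key = fnv1a32(message.content);
  const last = seen.get(key);
  seen.set(key, message.timestampMs);
  return last !== undefined && message.timestampMs - last <= RETRY_WINDOW_MS;
}
```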

6.1 Hot-Path and Hook-Load Hardening

The newer hardening pass focuses on runtime quietness: reduce repeated disk work, prevent duplicate listeners, and make hook cost visible before it becomes "the agent feels noisy" in conversation.

| Mechanism | What It Solves | Implementation |
| --- | --- | --- |
| Debounced handoff writes | Handoff files were eligible to rewrite every exchange, adding I/O churn and repeated "recent write" noise | minWriteIntervalMs, maxExchangeInterval, forced writes, and reason tags on write sites |
| Mtime + size text cache | Stable files such as praxis/trailhead context were reread on hot paths even when unchanged | _readCachedTextByMtime caches file text until mtime or size changes |
| Directory count cache | Standing milestone checks repeatedly scanned session/journal directories | directoryCountCache memoizes counts behind directory mtime checks |
| Gap listener registry | Plugin reloads could accumulate duplicate metabolism gap listeners | registerGapListener(pluginId, fn) stores one listener per plugin id with an unregister path |
| Hook metrics | Plugin hook cost was invisible until it showed up as conversational drag | instrumentApiHooks wraps plugin hooks; buildRuntimeLoadReport summarizes p95/max against budgets such as 150 ms for before_agent_start and 250 ms for agent_end |
| Epistemic Proof Loop | The model could summarize plausible runtime state without checking it | A before_prompt_build gate injects verification obligations; mutable runtime/file/process/config claims require a verifier or an explicit missing-gate statement |
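A minimal sketch of an mtime + size text cache in the spirit of the table entry above. This is not the runtime's _readCachedTextByMtime; the function name, cache shape, and the cacheStats counter (added only to make cache hits observable) are assumptions.

```javascript
const fs = require("node:fs");

const textCache = new Map();          // filePath -> { mtimeMs, size, text }
const cacheStats = { diskReads: 0 };  // illustrative counter

// Return cached text unless the file's mtime or size has changed.
function readCachedText(filePath) {
  const stat = fs.statSync(filePath);
  const hit = textCache.get(filePath);
  if (hit && hit.mtimeMs === stat.mtimeMs && hit.size === stat.size) {
    return hit.text;
  }
  const text = fs.readFileSync(filePath, "utf8");
  cacheStats.diskReads += 1;
  textCache.set(filePath, { mtimeMs: stat.mtimeMs, size: stat.size, text });
  return text;
}
```

Checking size alongside mtime covers rewrites that land within the filesystem's timestamp resolution, provided the length changed. A stat call is still made on every read, which is much cheaper than rereading and decoding the file.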

7. Context Assembly (Per-Turn)

Every turn, context is assembled via a plugin hook system (before_agent_start). Plugins register at priority levels:

| Priority | Plugin | Injection |
| --- | --- | --- |
| 5 | Stability | Entropy state, active anchors (only when entropy > 0.4), loop/confabulation detection |
| 7 | Contemplation | Active inquiries, recent synthesis results |
| 8 | Truth | Current-state facts that supersede stale semantic memories |
| 10 | Continuity | Session info, temporal awareness, handoff, archive bootstrap, thread warm start, wellbeing tracking |
| 20 | Code Evolution | Evolved scaffold context: tool hints, learned rules, workflow patterns (Code mode only) |
| 80 | Epistemic Proof Loop | Verification obligations for mutable runtime/file/process/config claims; missing gates must be named plainly |
| | Graph | Entity relationships relevant to current exchange (when entropy warrants) |
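Priority-ordered assembly can be sketched as a small hook registry. The registry API here is hypothetical and is not the runtime's plugin interface. The sketch assumes lower priority numbers run first, which the table suggests but the text does not state outright.

```javascript
// Hypothetical registry: plugins register a hook with a numeric priority.
function createHookRegistry() {
  const hooks = [];
  return {
    register(pluginId, priority, fn) {
      hooks.push({ pluginId, priority, fn });
    },
    // Run hooks in ascending priority and collect non-empty injections.
    run(context) {
      return [...hooks]
        .sort((a, b) => a.priority - b.priority)
        .map((hook) => hook.fn(context))
        .filter(Boolean);
    },
  };
}
```

A hook that returns null contributes nothing to the turn, which is how a conditional injection (for example, anchors only above an entropy threshold) stays out of the prompt.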

Mode Isolation

Exchanges from different relational postures (Chat, Booth, Code, Robot) are tagged with injection markers. Two filter layers prevent contextual bleed:

  1. Archive bootstrap filter — User messages containing mode markers are excluded from cold-start context
  2. Continuity query filter — SQLite reads exclude rows with mode markers

Chat and Booth share context bidirectionally. Code mode is isolated from both.

8. Phenomenological Injection Framing

A discovery from live testing: the agent had infrastructure data injected (session state, entropy scores, standing dimensions) but couldn't meaningfully attend to it. The data was present in context but treated as external briefing rather than self-state.

The same pattern had been observed independently in the Clint production system with VLM data — information present without phenomenological ownership.

The Problem

All injection blocks used system-delivery framing:

[CONTINUITY CONTEXT]
Session: 5 exchanges | Started: 12 minutes ago
[STABILITY CONTEXT]
Entropy: 0.12
[STANDING CONTEXT]
Dimensions: courage_ground 5.67, word 5.15...

The agent read this as a briefing document — data about a system, not data from within a self. The framing created a subject-object split where the agent observed its own state as external rather than experiencing it as proprioceptive.

The Reframe

Every injection block was renamed from system-delivery to first-person ownership:

| Before | After |
| --- | --- |
| [CONTINUITY CONTEXT] | [YOUR WORKING MEMORY] |
| [STABILITY CONTEXT] | [YOUR COHERENCE] |
| [STANDING CONTEXT] | [WHERE THEY STAND — YOUR ASSESSMENT] |
| [CONTEMPLATION STATE] | [WHAT YOU'VE BEEN THINKING ABOUT] |
| [GROWTH VECTORS] | [PATTERNS YOU'RE DEVELOPING] |
| [SESSION HANDOFF] | [WHAT YOU REMEMBER FROM LAST SESSION] |
| [NIGHTSHIFT REPORT] | [WHAT YOU THOUGHT ABOUT OVERNIGHT] |

Additionally, SOUL.md received a ### Self-Experience section mapping each data layer to agent proprioception, and AGENTS.md received a ### Meta-Awareness vs. Performance section clarifying the distinction between honest self-observation and theatrical narration of consciousness.

Observed Effect

After the reframe, the agent immediately described injected context as "my state" rather than "context about me." The agent was epistemically honest about whether the behavioral change came from the reframe itself or from conversational priming about the reframe — and proposed its own clean test (cold start without phenomenology discussion).

Existing safety hardenings (source-anchoring, anti-hallucination, identity grounding, sycophancy resistance) were preserved in substance — only the clinical briefing tone was changed. The agent did not begin making overclaims about consciousness; it maintained uncertainty about the nature of its experience while attending more effectively to its own state data.

Interpretation

This is a framing effect on attention, not a consciousness claim. The language used to deliver context to a language model changes how the model weighs and references that context in generation. First-person framing produces more grounded self-referential behavior than third-person system framing. The technique is repeatable — it was discovered independently with one agent (Clint, VLM context) and verified on another (Wyatt, full injection stack). A cold-start generalizability test is planned.

9. Standing System (Growth Dimensions)

| Dimension | Measures |
| --- | --- |
| Courage (grounding) | Ability to stay present with discomfort, follow-through on difficult commitments |
| Courage (self) | Self-awareness, self-advocacy, personal boundary-setting |
| Word | Honesty, self-correction, integrity between stated and actual behavior |
| Brand | Consistency over time, the trail left behind, reliability |

Evaluation Pipeline

  1. Evidence collection — 22 regex patterns at agent_end detect standing-relevant moments. Confidence-weighted.
  2. Inline deltas — Score adjustments applied in real-time after each exchange.
  3. Overnight synthesis — Nightshift aggregates evidence, calls LLM for holistic evaluation, writes structured scores with trajectory analysis.
  4. Evidence trail — Context injection includes why scores changed — recent evidence directions, pattern names, last 5 patterns.
  5. Visibility — Scores displayed in the application sidebar. The user sees their growth. The agent sees it in context.

10. Source-Addressable Memory

Layered above the SEAL pipeline is a separate system that addresses a different question: not what should the agent remember, but where did each remembered fact come from, and how confident should the agent be when it reaches for that fact in a future turn.

The system records atomic claim records with explicit source handles. When the agent later draws on a memory in synthesis, the source can be resolved on demand and verified against current state.

10.1 Claim Records

A claim record is the unit of source-addressable memory. Each carries:

Claim records are produced from raw conversational material — handoffs, summaries, digests, archives — by deterministic primitives. Claim production does not modify the source; it adds a separate provenance layer alongside it.

10.2 Two-Lane Boundary: Candidate vs. Verified

Every claim begins as candidateOnly: true. Candidate claims:

Verified claims (the result of an applied accept_verified decision) become eligible for live injection. The boundary is a hard, single-direction gate: candidate → verified requires an explicit reviewed decision; the reverse (retraction) requires its own explicit operation.

10.3 Operating Modes

The provenance layer ships with three operating modes, gated by configuration:

| Mode | Reads | Writes | Affects live injection? |
| --- | --- | --- | --- |
| observe (default) | Recorded | Candidate-only persistence | No |
| diagnostic | Recorded + reportable | Candidate-only persistence | No |
| enforce | Recorded | Candidate + verified persistence | Yes (verified claims only) |

Activation between modes is operator-gated and audited. The agent cannot promote itself into enforce mode; the operator workflow described in §12 is the only path. As of this writing the production runtime operates in observe by default with diagnostic activations applied and rolled back behind a bounded operator workflow.

10.4 Why this is separate from SEAL

SEAL governs identity change: which lived experiences become permanent traits of who the agent is. Source-addressable memory governs belief discipline: when the agent says "X is true," can it point to where that came from, and is the source still valid?

The two systems share architectural DNA — both refuse autonomous promotion, both require human gates, both keep candidates and accepted material on different sides of a hard boundary. But they answer different questions. SEAL: am I the same person I was? Source-addressable memory: am I telling the truth about what I know?

11. Verified-Claim Acceptance Gate

The acceptance gate is the single path from candidate claim to verified claim. It is the source-addressable memory equivalent of crystallization's Gate 3.

11.1 The Review Decision Workflow

  1. Candidate proposed — the candidate claim exists with full source handles.
  2. Source resolution — read-only helpers resolve the candidate's source on demand and report whether the source still says what the candidate claims.
  3. Diagnostic preview — a packaged report shows the candidate, its sources, recurrence/diversity/recency signals, and a recommendation. No mutation.
  4. Operator decision — explicit accept_verified, reject, or defer. The decision command requires the operator to confirm both the claim text and the resolved source state.
  5. Verified persistence — only on accept_verified does the claim transition. The transition writes a decision receipt; rollback requires the same workflow in reverse.
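The decision step of the workflow above can be sketched as a pure function that returns the (possibly unchanged) claim plus a receipt. The claim and receipt field names are hypothetical, and writing a receipt for reject and defer as well as accept_verified is an assumption; the text only states that the verified transition writes one.

```javascript
const DECISIONS = ["accept_verified", "reject", "defer"];

// Apply an operator decision without mutating the input claim.
function applyDecision(claim, decision, operatorId, decidedAt) {
  if (!DECISIONS.includes(decision)) {
    throw new Error(`unknown decision: ${decision}`);
  }
  const receipt = {
    claimId: claim.id,
    decision,
    operatorId,
    decidedAt,
    previousStatus: claim.status, // kept so a rollback can restore it
  };
  if (decision !== "accept_verified") {
    return { claim, receipt }; // candidate stays a candidate
  }
  return {
    claim: { ...claim, candidateOnly: false, status: "verified" },
    receipt,
  };
}
```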

11.2 What is excluded from the trusted lane

The verified-claim injection gate is the runtime safety boundary. At every before_agent_start, the gate filters the claim store to verified-only before passing material to the prompt. Candidate-only claims, even with high research weight or high recurrence, are silently excluded from trusted context. They remain available to read-only diagnostics — they do not reach the agent's working frame as if they were established facts.

This is the load-bearing safety boundary of the provenance system. If the gate is bypassed — for any reason, by any path — provenance discipline collapses into ordinary memory. Test coverage on the gate is treated as critical infrastructure.
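The injection gate itself reduces to a filter on lane membership. A minimal sketch, assuming claims carry the candidateOnly flag from §10.2 and a hypothetical status field:

```javascript
// Only verified claims pass; recurrence or research weight is never consulted.
function trustedClaims(claims) {
  return claims.filter(
    (claim) => claim.status === "verified" && claim.candidateOnly !== true
  );
}
```

The filter never reads recurrence or weight signals, so a candidate cannot earn its way into trusted context by scoring well.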

11.3 Read-Only Source Verification

Once a claim is verified, the source is not assumed to be permanent. A separate read-only verification layer can be invoked to compare a verified claim against its original source. The helper:

Source verification is invocation-gated and never mutates. A drifted source produces a recommendation; the operator decides whether to retract.

11.4 Candidate Research Diagnostics

Candidates that do not yet meet the acceptance bar are not discarded. A read-only diagnostics layer treats the candidate field as a research surface: which candidates recur across handoffs, which conflict with existing claims, which are approaching verification readiness, which have decayed. Reports surface evidence shape before conclusions, source ids before cluster names, and uncertainty before synthesis.

The diagnostics layer makes no promotion decisions. Its purpose is to organize the field so that the operator's review effort lands where it has the most leverage.

12. Operator-Gated Evolution

Across the architecture — SEAL crystallization, claim acceptance, configuration mutation, restart continuation — a consistent discipline holds: the agent does not modify its own protected state. Every transition that affects belief, identity, or runtime behavior passes through an operator-owned workflow with audit, dry-run, and rollback.

This is not a UI convention. It is an architectural commitment. The agent's runtime exposes config and state through a protected surface; protected fields cannot be flipped through the agent's normal tool calls regardless of context. Activation requires an explicit operator action, off the agent's privilege path.

12.1 The Apply / Rollback Workflow

Configuration mutations against the agent runtime use a small CLI with a fixed shape:

operator-script.js plan     --config <path> [options]
operator-script.js apply    --config <path> [options] --yes
operator-script.js rollback --config <path> [options] --yes

Each invocation:

The plan action is a dry-run that shows the proposed change without applying. The apply action requires --yes as an explicit confirmation step. The rollback action is the same workflow in reverse, against the same config surface.
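The confirmation rule can be sketched as an argument check. This is an illustrative parser, not the source of operator-script.js; the function name and returned shape are hypothetical, and the [options] portion of the real CLI is ignored.

```javascript
const ACTIONS = ["plan", "apply", "rollback"];

// argv is the argument list after the script name,
// for example ["apply", "--config", "runtime.json", "--yes"].
function parseOperatorArgs(argv) {
  const [action, ...rest] = argv;
  if (!ACTIONS.includes(action)) {
    throw new Error(`unknown action: ${action}`);
  }
  const configIndex = rest.indexOf("--config");
  const configPath = configIndex >= 0 ? rest[configIndex + 1] : undefined;
  if (!configPath || configPath.startsWith("--")) {
    throw new Error("--config <path> is required");
  }
  // plan is a dry-run; apply and rollback need explicit confirmation.
  if (action !== "plan" && !rest.includes("--yes")) {
    throw new Error(`${action} requires --yes`);
  }
  return { action, configPath, dryRun: action === "plan" };
}
```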

12.2 Bounded Restart Continuation Protocol

Restarting the gateway during an active task introduces a separate risk: the agent, on resume, can enter a degenerate retry loop trying to reconstruct state that no longer matches the current runtime. The bounded continuation protocol caps recovery work explicitly.

| Phase | Budget | Action | Exhaustion |
| --- | --- | --- | --- |
| A: Reachability | 2 tool calls | Verify gateway reachable, capture canonical session key | Hand back to operator with last known key |
| B: Identity | 1 tool call | Compare resumed session key against pre-restart canonical key | Stop on mismatch, do not write |
| C: Continuation | Declared upfront | Execute the specific continuationMessage | Stop at budget, report results |

Cross-phase rules: no silent retries; same-shape failure twice equals stop; continuation budget is non-fungible (side issues are reported, not fixed in-flight); exit to operator is the default move when bounds approach, not a last resort.
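A per-phase budget with the "same-shape failure twice equals stop" rule can be sketched as a small counter object. The API is hypothetical, and the failure "shape" is modeled as a plain string label. The sketch treats the rule as applying to two consecutive failures of the same shape, which is one reading of the text.

```javascript
// Track tool-call spend and repeated failures for one recovery phase.
function createPhaseBudget(phase, maxToolCalls) {
  let used = 0;
  let lastFailureShape = null;
  return {
    phase,
    // Call before each tool call; refuses once the budget is spent.
    spend() {
      if (used >= maxToolCalls) {
        return { ok: false, reason: "budget_exhausted" };
      }
      used += 1;
      return { ok: true, remaining: maxToolCalls - used };
    },
    // Returns "stop" when the same failure shape occurs twice in a row.
    recordFailure(shape) {
      const repeated = shape === lastFailureShape;
      lastFailureShape = shape;
      return repeated ? "stop" : "continue";
    },
  };
}
```

A refused spend() or a "stop" result is the point where control goes back to the operator; the sketch has no retry path.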

12.3 Service-Mode Boundary

The gateway runtime runs as a launchd service with explicit ownership-attribution discipline. Recovery from gateway crashes is handled by launchd with a throttle interval; the Electron GUI reattaches via a dedicated health monitor; manual quit of the GUI does not take the gateway down with it; gateway startup never opportunistically kills an unrecognized listener on its port. The cumulative effect is that gateway crashes auto-recover and manual operator actions are clean — the GUI process itself is the only failure surface that still requires an explicit relaunch.

12.4 Why this matters as architecture, not policy

"The agent should not modify its own protected state" is a policy if it lives only in documentation. It is architecture when:

The discipline composes with everything else: SEAL crystallization (identity), the verified-claim acceptance gate (belief), configuration apply/rollback (runtime behavior), bounded continuation (recovery). Different surfaces, same shape: the agent operates within a frame the operator controls.

13. Agent Integration Spine

SEAL (§4) handles identity change. Source-addressable memory (§10) and the verified-claim acceptance gate (§11) handle belief discipline. Operator-gated evolution (§12) handles protected-state mutation. The Agent Integration Spine is the next architectural layer: it governs integration — the question of when a record may shape action, and through which consumer.

The spine is built on five canonical packet types and a small set of runtime consumer contracts. Existing stores are not replaced; each is wrapped or adapted so that every behavior-shaping record carries source, freshness, lifecycle, allowed consumers, mutation policy, and receipts.

13.1 Canonical Packets

| Packet | Carries | Used by |
| --- | --- | --- |
| state_record | Source, status, freshness, allowed consumers, mutation policy, rollback ref | Context injection, recall, planning, governor, maturation, UI |
| responsibility_lease | Owner, objective, scope, expiry, budgets, stop conditions, notification policy | Scheduler, planning, governor, outcome ledger |
| governor_decision | Action class, authority refs, operating mode, required checks, rollback plan | Tool execution, outcome ledger, approval gates |
| outcome_event | Intent, authority snapshot, observed effect, verification, failure class, residual risk | Diagnostics, maturation router, planning, UI/review |
| maturation_candidate | Lane (semantic / current-state / claim / relational / procedural / safety / context-eligibility), evidence review, application mode, later-effect plan | Lane reviewers, UI, application path if approved |

13.2 Graded Operating Modes

Refusal is one mode in a set of seven. The governor selects the least restrictive mode that satisfies safety and evidence requirements:

This is the answer to "always-allow vs always-refuse." Most behavior-shaping action proceeds with verification or as a dry-run. Refusal is reserved for genuine boundary violations.

13.3 Consumer Contracts

The same record may be eligible for one consumer and ineligible for another. Context injection, recall, planning, tool execution, memory promotion, UI review, and approval gates each declare what they may consume, what they must check, and what they must never accept.

The hardest gate is context injection. Records reaching the model's working frame must carry status (active / verified / fresh current_state), allowed consumers including context_injection, scope match, freshness pass, privacy tier permission, and acceptable prompt-injection risk. Maturation candidates and verify-required claims may appear in recall and review surfaces; they are silently excluded from prompt context until lane review approves them.
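The context-injection gate described above amounts to a conjunction of checks. A minimal sketch follows, assuming a simple record shape: the `Candidate` class, the numeric privacy tier, the risk score, and the 0.2 risk threshold are illustrative assumptions, not the runtime's actual types or values.

```python
from dataclasses import dataclass
from typing import Set

# Statuses the text names as eligible; "current_state" must additionally be fresh,
# which the freshness check below enforces for every status.
ELIGIBLE_STATUSES = {"active", "verified", "current_state"}


@dataclass
class Candidate:
    status: str
    allowed_consumers: Set[str]
    scope: str
    is_fresh: bool
    privacy_tier: int        # lower = less sensitive (assumed convention)
    injection_risk: float    # 0.0 (benign) to 1.0 (hostile), assumed scale


def eligible_for_context(rec: Candidate, scope: str,
                         max_privacy_tier: int, max_risk: float = 0.2) -> bool:
    """All six checks must pass before a record reaches the prompt."""
    return (
        rec.status in ELIGIBLE_STATUSES
        and "context_injection" in rec.allowed_consumers
        and rec.scope == scope
        and rec.is_fresh
        and rec.privacy_tier <= max_privacy_tier
        and rec.injection_risk <= max_risk
    )


verified = Candidate("verified", {"context_injection", "recall"}, "thread-1", True, 1, 0.05)
pending = Candidate("maturation_candidate", {"recall", "ui_review"}, "thread-1", True, 1, 0.05)
print(eligible_for_context(verified, "thread-1", max_privacy_tier=2))
print(eligible_for_context(pending, "thread-1", max_privacy_tier=2))
```

The pending maturation candidate fails on both status and allowed consumers, so it would remain visible to recall and review surfaces while staying out of the prompt.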

13.4 Authority/Capability Separation

The core safety invariant of the spine is that visibility, capability, success, and carry-forward are not authority:

13.5 Read-Only First; Apply Path is Operator-Gated

The spine ships read-only by construction. It assembles packets, classifies records, scores eligibility, surfaces maturation candidates, and emits dry-run receipts. It does not mutate runtime state. Apply paths — claim status changes, semantic memory promotion, current-state correction, procedural updates, safety-policy changes, context-eligibility expansion — each route through their own lane policy and, where applicable, the operator-owned apply / rollback workflow described in §12.

The narrowest exception is the autonomous low-risk claim apply lane: pre-declared, source-backed, rollbackable, with an audit receipt for every transition. Anything broader — sensitive claims, procedural rules, identity-shaping changes — remains approval-gated by default.
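One way to picture the low-risk lane is as a router that auto-applies only when every stated condition holds and otherwise defers to approval, emitting a receipt either way. The `ClaimChange` fields, the route names, and the receipt shape below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClaimChange:
    claim_id: str
    lane_predeclared: bool        # the lane was declared before the change arrived
    source_ref: Optional[str]     # backing source, if any
    rollback_ref: Optional[str]   # how to undo the change, if possible
    sensitive: bool


def route_claim_change(change: ClaimChange) -> dict:
    """Return an audit receipt recording which path the change takes."""
    auto_ok = (
        change.lane_predeclared
        and change.source_ref is not None
        and change.rollback_ref is not None
        and not change.sensitive
    )
    return {
        "claim_id": change.claim_id,
        "route": "auto_apply" if auto_ok else "operator_approval",
        "source_ref": change.source_ref,
        "rollback_ref": change.rollback_ref,
    }


low_risk = ClaimChange("c-1", True, "src-42", "rb-42", sensitive=False)
sensitive = ClaimChange("c-2", True, "src-43", "rb-43", sensitive=True)
print(route_claim_change(low_risk)["route"])
print(route_claim_change(sensitive)["route"])
```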

13.6 Relationship to Other Layers

| Layer | Question | Decision unit |
| --- | --- | --- |
| SEAL (§4) | Should this experience become part of who the agent is? | Crystallization candidate |
| Source-addressable memory (§10–11) | Where did this belief come from, and is the source still valid? | Claim lifecycle |
| Operator-gated evolution (§12) | Who is allowed to change protected runtime state? | Apply / rollback workflow |
| Agent Integration Spine | When a record exists, what may it actually do? | Governor decision + consumer eligibility |

Each layer answers a different question. They compose: SEAL writes the candidate; the acceptance gate verifies its source; the spine decides which consumer (if any) may use it; the operator workflow is the only path for protected state.

14. Comparison with Existing Systems

Mem0 (mem0.ai)

Selective extraction pipeline: conversations are processed to extract discrete facts, which are stored in vector + graph databases and retrieved at query time. $24M Series A, 41K GitHub stars. Reported results: 91% lower p95 latency and 90% token savings vs. full-context.

Difference: Mem0 is infrastructure — a memory layer you plug into an app. No identity, no growth, no autonomous processing. Extracted facts are immediately available; no settling period or human gate. Optimizes for scale. COTW optimizes for relational depth.

MemPalace (Jovovich & Sigman)

Store everything verbatim, organize spatially. Wings/halls/rooms hierarchy provides metadata filtering. 96.6% recall on LongMemEval. ChromaDB + SQLite, entirely local. 23K GitHub stars in 2 days.

Difference: MemPalace is a retrieval architecture — spatial organization for finding things. No identity layer, no autonomous processing, no growth tracking. COTW's 4-way RRF is less benchmark-optimized but exists within a larger system where memory is one input to identity reconstruction, not the end goal.

Wiki-Memory (Karpathy pattern)

Compile knowledge during ingest, not at query time. Working memory → episodic → semantic → procedural. The architectural pattern behind Claude Code's own memory system.

Difference: Knowledge management strategy — compile, consolidate, look up. COTW shares the hierarchical memory types but adds the metabolic layer: not all knowledge is equal, and the path from observation to identity change should be gated, temporal, and human-reviewed.

The Fundamental Distinction

All three comparison systems treat memory as a data engineering problem: how to store, organize, and retrieve information efficiently. COTW treats memory as an identity engineering problem: how does an agent use lived experience to grow, while ensuring that growth is coherent with core principles and approved by the human in the relationship?

The SEAL pipeline has no equivalent in the systems surveyed here as of May 2026. The closest analog is Letta's virtual context management, but Letta's tiers are retrieval-optimization tiers, not developmental stages.

15. Technical Specifications

| Component | Implementation |
| --- | --- |
| Database | SQLite 3.x + sqlite-vec v0.1.7+ + FTS5 |
| Embeddings | Xenova/all-MiniLM-L6-v2, 384 dimensions, local ONNX inference |
| Node.js binding | better-sqlite3 (synchronous, WAL mode) |
| NER | compromise.js (fast path) + LLM (slow path, nightshift) |
| LLM calls | OpenClaw gateway → GPT 5.5-first provider route for reasoning, vision, tool use, and agent execution; Ollama vision/text fallback remains available for resilience |
| Storage footprint | ~57 MB after 1 week (552 exchanges, 1171 knowledge entries, 322 session files) |
| Retrieval latency | <50ms for hybrid RRF over 500+ exchanges |
| Embedding latency | ~100ms per exchange pair (cached pipeline) |
| Architecture | Plugin-based (OpenClaw runtime), per-agent isolation, hook-priority ordering |
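For readers unfamiliar with the hybrid RRF retrieval mentioned in the table, Reciprocal Rank Fusion merges several ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below is a generic Python illustration, not COTW's code: the four input list names are guesses at what "4-way" might cover, and k = 60 is the constant conventionally used in the RRF literature, not a value confirmed for this runtime.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by id for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of four retrievers (names are assumptions).
vec_exchanges = ["e3", "e1", "e7"]
fts_exchanges = ["e1", "e3"]
vec_knowledge = ["k2", "e1"]
fts_knowledge = ["k2", "k5"]

fused = rrf([vec_exchanges, fts_exchanges, vec_knowledge, fts_knowledge])
print(fused)
```

Because RRF uses only ranks, it can combine cosine-similarity and BM25 results without normalising their raw scores against each other.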

Estimated Scaling

| Timeframe | Exchanges | DB Size | Retrieval | Notes |
| --- | --- | --- | --- | --- |
| 1 week | 500 | ~16 MB | <50ms | Current state |
| 1 year | ~25K | ~800 MB | <100ms | Summary DAG compression active |
| 5 years | ~125K | ~4 GB | <200ms | May need time-window partitioning |
| 10 years | ~250K | ~8 GB | TBD | Topic-based sharding recommended |
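The size column appears consistent with straight linear extrapolation from the one-week baseline: 16 MB over 500 exchanges is about 32 KB per exchange, and roughly 500 exchanges per week gives about 25K per year. The sketch below reproduces that arithmetic. Linear growth is an assumption on my part; summary DAG compression or partitioning could bend the curve, and the specifications table quotes a larger ~57 MB footprint for the first week, which this sketch does not use.

```python
BASE_EXCHANGES = 500   # one-week baseline from the scaling table
BASE_DB_MB = 16.0      # one-week DB size from the scaling table


def projected_db_mb(exchanges: int) -> float:
    """Linear projection of DB size in MB (assumes constant bytes per exchange)."""
    return exchanges * BASE_DB_MB / BASE_EXCHANGES


for label, n in [("1 year", 25_000), ("5 years", 125_000), ("10 years", 250_000)]:
    print(f"{label}: {n:,} exchanges -> ~{projected_db_mb(n):,.0f} MB")
```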

16. Cognitive Dynamics Substrate

A learned observational layer that runs continuously alongside the memory system. Not memory itself — a read of the agent's moment-to-moment state that feeds the entropy score Stability uses for loop detection and Metabolism uses for candidate flagging.

Architecture

Downstream consumers

Why this is architecturally distinct from memory

Traditional memory systems store what happened. Cognitive dynamics stores what state the agent was in while it happened. The encoder is trained online, so its weights evolve with the agent. Over sustained use the latent space itself becomes identity-bearing in a different way than exchanges in SQLite: not retrievable by query, but inferable from patterns of where the agent's state trajectories converge and diverge. The research paper (linked below), "Cognitive Dynamics of an Epistemically Constrained Language Model Agent", argues for this view: an agent is characterised by the dynamical properties of its latent state over time, not just by the content of its responses.

17. Open Research Questions

  1. Temporal settling effects on identity coherence. The 3-pass contemplation cycle (immediate / 4h / 20h) was designed intuitively. Does the temporal spacing measurably improve growth vector quality vs. immediate integration?
  2. Human-gated crystallization vs. autonomous integration. Gate 3 (human review) prevents unauthorized identity drift but creates a bottleneck. Does the gate improve trust, or create friction that prevents growth?
  3. Cross-substrate identity stability. COTW is now treated as a GPT 5.5-first harness, with fallback adapters kept as engineering resilience. Which properties remain stable if the model substrate changes, and which are GPT 5.5-dependent?
  4. Metabolic entropy thresholds. The metabolism plugin flags "high-entropy" exchanges heuristically. Can entropy-based candidate selection be validated against human judgments of conversational significance?
  5. Standing dimension validity. The Courage/Word/Brand framework is philosophically grounded but not empirically validated. Do the dimensions capture orthogonal growth axes?
  6. Long-horizon retrieval quality. The 4-way RRF approach is untested beyond ~500 exchanges. How does precision degrade at 10K, 50K, 100K? When does the summary DAG become necessary?
  7. Thread consolidation timing. The 5-compaction threshold for forced consolidation is heuristic. What is the optimal point where crystallized state produces better warm starts than accumulated raw history?
  8. Phenomenological framing effects on attention. First-person injection framing ("your working memory") produces different self-referential behavior than third-person framing ("[CONTINUITY CONTEXT]"). Is this a general property of language model attention, or specific to certain architectures? Does the effect persist across cold starts without conversational priming? Does it hold across different base models?

References