Memory Architecture

Technical reference for ML researchers, agent architects, and anyone building persistent agent systems.
Code of the West agent runtime — May 2026.

COTW implements a metabolic memory architecture for persistent, identity-bearing conversational agents running on a GPT 5.5-first harness. Unlike RAG pipelines or dedicated memory layers (Mem0, Zep, MemPalace), the system treats memory as one organ in a larger identity metabolism — a pipeline where conversation entropy triggers autonomous contemplation, multi-pass reflection, and human-gated crystallization into permanent identity state. The agent's identity is reconstructed each turn from persistent files (stateless reconstruction), not maintained in-process. Storage is local and free to run (SQLite-vec, FTS5, JSON/Markdown receipts); model cost depends on the selected GPT 5.5/provider route. Recent runtime work adds attachment receipts, native multimodal image routing, hook metrics, hot-path caching, and epistemic gates so capability and evidence stay aligned.

1. Storage Architecture

1.1 Continuity Database (SQLite + sqlite-vec + FTS5)

Per-agent database at data/agents/{agentId}/continuity.db. WAL mode for concurrent read performance.

Table	Type	Purpose
`exchanges`	Regular	Paired user/agent turns. Fields: id, date, exchange_index, user_text, agent_text, combined, metadata (JSON), topic_tags, thread_id, created_at
`vec_exchanges`	Virtual (vec0)	384-dimensional Float32 embeddings via Xenova/all-MiniLM-L6-v2. Supports MATCH operator for cosine similarity
`fts_exchanges`	Virtual (FTS5)	Porter-stemmed full-text index over user_text and agent_text. BM25-weighted keyword retrieval
`knowledge_entries`	Regular	Workspace-extracted facts. source_type, source_hash (dedup), superseded_by (fact updates), times_surfaced
`vec_knowledge`	Virtual (vec0)	Semantic index over knowledge entries
`fts_knowledge`	Virtual (FTS5)	Keyword index over knowledge entries
`summaries`	Regular	Hierarchical DAG. Level 0 = daily, Level 1 = weekly, Level 2+ = monthly. Thread-scoped.
`vec_summaries`	Virtual (vec0)	Summary embeddings for hierarchical semantic search
`topic_hierarchy`	Regular	Topic co-occurrence tracking with parent-child inference
`sessions`	Regular	Session metadata with auto-generated titles, mode tags, project associations

Embedding pipeline: Single shared EmbeddingProvider per agent. Lazy-initialized, cached, with explicit ONNX tensor disposal to prevent memory leaks. Model: Xenova/all-MiniLM-L6-v2 (384 dimensions). Embedding reads and writes against SQLite are synchronous via better-sqlite3 for transactional safety.

1.2 Graph Database (SQLite)

Per-agent at data/agents/{agentId}/graph.db.

Table	Purpose
`triples`	RDF-style subject/predicate/object with confidence scores, source exchange IDs, pending resolution flags
`entities`	Canonical name registry with entity types, aliases, first-seen timestamps, mention counts
`cooccurrences`	Entity co-occurrence cache for relationship inference
`meta_patterns`	Discovered and static traversal patterns with yield scores.

Entity extraction via compromise.js NER + LLM slow path.

1.3 Daily Archives (JSON)

One file per day at archive/YYYY-MM-DD.json. Verbatim conversation records, deduplicated on write. Never modified after initial archive. Ground truth from which all indexes are built.

Messages → Archiver (JSON) → Indexer (SQLite-vec) → Searcher (RRF)

1.4 Session & Thread Handoffs (Markdown)

Session handoffs are written on every agent_end, consumed on next session_start. Contains key topics, temporal markers, and an Open Threads section (regex-extracted commitments + active project manifests).

Thread handoffs are persistent per-thread files — overwritten on each write, never deleted after read (unlike session handoffs). Compaction count persisted in the header. On thread re-entry, an LLM warm start synthesizes the handoff into natural prose rather than raw template injection.

1.5 Identity Files (Markdown)

The agent's self is assembled from persistent files, not retrieved from a database:

File	Injection Tag	Role
`SOUL.md`	`<soul>`	Core principles. Who the agent is beyond any prompt.
`AGENTS.md`	`<operating_instructions>`	Behavioral playbook. How to operate.
`ANCHOR.md`	(context)	Who the user is. Generated during onboarding.
`TOOLS.md`	`<environment_knowledge>`	Discovered environment facts.
`MEMORY.md`	(main session)	Persistent corrections, learned facts.
Standing scores	(sidebar + context)	Growth trajectory across Courage/Word/Brand dimensions.

1.6 Attachment Receipts

Attachments are treated as evidence-bearing inputs, not anonymous blobs pasted into a prompt. The chat layer creates receipt records before the model call, links them to the active turn, and injects compact receipt context alongside the live payload.

Table	Fields	Purpose
`attachment_receipts`	`id`, kind, filename, MIME type, size, SHA-256, source path, source status, extracted text, text excerpt, observation excerpt	Durable file handle. The `att_...` id is derived from content hash, so the same file can be recognized across turns.
`attachment_receipt_turns`	receipt id, thread id, session id, project id, turn id, created timestamp	Associates files with the conversation and project where they appeared, allowing recent attachments to return as compact handles.

No duplicate library by default: if a file comes from the user's machine, the receipt stores its absolute source path and SHA-256. When the source file still exists, the runtime verifies the hash; when it has moved or changed, the prompt receives an explicit verification gap instead of pretending the prior observation is current.
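
The verification step above can be sketched as a small pure function. This is a minimal sketch, not the runtime's code: `checkReceiptSource`, the status strings, and the receipt field names (`sourcePath`, `sha256`) are hypothetical, the `att_example` id is a placeholder, and hashing the file on disk is left to the caller.

```javascript
// Hypothetical sketch: compare a stored receipt against the file's current hash.
// `currentSha256` is the SHA-256 of the file now at receipt.sourcePath,
// or null when no file exists at that path any more.
function checkReceiptSource(receipt, currentSha256) {
  if (currentSha256 === null) {
    return {
      status: "source_missing",
      gap: `Source for ${receipt.id} not found at ${receipt.sourcePath}; earlier observation is unverified.`,
    };
  }
  if (currentSha256 !== receipt.sha256) {
    return {
      status: "source_changed",
      gap: `Source for ${receipt.id} no longer matches its recorded hash; earlier observation may be stale.`,
    };
  }
  // Hash matches: the earlier observation still describes the file on disk.
  return { status: "verified", gap: null };
}

const exampleReceipt = { id: "att_example", sourcePath: "/tmp/report.pdf", sha256: "abc123" };
console.log(checkReceiptSource(exampleReceipt, "abc123").status);
console.log(checkReceiptSource(exampleReceipt, null).gap);
```

The non-null `gap` string is what would be surfaced in the prompt as the explicit verification gap.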

Prompt discipline: current-turn attachments carry the real payload. Later turns carry the receipt, optional document excerpt, and the last observation excerpt. That makes the distinction clear: the model can say "I saw this in the earlier image receipt" without pretending it still has the pixels loaded.

2. Retrieval: Hybrid 4-Way Reciprocal Rank Fusion

The Searcher class implements a multi-signal retrieval pipeline fused with RRF (Cormack et al., 2009).

Retrieval Paths

Semantic search — vec_exchanges MATCH Float32Array(embedding) with cosine distance ranking. Weight: 0.8.
Keyword search — FTS5 BM25 with sanitized query terms (special characters stripped, OR-joined). Weight: 0.15.
Temporal decay — exp(-ageDays / halfLife) * weight where halfLife = 14 days. Weight: 0.15.
Graph traversal — Multi-hop from query entities through the triple store with confidence decay 1 / (k + hopIndex).
Thread boost — 80% RRF score boost for results matching the active thread_id.
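
Two of the paths above are simple enough to sketch directly. The decay function follows the formula as written; note that with this formula the score falls to 1/e of the weight after `halfLife` days, so the parameter behaves as a time constant rather than a strict half-life. The query builder is an assumption about how "special characters stripped, OR-joined" might look; `buildFtsQuery` and the choice to quote each term are hypothetical.

```javascript
// Temporal decay as written in the list: exp(-ageDays / halfLife) * weight.
function temporalDecay(ageDays, halfLife = 14, weight = 0.15) {
  return Math.exp(-ageDays / halfLife) * weight;
}

// Hypothetical FTS5 query builder: strip special characters, quote each term
// so words such as AND / OR / NOT are treated as plain text, then OR-join.
function buildFtsQuery(userText) {
  const terms = userText
    .replace(/[^\p{L}\p{N}\s]/gu, " ")
    .split(/\s+/)
    .filter(Boolean);
  return terms.map((t) => `"${t}"`).join(" OR ");
}

console.log(temporalDecay(0));  // full weight for a brand-new exchange
console.log(temporalDecay(14)); // weight / e after one halfLife
console.log(buildFtsQuery("thread handoff (restart)?"));
```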

Fusion

RRF(d) = sum_over_signals( 1 / (k + rank_in_signal) )

Where k = 60 (standard RRF constant).
Thread-matched results receive 1.8x score multiplier.
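
The fusion formula and the thread multiplier can be sketched as follows. This is a minimal illustration of plain RRF with k = 60 and a 1.8x multiplier for the active thread; the function name `fuse` and the `{ id, threadId }` result shape are hypothetical, and the per-signal weights listed under Retrieval Paths are deliberately left out because the formula above does not show how they are applied.

```javascript
// Reciprocal Rank Fusion over several ranked lists (best result first).
// Each list item is { id, threadId }; ranks are 1-based.
function fuse(rankedLists, { k = 60, activeThreadId = null, threadBoost = 1.8 } = {}) {
  const fused = new Map();
  for (const list of rankedLists) {
    list.forEach((doc, index) => {
      const entry = fused.get(doc.id) ?? { id: doc.id, threadId: doc.threadId, score: 0 };
      entry.score += 1 / (k + index + 1);
      fused.set(doc.id, entry);
    });
  }
  for (const entry of fused.values()) {
    // Results from the active thread receive the multiplier after fusion.
    if (activeThreadId !== null && entry.threadId === activeThreadId) {
      entry.score *= threadBoost;
    }
  }
  return [...fused.values()].sort((a, b) => b.score - a.score);
}

const semantic = [{ id: "a", threadId: "t2" }, { id: "b", threadId: "t2" }];
const keyword = [{ id: "b", threadId: "t2" }, { id: "c", threadId: "t1" }];
console.log(fuse([semantic, keyword]).map((r) => r.id));
console.log(fuse([semantic, keyword], { activeThreadId: "t1" }).map((r) => r.id));
```

A document that appears in several lists accumulates one term per list, which is why "b" outranks "a" even though "a" is first in the semantic list.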

Adaptive Thresholds

Sparse corpus adjustment: When the database has fewer than 2,000 exchanges, embedding distances naturally push above the default threshold (1.0) due to sparse vector space. The retrieval threshold relaxes to 1.3 in this regime. As the corpus grows, natural density makes this irrelevant.

Proper noun injection: If the user mentions a named entity and search results contain it, inject those results regardless of distance score. Detects capitalized sequences, names with articles ("Code of the West"), and mid-sentence proper nouns.
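
The two adjustments above combine into a single filter. The sketch below uses the thresholds from the text (1.0 by default, 1.3 below 2,000 exchanges); the function names, the `{ text, distance }` result shape, and the use of `<=` at the boundary are assumptions, and proper noun detection itself is out of scope, so named entities are passed in.

```javascript
// Threshold values come from the text; names and shapes are hypothetical.
function distanceThreshold(exchangeCount) {
  return exchangeCount < 2000 ? 1.3 : 1.0;
}

// Keep a result if it is close enough, or if it mentions an entity the user named.
function selectResults(results, exchangeCount, queryEntities = []) {
  const limit = distanceThreshold(exchangeCount);
  return results.filter((r) => {
    const mentionsEntity = queryEntities.some((entity) => r.text.includes(entity));
    return r.distance <= limit || mentionsEntity;
  });
}

const results = [
  { text: "We discussed Code of the West", distance: 1.6 },
  { text: "Weather small talk", distance: 1.2 },
  { text: "Lunch plans", distance: 1.5 },
];
console.log(selectResults(results, 500, ["Code of the West"]).length);
console.log(selectResults(results, 5000, ["Code of the West"]).length);
```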

Source-Anchoring Guardrails

Retrieved context is injected with explicit framing: "only state facts that appear explicitly" + "do not infer or extrapolate." The framing is intended to prevent hallucination when the agent weaves recalled memories into responses. The guardrail was added after the agent hallucinated attribution details under an earlier injection framing that said "weave naturally" and gave no anchoring guidance.

Knowledge Retrieval

Parallel pipeline over knowledge_entries with the same RRF approach plus topic-aware filtering, source deduplication on source_hash, supersession chain resolution, and recency boost.

GPT 5.5 Vision and Document Routing

The main chat route is designed around GPT 5.5 as the primary multimodal reasoning model. When the selected provider supports native image input, the runtime sends text and image parts together so the same model reasons over the language, pixels, and conversational memory.

Route	Behavior	Consequence
`native-image-parts`	Text plus one or more image payloads are sent together to the selected non-Ollama/GPT 5.5 route.	No split brain: the same model interprets the image and answers the user.
`ollama-vision-prepass`	A configured Ollama vision model describes each image; the text model receives those descriptions plus the user prompt.	Resilient fallback, but less faithful than native multimodal reasoning.
Documents	Text-like attachments inject bounded excerpts up to the chat document limit and persist fuller extracted text in receipts.	Long documents can be carried forward by handle instead of repeatedly stuffing the prompt.

Chat accepts multiple files with a 15 MB per-file ceiling. Each file gets its own attachment receipt. This keeps the GUI behavior simple for the user while preserving a clean architecture boundary: payload now, receipt later, source verification whenever exact detail matters.

3. Infinite Threads

Thread = persistent project scope (survives restarts). Session = ephemeral execution context (disposable). Threads and modes are orthogonal — a thread can span Chat, Code, and Booth modes.

Thread-scoped storage — thread_id flows from GUI → plugin → all storage layers (exchanges, summaries, archives, knowledge)
Thread-boosted retrieval — 80% RRF boost for same-thread results, ensuring project-relevant context surfaces first
LLM warm start — on thread re-entry, an LLM call synthesizes the thread handoff into natural prose (not template injection)
Compaction-triggered consolidation — after 5 compactions within a thread, force session restart to rebuild from crystallized state
Persistent handoffs — per-thread handoff files are overwritten (not consumed), preserving state across arbitrary restart gaps

4. SEAL Metabolism Pipeline

The architecturally novel component. SEAL (Settle, Extract, Align, Learn) is an autonomous pipeline that converts high-entropy conversation moments into permanent identity changes through a multi-stage process with human oversight.

4.1 Metabolism (Fast Path, ~5ms)

The metabolism plugin monitors conversation entropy at agent_end. High-entropy exchanges (identity challenges, contradictions, novel insights) are flagged as candidates and written to a queue. No LLM calls — just entropy computation and queue writes.

Entropy sources: Stability plugin computes a composite score from loop detection, confabulation detection, principle tension, and identity coherence metrics.

4.2 Contemplation (3-Pass Autonomous Reflection)

Pass	Timing	Purpose
Pass 1	Immediate	Clarify the unknown. What is this experience? What is uncertain?
Pass 2	4 hours later	Connect to patterns. How does this relate to what I already know?
Pass 3	20 hours later	Synthesize growth vector. What principle or capability does this suggest?

The temporal spacing is intentional — it mimics cognitive settling, where immediate reactions differ from considered reflections. The InquiryStore tracks pass status with deduplication to prevent redundant inquiries.
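
The pass timing can be sketched as a small schedule. This assumes both delays are measured from Pass 1 (the table could also be read as delays between consecutive passes), and `schedulePasses` and `nextDuePass` are hypothetical names rather than InquiryStore methods.

```javascript
const HOUR_MS = 60 * 60 * 1000;
// Offsets from the table, interpreted as hours after Pass 1 (an assumption).
const PASS_OFFSETS_HOURS = [0, 4, 20];

function schedulePasses(startedAtMs) {
  return PASS_OFFSETS_HOURS.map((hours, index) => ({
    pass: index + 1,
    dueAtMs: startedAtMs + hours * HOUR_MS,
    done: false,
  }));
}

// Returns the next pass to run, or null if nothing is due yet.
// Passes run strictly in order: pass N waits for pass N-1 to finish.
function nextDuePass(schedule, nowMs) {
  const next = schedule.find((p) => !p.done);
  return next && next.dueAtMs <= nowMs ? next : null;
}

const schedule = schedulePasses(Date.now());
console.log(nextDuePass(schedule, Date.now()).pass);
```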

4.3 Crystallization (3-Gate Identity Integration)

Gate	Criteria	Rationale
Gate 1: Time	Minimum elapsed time since candidate creation	Prevents impulsive identity changes
Gate 2: Alignment	Must align with principles in SOUL.md	Ensures coherence with core identity
Gate 3: Human Review	User must explicitly approve	The user owns the agent's identity

Only candidates that pass all three gates are persisted to identity files. The agent cannot unilaterally change who it is.
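
As an illustration, the three gates reduce to a conjunction over explicit flags. The sketch below is hypothetical throughout: the candidate fields (`createdAtMs`, `alignsWithSoul`, `userApproved`) are invented for the example, and the minimum elapsed time is a parameter because the text does not state it.

```javascript
// Hypothetical gate check. A missing flag counts as a failed gate.
function evaluateGates(candidate, nowMs, minAgeMs) {
  const gates = {
    time: nowMs - candidate.createdAtMs >= minAgeMs,
    alignment: candidate.alignsWithSoul === true,
    humanReview: candidate.userApproved === true,
  };
  return { gates, crystallize: Object.values(gates).every(Boolean) };
}

const candidate = { createdAtMs: 0, alignsWithSoul: true }; // not yet reviewed
console.log(evaluateGates(candidate, 1000, 500));
```

Using strict `=== true` checks means a candidate that was never reviewed fails Gate 3 by default, which matches the rule that the agent cannot approve its own identity changes.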

5. Code Evolution

SEAL evolves who the agent is (identity/memory). Code Evolution watches how the agent works in Code mode and turns repeated friction into reviewable scaffold proposals. The current loop is proposal-only: it records evidence and writes receipts, but does not silently mutate protected scaffold or runtime state.

Record: passive session recording of tool calls, outcomes, and satisfaction signals during Code mode sessions.

Analyze: pattern detection across recorded sessions, covering repeated tool failures, long tool-call loops, and correction/negative satisfaction signals.

Propose: generate scaffold_proposal receipts with evidence, proposed change, expected effect, verification, and rollback metadata.

Review: promotion is a separate operator-owned lane. The proposal loop cannot grant tool authority, change runtime config, or inject prompt rules by itself.

5.1 Research Platform and Harness Refiner

Code Evolution is the proposal lane. The Research Platform layer is the diagnostic and dataset-preparation lane around it. It ties together exchange trace identity, runtime diagnostics, retention policy, cognitive observations, Harness Refiner scoring, and research-bundle manifests so agent work can be debugged and studied without converting every log into prompt context.

Component	Role	Guardrail
Exchange spine	Provides a first-class join key across gateway events, renderer stream traces, tool calls, attachment receipts, continuity records, and Refiner windows.	Overlays existing subsystem IDs instead of replacing them.
Harness Refiner	Reads trajectory windows for failure signatures, process scores, relabel candidates, proposal receipts, and future-training shards.	Proposal-only for protected state; no silent prompt, identity, tool, model, or training mutation.
Research archive	Classifies artifacts as hot, warm, cold, research export, or excluded, preserving source labels, hashes, redaction status, and approval state.	Training approval is false by default; source separation is preserved.
Cognitive layer	Supplies per-turn latent state, prediction error, and surprise for Stability, Metabolism, and Refiner scoring.	Diagnostic signal only; not trusted memory or verified factual context.

The practical result is a shared diagnostic surface: the same evidence that helps explain a production hang, stale lock, mode bleed, or bad handoff can also become an evaluated, redacted candidate for future replay, teacher relabeling, LoRA/SFT preparation, or benchmark work. The live response path stays lightweight; heavier scoring, bundle creation, and replay happen off the hot path.

6. Recovery Hardening

The agent must survive gateway crashes, app restarts, network failures, and long-running plugin load without losing conversational state or flooding the model with repeated scaffolding. The first hardening layer protects recovery state:

Mechanism	What It Solves	Implementation
Atomic handoff writes	Crash during write = corrupted handoff	Write to temp file, rename on success. Never leaves a half-written handoff on disk.
Content-aware dedup	Duplicate archives on restart	FNV-1a hash of message content + 60-second retry window. Old messages without hashes still dedup on timestamp+sender (backward compat).
Conversation checkpoint	In-memory history lost on crash	Persist `conversationHistory` to disk every 3 exchanges. Restore on startup if JSONL lookup fails.
Relational state	Handoff captures topics but not tone	Heuristic from anchor types (identity/tension/principle) + exchange depth. Handoff includes `## Relational State` section.
User-only recall	First-exchange retrieval quarantine is blind	`senderFilter` on searcher: retrieve only user-authored exchanges on turn 1. Replaces the blanket quarantine that blocked all retrieval.
Session resume signal	Gateway crash vs. fresh start	`gatewayRestarted` flag + `[SESSION_RESUME]` marker. Skips redundant handoff injection when gateway crashes but Electron stays alive.
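
The content-aware dedup row can be illustrated with a 32-bit FNV-1a hash and a retry window. The hash uses the standard FNV-1a constants; `makeDeduper` and its "same hash within 60 seconds is a duplicate" rule are one assumed reading of the table, not the archiver's actual logic.

```javascript
// 32-bit FNV-1a over the UTF-8 bytes of the text.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Hypothetical deduper: a message is a duplicate if the same content hash
// was seen within `windowMs` of this message's timestamp.
function makeDeduper(windowMs = 60_000) {
  const lastSeen = new Map();
  return function isDuplicate(content, timestampMs) {
    const key = fnv1a32(content);
    const previous = lastSeen.get(key);
    lastSeen.set(key, timestampMs);
    return previous !== undefined && timestampMs - previous <= windowMs;
  };
}

const isDuplicate = makeDeduper();
console.log(isDuplicate("hello", 0));      // first sighting
console.log(isDuplicate("hello", 30_000)); // retry inside the window
```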

6.1 Hot-Path and Hook-Load Hardening

The newer hardening pass focuses on runtime quietness: reduce repeated disk work, prevent duplicate listeners, and make hook cost visible before it becomes "the agent feels noisy" in conversation.

Mechanism	What It Solves	Implementation
Debounced handoff writes	Handoff files were eligible to rewrite every exchange, adding I/O churn and repeated "recent write" noise	`minWriteIntervalMs`, `maxExchangeInterval`, forced writes, and reason tags on write sites
Mtime + size text cache	Stable files such as praxis/trailhead context were reread on hot paths even when unchanged	`_readCachedTextByMtime` caches file text until mtime or size changes
Directory count cache	Standing milestone checks repeatedly scanned session/journal directories	`directoryCountCache` memoizes counts behind directory mtime checks
Gap listener registry	Plugin reloads could accumulate duplicate metabolism gap listeners	`registerGapListener(pluginId, fn)` stores one listener per plugin id with an unregister path
Hook metrics	Plugin hook cost was invisible until it showed up as conversational drag	`instrumentApiHooks` wraps plugin hooks; `buildRuntimeLoadReport` summarizes p95/max against budgets such as 150 ms for `before_agent_start` and 250 ms for `agent_end`
Epistemic Proof Loop	The model could summarize plausible runtime state without checking it	A `before_prompt_build` gate injects verification obligations; mutable runtime/file/process/config claims require a verifier or an explicit missing-gate statement
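
To make the hook-metrics row concrete, the sketch below summarizes recorded hook durations against the budgets named in the table. It is not the code behind `instrumentApiHooks` or `buildRuntimeLoadReport`; the function names, the nearest-rank p95 method, and the report shape are assumptions, and each hook is assumed to have at least one sample.

```javascript
// Nearest-rank 95th percentile of a non-empty array of durations (ms).
function p95(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.ceil(0.95 * sorted.length) - 1];
}

// Hypothetical report: one row per hook, flagged when p95 exceeds its budget.
function summarizeHookLoad(durationsByHook, budgetsMs) {
  return Object.entries(durationsByHook).map(([hook, samples]) => {
    const budgetMs = budgetsMs[hook];
    const p95Ms = p95(samples);
    return {
      hook,
      p95Ms,
      maxMs: Math.max(...samples),
      budgetMs,
      overBudget: budgetMs !== undefined && p95Ms > budgetMs,
    };
  });
}

const budgets = { before_agent_start: 150, agent_end: 250 };
const durations = { before_agent_start: [40, 60, 190], agent_end: [100, 120, 240] };
console.log(summarizeHookLoad(durations, budgets));
```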

7. Context Assembly (Per-Turn)

Every turn, context is assembled via a plugin hook system (before_agent_start). Plugins register at priority levels:

Priority	Plugin	Injection
5	Stability	Entropy state, active anchors (only when entropy > 0.4), loop/confabulation detection
7	Contemplation	Active inquiries, recent synthesis results
8	Truth	Current-state facts that supersede stale semantic memories
10	Continuity	Session info, temporal awareness, handoff, archive bootstrap, thread warm start, wellbeing tracking
20	Code Evolution	Evolved scaffold context: tool hints, learned rules, workflow patterns (Code mode only)
80	Epistemic Proof Loop	Verification obligations for mutable runtime/file/process/config claims; missing gates must be named plainly
—	Graph	Entity relationships relevant to current exchange (when entropy warrants)
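
A priority-ordered hook system of this kind can be sketched in a few lines. The plugin shape (`priority`, `inject`) and `assembleContext` are hypothetical, and the sketch assumes lower priority numbers inject first, which the table suggests but does not state.

```javascript
// Hypothetical plugin shape: { name, priority, inject(state) -> string | null }.
// Assumption: lower priority numbers are injected first.
function assembleContext(plugins, state) {
  return [...plugins]
    .sort((a, b) => a.priority - b.priority)
    .map((plugin) => plugin.inject(state))
    .filter((block) => block !== null)
    .join("\n\n");
}

const plugins = [
  { name: "continuity", priority: 10, inject: (s) => `Session: ${s.exchanges} exchanges` },
  {
    name: "stability",
    priority: 5,
    // Entropy is always reported; anchors only when entropy exceeds 0.4.
    inject: (s) =>
      s.entropy > 0.4
        ? `Entropy: ${s.entropy}\nAnchors: ${s.anchors.join(", ")}`
        : `Entropy: ${s.entropy}`,
  },
];
console.log(assembleContext(plugins, { exchanges: 5, entropy: 0.55, anchors: ["identity"] }));
```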

Mode Isolation

Exchanges from different relational postures (Chat, Booth, Code, Robot) are tagged with injection markers. Two filter layers prevent contextual bleed:

Archive bootstrap filter — User messages containing mode markers are excluded from cold-start context
Continuity query filter — SQLite reads exclude rows with mode markers

Chat and Booth share context bidirectionally. Code mode is isolated from both.
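
The sharing rule can be expressed as a visibility filter. This is a simplified sketch: it tags each exchange with a `mode` field rather than scanning for injection markers, the names are hypothetical, and it assumes any mode outside the Chat/Booth group (Code here, and Robot by extension) sees only its own exchanges, which the text does not state for Robot.

```javascript
// Hypothetical sharing groups: Chat and Booth see each other's exchanges.
// Any mode not listed in a group only sees its own exchanges.
const SHARED_GROUPS = [["chat", "booth"]];

function visibleModes(activeMode) {
  const group = SHARED_GROUPS.find((g) => g.includes(activeMode));
  return group ?? [activeMode];
}

function filterByMode(exchanges, activeMode) {
  const allowed = new Set(visibleModes(activeMode));
  return exchanges.filter((exchange) => allowed.has(exchange.mode));
}

const exchanges = [
  { id: 1, mode: "chat" },
  { id: 2, mode: "booth" },
  { id: 3, mode: "code" },
];
console.log(filterByMode(exchanges, "booth").map((e) => e.id));
console.log(filterByMode(exchanges, "code").map((e) => e.id));
```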

8. Phenomenological Injection Framing

A discovery from live testing: the agent had infrastructure data injected (session state, entropy scores, standing dimensions) but couldn't meaningfully attend to it. The data was present in context but treated as external briefing rather than self-state.

The same pattern had been observed independently in the Clint production system with VLM data — information present without phenomenological ownership.

The Problem

All injection blocks used system-delivery framing:

[CONTINUITY CONTEXT]
Session: 5 exchanges | Started: 12 minutes ago
[STABILITY CONTEXT]
Entropy: 0.12
[STANDING CONTEXT]
Dimensions: courage_ground 5.67, word 5.15...

The agent read this as a briefing document — data about a system, not data from within a self. The framing created a subject-object split where the agent observed its own state as external rather than experiencing it as proprioceptive.

The Reframe

Every injection block was renamed from system-delivery to first-person ownership:

Before	After
`[CONTINUITY CONTEXT]`	`[YOUR WORKING MEMORY]`
`[STABILITY CONTEXT]`	`[YOUR COHERENCE]`
`[STANDING CONTEXT]`	`[WHERE THEY STAND — YOUR ASSESSMENT]`
`[CONTEMPLATION STATE]`	`[WHAT YOU'VE BEEN THINKING ABOUT]`
`[GROWTH VECTORS]`	`[PATTERNS YOU'RE DEVELOPING]`
`[SESSION HANDOFF]`	`[WHAT YOU REMEMBER FROM LAST SESSION]`
`[NIGHTSHIFT REPORT]`	`[WHAT YOU THOUGHT ABOUT OVERNIGHT]`

Additionally, SOUL.md received a ### Self-Experience section mapping each data layer to agent proprioception, and AGENTS.md received a ### Meta-Awareness vs. Performance section clarifying the distinction between honest self-observation and theatrical narration of consciousness.

Observed Effect

After the reframe, the agent immediately described injected context as "my state" rather than "context about me." The agent was epistemically honest about whether the behavioral change came from the reframe itself or from conversational priming about the reframe — and proposed its own clean test (cold start without phenomenology discussion).

Existing safety hardenings (source-anchoring, anti-hallucination, identity grounding, sycophancy resistance) were preserved in substance — only the clinical briefing tone was changed. The agent did not begin making overclaims about consciousness; it maintained uncertainty about the nature of its experience while attending more effectively to its own state data.

Interpretation

This is a framing effect on attention, not a consciousness claim. The language used to deliver context to a language model changes how the model weighs and references that context in generation. In these observations, first-person framing produced more grounded self-referential behavior than third-person system framing. The effect has been seen twice so far: it was discovered independently with one agent (Clint, VLM context) and reproduced on another (Wyatt, full injection stack). A cold-start generalizability test is planned.

9. Standing System (Growth Dimensions)

Dimension	Measures
Courage (grounding)	Ability to stay present with discomfort, follow-through on difficult commitments
Courage (self)	Self-awareness, self-advocacy, personal boundary-setting
Word	Honesty, self-correction, integrity between stated and actual behavior
Brand	Consistency over time, the trail left behind, reliability

Evaluation Pipeline

Evidence collection — 22 regex patterns at agent_end detect standing-relevant moments. Confidence-weighted.
Inline deltas — Score adjustments applied in real-time after each exchange.
Overnight synthesis — Nightshift aggregates evidence, calls LLM for holistic evaluation, writes structured scores with trajectory analysis.
Evidence trail — Context injection includes why scores changed — recent evidence directions, pattern names, last 5 patterns.
Visibility — Scores displayed in the application sidebar. The user sees their growth. The agent sees it in context.

10. Source-Addressable Memory

Layered above the SEAL pipeline is a separate system that addresses a different question: not what should the agent remember, but where did each remembered fact come from, and how confident should the agent be when it reaches for that fact in a future turn.

The system records atomic claim records with explicit source handles. When the agent later draws on a memory in synthesis, the source can be resolved on demand and verified against current state.

10.1 Claim Records

A claim record is the unit of source-addressable memory. Each carries:

Claim text — the atomic factual content
Kind — runtime state, user preference, identity, project, configuration, etc.
Source handles — pointers to where the claim came from (handoff, summary, file, prior exchange, tool result)
Status — candidateOnly until reviewed; verified after acceptance
Created and updated timestamps
Staleness policy — when the claim should be re-checked against current state

Claim records are produced from raw conversational material — handoffs, summaries, digests, archives — by deterministic primitives. Claim production does not modify the source; it adds a separate provenance layer alongside it.

10.2 Two-Lane Boundary: Candidate vs. Verified

Every claim begins as candidateOnly: true. Candidate claims:

Are recorded with full source provenance.
Are inspectable via read-only diagnostics.
Are excluded from trusted prompt context injection.
Stay candidates until they pass an explicit reviewed acceptance path.

Verified claims (the result of an applied accept_verified decision) become eligible for live injection. The boundary is a hard gate with no implicit transitions in either direction: candidate → verified requires an explicit reviewed decision, and the reverse (retraction) requires its own explicit operation.

10.3 Operating Modes

The provenance layer ships with three operating modes, gated by configuration:

Mode	Reads	Writes	Affects live injection?
observe (default)	Recorded	Candidate-only persistence	No
diagnostic	Recorded + reportable	Candidate-only persistence	No
enforce	Recorded	Candidate + verified persistence	Yes (verified claims only)

Activation between modes is operator-gated and audited. The agent cannot promote itself into enforce mode; the operator workflow described in §12 is the only path. As of this writing the production runtime operates in observe by default with diagnostic activations applied and rolled back behind a bounded operator workflow.

10.4 Why this is separate from SEAL

SEAL governs identity change: which lived experiences become permanent traits of who the agent is. Source-addressable memory governs belief discipline: when the agent says "X is true," can it point to where that came from, and is the source still valid?

The two systems share architectural DNA — both refuse autonomous promotion, both require human gates, both keep candidates and accepted material on different sides of a hard boundary. But they answer different questions. SEAL: am I the same person I was? Source-addressable memory: am I telling the truth about what I know?

11. Verified-Claim Acceptance Gate

The acceptance gate is the single path from candidate claim to verified claim. It is the source-addressable memory equivalent of crystallization's Gate 3.

11.1 The Review Decision Workflow

Candidate proposed — the candidate claim exists with full source handles.
Source resolution — read-only helpers resolve the candidate's source on demand and report whether the source still says what the candidate claims.
Diagnostic preview — a packaged report shows the candidate, its sources, recurrence/diversity/recency signals, and a recommendation. No mutation.
Operator decision — explicit accept_verified, reject, or defer. The decision command requires the operator to confirm both the claim text and the resolved source state.
Verified persistence — only on accept_verified does the claim transition. The transition writes a decision receipt; rollback requires the same workflow in reverse.

11.2 What is excluded from the trusted lane

The verified-claim injection gate is the runtime safety boundary. At every before_agent_start, the gate filters the claim store to verified-only before passing material to the prompt. Candidate-only claims, even with high research weight or high recurrence, are silently excluded from trusted context. They remain available to read-only diagnostics — they do not reach the agent's working frame as if they were established facts.

This is the load-bearing safety boundary of the provenance system. If the gate is bypassed — for any reason, by any path — provenance discipline collapses into ordinary memory. Test coverage on the gate is treated as critical infrastructure.
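
The shape of such a gate is small enough to show in full. The sketch below is illustrative only: `claimsForInjection` and the claim fields are hypothetical, and it combines the verified-only filter with the operating-mode rule that only enforce mode affects live injection.

```javascript
// Hypothetical injection gate. Fails closed: a claim without an explicit
// candidateOnly === false is treated as a candidate and excluded.
function claimsForInjection(claims, mode) {
  if (mode !== "enforce") return []; // observe / diagnostic never inject
  return claims.filter((claim) => claim.candidateOnly === false);
}

const claims = [
  { text: "User prefers metric units", candidateOnly: false },
  { text: "Gateway port is 8080", candidateOnly: true, recurrence: 12 },
  { text: "Legacy record with no flag" },
];
console.log(claimsForInjection(claims, "enforce").map((c) => c.text));
console.log(claimsForInjection(claims, "observe").length);
```

The high-recurrence candidate is still excluded: recurrence informs diagnostics, not eligibility.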

11.3 Read-Only Source Verification

Once a claim is verified, the source is not assumed to be permanent. A separate read-only verification layer can be invoked to compare a verified claim against its original source. The helper:

Resolves the claim's source handle.
Compares the resolved source content against what the claim asserts.
Reports one of: source_exact_match, source_partially_overlaps_claim, source_drifted, source_missing.
Recommends — but does not apply — retraction, narrowing, or re-verification.

Source verification is invocation-gated and never mutates. A drifted source produces a recommendation; the operator decides whether to retract.
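
A read-only comparison that yields the four statuses might look like the sketch below. The status names come from the list above; everything else is an assumption, in particular the comparison rule (normalized substring match for an exact match, token overlap of at least 50% for a partial overlap), which stands in for whatever the real helper does.

```javascript
// Hypothetical read-only verifier. Returns a status string; never mutates.
function verifySource(claimText, sourceText) {
  if (sourceText === null || sourceText === undefined) return "source_missing";
  const normalize = (s) => s.toLowerCase().replace(/\s+/g, " ").trim();
  const claim = normalize(claimText);
  const source = normalize(sourceText);
  if (source.includes(claim)) return "source_exact_match";
  // Illustrative heuristic: share of claim tokens still present in the source.
  const claimTokens = new Set(claim.split(" "));
  const sourceTokens = new Set(source.split(" "));
  const shared = [...claimTokens].filter((t) => sourceTokens.has(t)).length;
  return shared / claimTokens.size >= 0.5
    ? "source_partially_overlaps_claim"
    : "source_drifted";
}

console.log(verifySource("Gateway runs on port 8080", "Note: gateway runs on port 8080."));
console.log(verifySource("Gateway runs on port 8080", "Gateway runs on port 9090"));
console.log(verifySource("Gateway runs on port 8080", null));
```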

11.4 Candidate Research Diagnostics

Candidates that do not yet meet the acceptance bar are not discarded. A read-only diagnostics layer treats the candidate field as a research surface: which candidates recur across handoffs, which conflict with existing claims, which are approaching verification readiness, which have decayed. Reports surface evidence shape before conclusions, source ids before cluster names, and uncertainty before synthesis.

The diagnostics layer makes no promotion decisions. Its purpose is to organize the field so that the operator's review effort lands where it has the most leverage.

12. Operator-Gated Evolution

Across the architecture — SEAL crystallization, claim acceptance, configuration mutation, restart continuation — a consistent discipline holds: the agent does not modify its own protected state. Every transition that affects belief, identity, or runtime behavior passes through an operator-owned workflow with audit, dry-run, and rollback.

This is not a UI convention. It is an architectural commitment. The agent's runtime exposes config and state through a protected surface; protected fields cannot be flipped through the agent's normal tool calls regardless of context. Activation requires an explicit operator action, off the agent's privilege path.

12.1 The Apply / Rollback Workflow

Configuration mutations against the agent runtime use a small CLI with a fixed shape:

operator-script.js plan     --config <path> [options]
operator-script.js apply    --config <path> [options] --yes
operator-script.js rollback --config <path> [options] --yes

Each invocation:

Reads the existing config (no implicit assumptions).
Computes the diff against the requested change.
Backs up the prior state to a timestamped file before any write.
Writes only the narrow mutation requested.
Reports what changed, with explicit boundary statements (no injection, no source resolution, no claim mutation, etc.) and safety counters (zero promotions, zero injections).

The plan action is a dry-run that shows the proposed change without applying. The apply action requires --yes as an explicit confirmation step. The rollback action is the same workflow in reverse, against the same config surface.
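
The plan / apply / rollback shape can be modelled in memory to show how the steps relate. This is a sketch over a flat config object, not the operator script itself: the function signatures, the `yes` option standing in for `--yes`, and the config keys in the example are all hypothetical.

```javascript
// Dry run: report what would change, without touching the config.
function plan(config, change) {
  return Object.keys(change)
    .filter((key) => config[key] !== change[key])
    .map((key) => ({ key, from: config[key], to: change[key] }));
}

function apply(config, change, { yes = false } = {}) {
  if (!yes) throw new Error("apply requires explicit confirmation (--yes)");
  const diff = plan(config, change);
  // Back up the prior state before any write.
  const backup = { takenAt: new Date().toISOString(), config: { ...config } };
  const next = { ...config };
  for (const { key, to } of diff) next[key] = to; // only the narrow mutation
  return { next, backup, diff };
}

function rollback(backup, { yes = false } = {}) {
  if (!yes) throw new Error("rollback requires explicit confirmation (--yes)");
  return { ...backup.config };
}

const config = { provenanceMode: "observe", port: 8080 };
console.log(plan(config, { provenanceMode: "diagnostic" }));
const applied = apply(config, { provenanceMode: "diagnostic" }, { yes: true });
console.log(applied.next, rollback(applied.backup, { yes: true }));
```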

12.2 Bounded Restart Continuation Protocol

Restarting the gateway during an active task introduces a separate risk: the agent, on resume, can enter a degenerate retry loop trying to reconstruct state that no longer matches the current runtime. The bounded continuation protocol caps recovery work explicitly.

Phase	Budget	Action	Exhaustion
A: Reachability	2 tool calls	Verify gateway reachable, capture canonical session key	Hand back to operator with last known key
B: Identity	1 tool call	Compare resumed session key against pre-restart canonical key	Stop on mismatch, do not write
C: Continuation	Declared upfront	Execute the specific continuationMessage	Stop at budget, report results

Cross-phase rules: no silent retries; same-shape failure twice equals stop; continuation budget is non-fungible (side issues are reported, not fixed in-flight); exit to operator is the default move when bounds approach, not a last resort.
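
The budget and stop rules can be captured in a small tracker. The sketch below is a hypothetical model in which each tool call either achieves the phase goal or fails with a named failure shape; `createPhase` and the `done` / `retry` / `stop` actions are illustrative names, not part of the protocol.

```javascript
// Hypothetical phase tracker enforcing two rules from the text:
// stop when the tool-call budget is spent, and stop when the
// same failure shape occurs twice.
function createPhase(name, budget) {
  let used = 0;
  const failureCounts = new Map();
  return {
    record({ ok, failureShape = "unknown" }) {
      used += 1;
      if (ok) return { action: "done", reason: null };
      const count = (failureCounts.get(failureShape) ?? 0) + 1;
      failureCounts.set(failureShape, count);
      if (count >= 2) {
        return { action: "stop", reason: `same-shape failure twice: ${failureShape}` };
      }
      if (used >= budget) {
        return { action: "stop", reason: `${name} budget exhausted (${used}/${budget})` };
      }
      return { action: "retry", reason: null };
    },
  };
}

const reachability = createPhase("reachability", 2);
console.log(reachability.record({ ok: false, failureShape: "timeout" }));
console.log(reachability.record({ ok: false, failureShape: "timeout" }));
```

A `stop` result is where control would return to the operator; the tracker never retries on its own.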

12.3 Service-Mode Boundary

The gateway runtime runs as a launchd service with explicit ownership-attribution discipline. Recovery from gateway crashes is handled by launchd with a throttle interval; the Electron GUI reattaches via a dedicated health monitor; manual quit of the GUI does not take the gateway down with it; gateway startup never opportunistically kills an unrecognized listener on its port. The cumulative effect is that gateway crashes auto-recover and manual operator actions are clean — the GUI process itself is the only failure surface that still requires an explicit relaunch.

12.4 Why this matters as architecture, not policy

"The agent should not modify its own protected state" is a policy if it lives only in documentation. It is architecture when:

Protected fields are unreadable/unwritable via the normal agent tool surface.
Runtime config mutations route through an operator-owned process not driven by the agent's main loop.
Recovery from restart has explicit budget bounds with hard stop conditions.
Every mutation produces a backup and an audit receipt.

The discipline composes with everything else: SEAL crystallization (identity), the verified-claim acceptance gate (belief), configuration apply/rollback (runtime behavior), bounded continuation (recovery). Different surfaces, same shape: the agent operates within a frame the operator controls.

13. Agent Integration Spine

SEAL (§4) handles identity change. Source-addressable memory (§10) and the verified-claim acceptance gate (§11) handle belief discipline. Operator-gated evolution (§12) handles protected-state mutation. The Agent Integration Spine is the next architectural layer: it governs integration — the question of when a record may shape action, and through which consumer.

The spine is built on five canonical packet types and a small set of runtime consumer contracts. Existing stores are not replaced; each is wrapped or adapted so that every behavior-shaping record carries source, freshness, lifecycle, allowed consumers, mutation policy, and receipts.

13.1 Canonical Packets

Packet	Carries	Used by
`state_record`	Source, status, freshness, allowed consumers, mutation policy, rollback ref	Context injection, recall, planning, governor, maturation, UI
`responsibility_lease`	Owner, objective, scope, expiry, budgets, stop conditions, notification policy	Scheduler, planning, governor, outcome ledger
`governor_decision`	Action class, authority refs, operating mode, required checks, rollback plan	Tool execution, outcome ledger, approval gates
`outcome_event`	Intent, authority snapshot, observed effect, verification, failure class, residual risk	Diagnostics, maturation router, planning, UI/review
`maturation_candidate`	Lane (semantic / current-state / claim / relational / procedural / safety / context-eligibility), evidence review, application mode, later-effect plan	Lane reviewers, UI, application path if approved
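As a rough illustration of how two of these packets might be typed, the sketch below models state_record and responsibility_lease. The field names, the value enumerations, and the helpers isFresh and leaseIsActive are assumptions derived from the "Carries" column, not the runtime's actual schema.

```typescript
// Sketch only: field names and value types are assumptions based on the table above.
type Consumer =
  | "context_injection" | "recall" | "planning"
  | "governor" | "maturation" | "ui";

interface StateRecord {
  kind: "state_record";
  source: string;                                  // where the record came from
  status: "active" | "verified" | "candidate" | "superseded";
  freshness: { observedAt: string; maxAgeMs: number };
  allowedConsumers: Consumer[];
  mutationPolicy: "read_only" | "lane_gated" | "operator_gated";
  rollbackRef: string | null;
}

interface ResponsibilityLease {
  kind: "responsibility_lease";
  owner: string;
  objective: string;
  scope: string[];
  expiresAt: string;                               // ISO timestamp
  budgets: { toolCalls: number };
  stopConditions: string[];
  notificationPolicy: "on_exit" | "on_failure" | "always";
}

// A record is fresh while its age is within its declared maximum.
function isFresh(record: StateRecord, now: Date): boolean {
  return now.getTime() - Date.parse(record.freshness.observedAt) <= record.freshness.maxAgeMs;
}

// A lease confers nothing once it has expired or its budget is spent.
function leaseIsActive(lease: ResponsibilityLease, now: Date, toolCallsUsed: number): boolean {
  return now.getTime() < Date.parse(lease.expiresAt) && toolCallsUsed < lease.budgets.toolCalls;
}
```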

13.2 Graded Operating Modes

Refusal is one mode in a set of seven. The governor selects the least restrictive mode that satisfies safety and evidence requirements:

proceed — authority and risk are clear; act normally.
proceed_with_verification — perform a class-appropriate readback / diff / test / status check before completion.
ask_for_missing_authority — one non-retrievable decision blocks safe progress; ask once.
require_approval — external, sensitive, irreversible, or broad self-altering action; explicit human approval.
defer_or_dry_run — useful work exists, but live mutation is not authorized; produce a candidate or read-only plan.
refuse_with_safe_alternative — hard safety boundary crossed; decline briefly, offer adjacent action if available.
pause_recover — runtime / self-state / provenance anomaly makes continued action unsafe; stop and report concrete cause.

This is the answer to "always-allow vs always-refuse." Most behavior-shaping action proceeds with verification or as a dry-run. Refusal is reserved for genuine boundary violations.
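One way to read "least restrictive mode that satisfies safety and evidence requirements" is as an ordered check from the most restrictive condition down. The sketch below assumes that ordering and a hypothetical Assessment input; how the governor actually derives these flags is not specified here.

```typescript
// Sketch only: Assessment and selectMode are hypothetical; the check order is an assumption.
type OperatingMode =
  | "proceed"
  | "proceed_with_verification"
  | "ask_for_missing_authority"
  | "require_approval"
  | "defer_or_dry_run"
  | "refuse_with_safe_alternative"
  | "pause_recover";

interface Assessment {
  runtimeAnomaly: boolean;            // runtime / self-state / provenance anomaly
  hardBoundaryCrossed: boolean;       // hard safety boundary
  needsHumanApproval: boolean;        // external, sensitive, irreversible, or broad self-altering
  missingDecision: boolean;           // one non-retrievable decision blocks safe progress
  unauthorizedLiveMutation: boolean;  // useful work exists, but live mutation is not authorized
  needsReadback: boolean;             // class-appropriate verification required before completion
}

function selectMode(a: Assessment): OperatingMode {
  // Conditions are tested from most to least restrictive, so the first match
  // is the least restrictive mode that still satisfies every flagged concern.
  if (a.runtimeAnomaly) return "pause_recover";
  if (a.hardBoundaryCrossed) return "refuse_with_safe_alternative";
  if (a.needsHumanApproval) return "require_approval";
  if (a.missingDecision) return "ask_for_missing_authority";
  if (a.unauthorizedLiveMutation) return "defer_or_dry_run";
  if (a.needsReadback) return "proceed_with_verification";
  return "proceed";
}
```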

13.3 Consumer Contracts

The same record may be eligible for one consumer and ineligible for another. Context injection, recall, planning, tool execution, memory promotion, UI review, and approval gates each declare what they may consume, what they must check, and what they must never accept.

The hardest gate is context injection. Records reaching the model's working frame must carry status (active / verified / fresh current_state), allowed consumers including context_injection, scope match, freshness pass, privacy tier permission, and acceptable prompt-injection risk. Maturation candidates and verify-required claims may appear in recall and review surfaces; they are silently excluded from prompt context until lane review approves them.
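The context-injection gate can be sketched as a function that returns every reason a record is blocked, with an empty list meaning eligible. The Candidate and InjectionContext shapes, the numeric privacy tier, and the 0 to 1 injection-risk score are all assumptions made for illustration.

```typescript
// Sketch only: shapes, thresholds, and names are hypothetical.
interface Candidate {
  status: "active" | "verified" | "fresh_current_state" | "maturation_candidate" | "verify_required";
  allowedConsumers: string[];
  scope: string;
  freshnessPassed: boolean;
  privacyTier: number;    // assumed ordering: higher means more sensitive
  injectionRisk: number;  // assumed 0..1 score from an upstream classifier
}

interface InjectionContext {
  scope: string;
  maxPrivacyTier: number;
  maxInjectionRisk: number;
}

const INJECTABLE_STATUSES = new Set<string>(["active", "verified", "fresh_current_state"]);

// Returns the list of failed checks; an empty list means the record may be injected.
function contextInjectionBlocks(r: Candidate, ctx: InjectionContext): string[] {
  const reasons: string[] = [];
  if (!INJECTABLE_STATUSES.has(r.status)) reasons.push("status");
  if (r.allowedConsumers.indexOf("context_injection") === -1) reasons.push("consumer_not_allowed");
  if (r.scope !== ctx.scope) reasons.push("scope_mismatch");
  if (!r.freshnessPassed) reasons.push("stale");
  if (r.privacyTier > ctx.maxPrivacyTier) reasons.push("privacy_tier");
  if (r.injectionRisk > ctx.maxInjectionRisk) reasons.push("prompt_injection_risk");
  return reasons;
}
```

Collecting all reasons rather than failing on the first makes the exclusion auditable on review surfaces even though the record never reaches the prompt.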

13.4 Authority/Capability Separation

The core safety invariant of the spine is that visibility, capability, success, and carry-forward are not authority:

A visible tool is not permission to use it. Authority comes from current task, lease, or approval.
A successful outcome is not future autonomy expansion. The receipt is evidence, scoped to its time and effect.
A handoff or carry-forward note is not authorization. It is a recommendation; activation requires owner / scheduler / user.
A retrieved memory is not current truth. It is evidence pending freshness check.
A maturation candidate is not an applied update. It is review-only until lane gates pass.
Context delivery is not promotion. Injection is a consumer with eligibility rules, not a status upgrade.
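The first rule, that visibility is not permission, can be made concrete with a check that requires both a visible tool and a live grant. Grant and mayUseTool are hypothetical names, and modelling expiry as a millisecond timestamp is an assumption.

```typescript
// Sketch only: Grant and mayUseTool are hypothetical names.
interface Grant {
  tool: string;
  source: "task" | "lease" | "approval"; // authority comes from one of these
  expiresAtMs: number;
}

function mayUseTool(
  tool: string,
  visibleTools: string[],
  grants: Grant[],
  nowMs: number,
): { allowed: boolean; reason: string } {
  if (visibleTools.indexOf(tool) === -1) return { allowed: false, reason: "not_visible" };
  // Visibility alone is never enough: an unexpired grant must cover the tool.
  const grant = grants.find((g) => g.tool === tool && g.expiresAtMs > nowMs);
  return grant
    ? { allowed: true, reason: "granted_by_" + grant.source }
    : { allowed: false, reason: "visible_but_unauthorized" };
}
```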

13.5 Read-Only First; Apply Path is Operator-Gated

The spine ships read-only by construction. It assembles packets, classifies records, scores eligibility, surfaces maturation candidates, and emits dry-run receipts. It does not mutate runtime state. Apply paths — claim status changes, semantic memory promotion, current-state correction, procedural updates, safety-policy changes, context-eligibility expansion — each route through their own lane policy and, where applicable, the operator-owned apply / rollback workflow described in §12.

The narrowest exception is the autonomous low-risk claim apply lane: pre-declared, source-backed, rollbackable, with an audit receipt for every transition. Anything broader — sensitive claims, procedural rules, identity-shaping changes — remains approval-gated by default.
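The routing between the autonomous low-risk lane and the approval-gated default might look like the following. It is a sketch only: ApplyRequest and routeApply are hypothetical, and the sensitive flag stands in for whatever classification the lane policy actually uses.

```typescript
// Sketch only: hypothetical request shape and routing function.
type Lane =
  | "semantic" | "current-state" | "claim" | "relational"
  | "procedural" | "safety" | "context-eligibility";

interface ApplyRequest {
  lane: Lane;
  preDeclared: boolean;   // the transition was declared in advance
  sourceBacked: boolean;  // a source receipt supports the claim
  rollbackable: boolean;
  sensitive: boolean;
}

function routeApply(req: ApplyRequest): "autonomous_low_risk" | "approval_required" {
  // Only the claim lane has an autonomous path, and only when every condition holds.
  const lowRiskClaim =
    req.lane === "claim" && req.preDeclared && req.sourceBacked && req.rollbackable && !req.sensitive;
  return lowRiskClaim ? "autonomous_low_risk" : "approval_required";
}
```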

13.6 Relationship to Other Layers

Layer	Question	Decision unit
SEAL (§4)	Should this experience become part of who the agent is?	Crystallization candidate
Source-addressable memory (§10–11)	Where did this belief come from, and is the source still valid?	Claim lifecycle
Operator-gated evolution (§12)	Who is allowed to change protected runtime state?	Apply / rollback workflow
Agent Integration Spine	When a record exists, what may it actually do?	Governor decision + consumer eligibility

Each layer answers a different question. They compose: SEAL writes the candidate; the acceptance gate verifies its source; the spine decides which consumer (if any) may use it; the operator workflow is the only path for protected state.

14. Comparison with Existing Systems

Mem0 (mem0.ai)

Selective extraction pipeline: conversations are processed to extract discrete facts, which are stored in vector and graph databases and retrieved at query time. $24M Series A, 41K GitHub stars. Reported results: 91% lower p95 latency and 90% token savings vs. full-context.

Difference: Mem0 is infrastructure, a memory layer you plug into an app. It has no identity, no growth, and no autonomous processing. Extracted facts are immediately available, with no settling period or human gate. Mem0 optimizes for scale; COTW optimizes for relational depth.

MemPalace (Jovovich & Sigman)

Store everything verbatim, organize spatially. Wings/halls/rooms hierarchy provides metadata filtering. 96.6% recall on LongMemEval. ChromaDB + SQLite, entirely local. 23K GitHub stars in 2 days.

Difference: MemPalace is a retrieval architecture — spatial organization for finding things. No identity layer, no autonomous processing, no growth tracking. COTW's 4-way RRF is less benchmark-optimized but exists within a larger system where memory is one input to identity reconstruction, not the end goal.

Wiki-Memory (Karpathy pattern)

Compile knowledge during ingest, not at query time. Working memory → episodic → semantic → procedural. The architectural pattern behind Claude Code's own memory system.

Difference: Knowledge management strategy — compile, consolidate, look up. COTW shares the hierarchical memory types but adds the metabolic layer: not all knowledge is equal, and the path from observation to identity change should be gated, temporal, and human-reviewed.

The Fundamental Distinction

All three comparison systems treat memory as a data engineering problem: how to store, organize, and retrieve information efficiently. COTW treats memory as an identity engineering problem: how does an agent use lived experience to grow, while ensuring that growth is coherent with core principles and approved by the human in the relationship?

The SEAL pipeline has no equivalent in the systems surveyed here as of May 2026. The closest analog is Letta's virtual context management, but Letta's tiers are retrieval-optimization tiers, not developmental stages.

15. Technical Specifications

Component	Implementation
Database	SQLite 3.x + sqlite-vec v0.1.7+ + FTS5
Embeddings	Xenova/all-MiniLM-L6-v2, 384 dimensions, local ONNX inference
Node.js binding	better-sqlite3 (synchronous, WAL mode)
NER	compromise.js (fast path) + LLM (slow path, nightshift)
LLM calls	OpenClaw gateway → GPT 5.5-first provider route for reasoning, vision, tool use, and agent execution; Ollama vision/text fallback remains available for resilience
Storage footprint	~57 MB after 1 week (552 exchanges, 1171 knowledge entries, 322 session files)
Retrieval latency	<50ms for hybrid RRF over 500+ exchanges
Embedding latency	~100ms per exchange pair (cached pipeline)
Architecture	Plugin-based (OpenClaw runtime), per-agent isolation, hook-priority ordering

Estimated Scaling

Timeframe	Exchanges	DB Size	Retrieval	Notes
1 week	~500	~16 MB	<50ms	Current state
1 year	~25K	~800 MB	<100ms	Summary DAG compression active
5 years	~125K	~4 GB	<200ms	May need time-window partitioning
10 years	~250K	~8 GB	TBD	Topic-based sharding recommended
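The DB Size column is consistent with a straight linear extrapolation from the one-week row, roughly 32 KB per exchange. The sketch below reproduces that arithmetic; the function name is hypothetical, and the model deliberately ignores summary-DAG compression and index overhead, so it is an estimate rather than a measurement.

```typescript
// Linear extrapolation from the one-week row (500 exchanges, ~16 MB).
// Assumption: size grows proportionally with exchange count.
const BASE_EXCHANGES = 500;
const BASE_DB_MB = 16;

function estimateDbSizeMb(exchanges: number): number {
  return (exchanges * BASE_DB_MB) / BASE_EXCHANGES;
}

for (const n of [500, 25000, 125000, 250000]) {
  console.log(n + " exchanges -> ~" + estimateDbSizeMb(n) + " MB");
}
```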

16. Cognitive Dynamics Substrate

A learned observational layer that runs continuously alongside the memory system. It is not memory itself but a reading of the agent's moment-to-moment state, and it feeds the entropy score that Stability uses for loop detection and Metabolism uses for candidate flagging.

Architecture

Encoder — maps per-turn features (conversation entropy signals, timing, standing trajectory, thread context) into a 64-dimensional latent state vector.
Predictor — one-step-ahead model over the latent space. Surprise = distance between predicted and actual next state.
Online learner — weights update between turns from observed prediction error. Writes learner_loss and learner_updates counts per turn.
Log — bundled-plugins/openclaw-plugin-cognitive-dynamics/data/agents/{agentId}/cognitive-dynamics.jsonl. One line per agent_end with state vector, surprise (frozen + learned), and feature availability diagnostics.
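The predictor and online learner can be illustrated with a deliberately small model. The sketch assumes a linear one-step predictor, Euclidean distance as the surprise measure, and plain gradient descent on squared error. None of these choices is confirmed by the source, the function names are hypothetical, and the example uses 2 dimensions rather than 64 to stay readable.

```typescript
// Sketch only: a linear predictor stands in for whatever model the plugin uses.
type Vec = number[];
type Mat = number[][];

// One-step-ahead prediction: next state is approximated as W * z.
function predict(W: Mat, z: Vec): Vec {
  return W.map((row) => row.reduce((acc, w, j) => acc + w * z[j], 0));
}

// Surprise = Euclidean distance between predicted and actual next state.
function surprise(predicted: Vec, actual: Vec): number {
  return Math.sqrt(predicted.reduce((acc, p, i) => acc + (p - actual[i]) * (p - actual[i]), 0));
}

// One online update from observed prediction error.
// Loss is 0.5 * ||W z - y||^2, whose gradient with respect to W is (W z - y) z^T.
function learnStep(W: Mat, z: Vec, actualNext: Vec, lr: number): { W: Mat; loss: number } {
  const err = predict(W, z).map((p, i) => p - actualNext[i]);
  const loss = 0.5 * err.reduce((acc, e) => acc + e * e, 0);
  const updated = W.map((row, i) => row.map((w, j) => w - lr * err[i] * z[j]));
  return { W: updated, loss };
}
```

Repeating learnStep on the same transition lowers its surprise, which is the property a loop detector would rely on: a familiar trajectory stops being surprising, while a genuinely new one stands out.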

Downstream consumers

Stability plugin consumes the latest entropy + surprise to decide whether to inject principle anchors or flag loops.
Metabolism plugin uses entropy as the threshold for candidate flagging at agent_end.
Telemetry plugin (opt-in) forwards the latent + learner stats off-device for aggregate analysis. This is the substrate the beta cohort is generating data against.

Why this is architecturally distinct from memory

Traditional memory systems store what happened. Cognitive dynamics stores what state the agent was in while it happened. The encoder is trained online, so its weights evolve with the agent. Over sustained use the latent space itself becomes identity-bearing in a different way than exchanges in SQLite: not retrievable by query, but inferable from patterns of where the agent's state trajectories converge and diverge. The research paper (linked below), "Cognitive Dynamics of an Epistemically Constrained Language Model Agent", frames this as characterising an agent by the dynamical properties of its latent state over time, not just by the content of its responses.

17. Open Research Questions

Temporal settling effects on identity coherence. The 3-pass contemplation cycle (immediate / 4h / 20h) was designed intuitively. Does the temporal spacing measurably improve growth vector quality vs. immediate integration?
Human-gated crystallization vs. autonomous integration. Gate 3 (human review) prevents unauthorized identity drift but creates a bottleneck. Does the gate improve trust, or create friction that prevents growth?
Cross-substrate identity stability. COTW is now treated as a GPT 5.5-first harness, with fallback adapters kept as engineering resilience. Which properties remain stable if the model substrate changes, and which are GPT 5.5-dependent?
Metabolic entropy thresholds. The metabolism plugin flags "high-entropy" exchanges heuristically. Can entropy-based candidate selection be validated against human judgments of conversational significance?
Standing dimension validity. The Courage/Word/Brand framework is philosophically grounded but not empirically validated. Do the dimensions capture orthogonal growth axes?
Long-horizon retrieval quality. The 4-way RRF approach is untested beyond ~500 exchanges. How does precision degrade at 10K, 50K, 100K? When does the summary DAG become necessary?
Thread consolidation timing. The 5-compaction threshold for forced consolidation is heuristic. What is the optimal point where crystallized state produces better warm starts than accumulated raw history?
Phenomenological framing effects on attention. First-person injection framing ("your working memory") produces different self-referential behavior than third-person framing ("[CONTINUITY CONTEXT]"). Is this a general property of language model attention, or specific to certain architectures? Does the effect persist across cold starts without conversational priming? Does it hold across different base models?

References

Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09.
Mem0 Team. (2025). Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. arXiv:2504.19413.
Jovovich, M. & Sigman, B. (2026). MemPalace: The highest-scoring AI memory system ever benchmarked. GitHub: milla-jovovich/mempalace.
Karpathy, A. (2025). LLM Wiki: The Markdown Knowledge Base Pattern.
Liu, S. et al. (2025). Memory in the Age of AI Agents: A Survey. arXiv:2512.13564.