I inspected the uploaded repo and the current gap is very concrete:

src/llm.rs:84-103 and src/llm.rs:122-183 send non-streaming requests and return only extracted assistant text / parsed JSON.
src/llm.rs:214-235 discards response headers, raw response bytes, router target headers, and usage metadata.
src/llm.rs:250-256 caps error bodies at 2 KB for display, but there is no uncapped artifact storage elsewhere.
src/minds.rs:106-123 and src/minds.rs:126-142 immediately normalize successful perceive/intend text.
src/minds.rs:385-387 uses split_whitespace().join(" "), which destroys raw whitespace, line breaks, and chunk/token shape.
src/minds.rs:176-215 preserves adjudication raw text only for rejected retry attempts. Successful adjudication raw JSON is not preserved as a first-class artifact.
src/kernel.rs:594-701 stages semantic audit events in memory; the LLM artifacts are not independently durable while the call is running.
src/world_store/mod.rs:410-426 and migrations/0002_world_store.sql:36-54 show the attempt record is only status/timing/progress/failure/delta.
src/read_models.rs:1425-1457 builds attempt detail from the attempt record plus audit events. There is no LLM-call surface.
src/resource_catalog.rs:430-442 has the attempt list columns wired to completed_at, but the actual record field is ended_at, so even the existing attempts UI is under-serving the operator.

The current incident proves why this matters: the latest first-meeting attempt generated/evaluated 56596 completion tokens, hit 57344 total tokens, truncated, returned HTTP 200, and Chukwa retained only “empty assistant message” as the meaningful attempt-level result.

Below is the ticket I’d submit. It is intentionally declarative and maximal.

Ticket: Make LLM cognition traces first-class durable artifacts for every Chukwa turn attempt

Priority: P1
Type: feature
Labels: observability, llm, persistence, attempts, ui, genetic_algorithms, forensics
Code context: src/llm.rs, src/minds.rs, src/kernel.rs, src/world_store/*, src/read_models.rs, src/server.rs, src/resource_catalog.rs, migrations/

Summary

Chukwa must persist complete LLM cognition traces for every turn attempt, successful or failed. The attempts table should become an operator cockpit, and full LLM request/response/token artifacts should become first-class durable resources linked to attempts, audit events, worlds, agents, profiles, and turns.

Do not merely add capped diagnostics. Do not only improve failure strings. Do not only store excerpts. Store the raw data.

This includes:

Every LLM HTTP request payload Chukwa sends.
Every message in that payload.
Every router response header.
Every streamed response chunk.
Every final assistant text before trimming, normalization, schema parsing, or validation.
Every normalized/parsed semantic value Chukwa actually used.
Every token/logprob/token-byte record available from the upstream path.
Every parse, validation, extraction, transport, HTTP, truncation, finish, and usage signal.
A useful attempt-level summary so get_turn_status, /attempts, and /attempts/:id immediately explain what happened.

This is not a resurrection mechanism. Historical failed attempts remain failed. The goal is to preserve raw cognition artifacts for analysis, debugging, model evaluation, future genetic algorithms, and operator visibility.

Why this is required

Current Chukwa throws away the most valuable data.

src/llm.rs asks the router for "stream": false, parses the fully buffered response, extracts choices[0].message.content, trims it, and returns a string. If the trimmed text is empty, Chukwa records only “router returned an empty assistant message”.

src/minds.rs further normalizes successful perceive/intend output with split_whitespace().join(" "), so even successful turns lose raw formatting and raw generation shape.

src/kernel.rs stores semantic audit events, but not the LLM call that produced them. Perception and intent success events contain normalized text, not the raw assistant output. Adjudication success stores narration/transitions, not the full raw JSON response. Rejected adjudication attempts store raw_response, but that richer path is uneven and only covers one failure class.

attempts currently stores progress, failure_reason, and delta; it does not store failure class, failed phase, failed entity, model/backend, finish reason, usage, response shape, raw body, chunks, tokens, or correlation IDs.

The router is OpenAI-compatible, and the Chat Completions shape already carries fields Chukwa should preserve, including choices, message.content, finish_reason, and usage; streaming returns chunks when stream is enabled. ([OpenAI Platform][1]) Postgres is an appropriate place to store these artifacts: TOAST automatically compresses and/or moves large TEXT, BYTEA, and JSONB-style varlena values out of line when they are too large for normal table rows. ([PostgreSQL][2])

Required direction

Implement LLM cognition traces as a new durable subsystem.

The canonical world/audit chain remains semantic. The raw LLM trace layer sits beside it and links into it. Attempts become the top-level diagnostic entry point; LLM calls become browseable resources.

Do not cap raw storage. Cap only list-view previews.
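
A minimal sketch of the preview rule, assuming a hypothetical helper named list_preview and a caller-chosen limit (neither exists in the repo today). It only shortens the string shown in list views; the stored artifact is never modified.

```rust
/// Hypothetical helper: builds a list-view preview without touching the stored text.
/// Truncates on a character boundary so multi-byte UTF-8 is never split.
pub fn list_preview(raw: &str, max_chars: usize) -> String {
    let mut chars = raw.chars();
    let head: String = chars.by_ref().take(max_chars).collect();
    if chars.next().is_some() {
        // Something was cut off: mark the preview as shortened.
        format!("{head}…")
    } else {
        head
    }
}
```

The detail and artifact views would bypass this helper entirely and return the full row.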

Do not wait until attempt commit/fail to persist traces. Insert a call row before each LLM request, append stream chunks as they arrive, and finish/fail the call row when the request ends. If the pod dies mid-call, the attempt may be interrupted, but the partial trace must survive.

Do not add generation caps in this ticket. The purpose here is capture. Policy/tuning can happen after we have complete evidence.

Database migration

Add migrations/0004_llm_cognition_traces.sql.

1. Attempt summary fields

Add indexed summary columns to attempts:

ALTER TABLE attempts
    ADD COLUMN observability_version INT NOT NULL DEFAULT 1,
    ADD COLUMN failure_class TEXT,
    ADD COLUMN failed_phase TEXT,
    ADD COLUMN failed_entity_id TEXT,
    ADD COLUMN last_llm_call_id UUID,
    ADD COLUMN llm_call_count INT NOT NULL DEFAULT 0 CHECK (llm_call_count >= 0),
    ADD COLUMN llm_prompt_tokens BIGINT NOT NULL DEFAULT 0 CHECK (llm_prompt_tokens >= 0),
    ADD COLUMN llm_completion_tokens BIGINT NOT NULL DEFAULT 0 CHECK (llm_completion_tokens >= 0),
    ADD COLUMN llm_total_tokens BIGINT NOT NULL DEFAULT 0 CHECK (llm_total_tokens >= 0),
    ADD COLUMN llm_trace_summary JSONB NOT NULL DEFAULT '{}'::jsonb;

CREATE INDEX attempts_failure_class_idx ON attempts(failure_class);
CREATE INDEX attempts_failed_phase_idx ON attempts(failed_phase);
CREATE INDEX attempts_failed_entity_idx ON attempts(world_slug, failed_entity_id);
CREATE INDEX attempts_llm_total_tokens_idx ON attempts(llm_total_tokens DESC);
CREATE INDEX attempts_llm_completion_tokens_idx ON attempts(llm_completion_tokens DESC);

After llm_calls exists, add:

ALTER TABLE attempts
    ADD CONSTRAINT attempts_last_llm_call_fk
    FOREIGN KEY (last_llm_call_id)
    REFERENCES llm_calls(llm_call_id)
    DEFERRABLE INITIALLY DEFERRED;

2. Attempt timeline events

Create a timeline table for live progress and postmortem reconstruction:

CREATE TABLE attempt_timeline_events (
    timeline_event_id BIGSERIAL PRIMARY KEY,
    attempt_id UUID NOT NULL REFERENCES attempts(attempt_id) ON DELETE CASCADE,
    world_slug label_text NOT NULL REFERENCES worlds(slug),
    attempted_turn BIGINT NOT NULL CHECK (attempted_turn >= 1),

    occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    event_seq INT NOT NULL CHECK (event_seq >= 1),

    kind TEXT NOT NULL CHECK (kind <> ''),
    phase TEXT,
    entity_id TEXT,
    llm_call_id UUID,
    message TEXT,
    data JSONB NOT NULL DEFAULT '{}'::jsonb,

    UNIQUE (attempt_id, event_seq)
);

-- No separate (attempt_id, event_seq) index is needed:
-- the UNIQUE (attempt_id, event_seq) constraint already creates one.

CREATE INDEX attempt_timeline_world_time_idx
    ON attempt_timeline_events(world_slug, occurred_at DESC);

3. LLM call table

Create one row per outbound HTTP request to the router. A logical adjudication retry may produce multiple rows if Chukwa first tries response_format and then falls back.

CREATE TYPE llm_call_status AS ENUM (
    'running',
    'succeeded',
    'failed',
    'interrupted'
);

CREATE TYPE llm_phase AS ENUM (
    'perceive',
    'intend',
    'adjudicate'
);

CREATE TABLE llm_calls (
    llm_call_id UUID PRIMARY KEY,

    attempt_id UUID NOT NULL,
    world_slug label_text NOT NULL,
    attempted_turn BIGINT NOT NULL CHECK (attempted_turn >= 1),
    call_seq INT NOT NULL CHECK (call_seq >= 1),

    phase llm_phase NOT NULL,
    entity_id TEXT,
    profile_label label_text,
    cognition_profile_hash sha256_hex,
    perceive_system_hash sha256_hex,
    intend_system_hash sha256_hex,
    adjudicate_system_hash sha256_hex,
    adjudication_schema_hash sha256_hex,

    logical_attempt_number INT,
    fallback_of_call_id UUID REFERENCES llm_calls(llm_call_id),

    status llm_call_status NOT NULL DEFAULT 'running',
    failure_class TEXT,
    failure_message TEXT,

    started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    first_chunk_at TIMESTAMPTZ,
    ended_at TIMESTAMPTZ,
    duration_ms BIGINT CHECK (duration_ms IS NULL OR duration_ms >= 0),

    router_base_url TEXT NOT NULL,
    request_url TEXT NOT NULL,
    request_method TEXT NOT NULL DEFAULT 'POST',
    request_stream BOOLEAN NOT NULL,
    request_temperature DOUBLE PRECISION,
    request_response_format JSONB,
    request_message_count INT NOT NULL DEFAULT 0 CHECK (request_message_count >= 0),
    request_body_sha256 sha256_hex,
    request_body_bytes BIGINT CHECK (request_body_bytes IS NULL OR request_body_bytes >= 0),

    model_requested TEXT NOT NULL,
    model_resolved TEXT,
    router_source TEXT,
    router_model TEXT,
    router_upstream_model TEXT,
    router_target TEXT,
    router_slot TEXT,
    router_deployment TEXT,

    chukwa_client_request_id TEXT NOT NULL,
    upstream_request_id TEXT,
    response_headers JSONB NOT NULL DEFAULT '{}'::jsonb,
    http_status INT,

    response_object TEXT,
    response_id TEXT,
    response_model TEXT,
    finish_reason TEXT,

    prompt_tokens BIGINT CHECK (prompt_tokens IS NULL OR prompt_tokens >= 0),
    completion_tokens BIGINT CHECK (completion_tokens IS NULL OR completion_tokens >= 0),
    total_tokens BIGINT CHECK (total_tokens IS NULL OR total_tokens >= 0),
    usage_json JSONB,

    stream_chunk_count INT NOT NULL DEFAULT 0 CHECK (stream_chunk_count >= 0),
    content_chunk_count INT NOT NULL DEFAULT 0 CHECK (content_chunk_count >= 0),
    assistant_text_chars BIGINT NOT NULL DEFAULT 0 CHECK (assistant_text_chars >= 0),
    assistant_text_bytes BIGINT NOT NULL DEFAULT 0 CHECK (assistant_text_bytes >= 0),
    assistant_text_sha256 sha256_hex,

    content_shape TEXT,
    content_trimmed_chars BIGINT CHECK (content_trimmed_chars IS NULL OR content_trimmed_chars >= 0),
    parsed_json_status TEXT,
    validation_status TEXT,

    truncated BOOLEAN,
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,

    UNIQUE (attempt_id, call_seq),

    CONSTRAINT llm_calls_attempt_fk
        FOREIGN KEY (world_slug, attempt_id)
        REFERENCES attempts(world_slug, attempt_id)
        ON DELETE CASCADE
);

-- (attempt_id, call_seq) lookups are already served by the index behind UNIQUE (attempt_id, call_seq).
CREATE INDEX llm_calls_world_time_idx ON llm_calls(world_slug, started_at DESC);
CREATE INDEX llm_calls_phase_idx ON llm_calls(phase);
CREATE INDEX llm_calls_entity_idx ON llm_calls(world_slug, entity_id);
CREATE INDEX llm_calls_status_idx ON llm_calls(status);
CREATE INDEX llm_calls_failure_class_idx ON llm_calls(failure_class);
CREATE INDEX llm_calls_model_idx ON llm_calls(model_requested, model_resolved);
CREATE INDEX llm_calls_tokens_idx ON llm_calls(total_tokens DESC);
CREATE INDEX llm_calls_finish_reason_idx ON llm_calls(finish_reason);

4. Request messages

Store every message Chukwa sent, in order.

CREATE TABLE llm_call_messages (
    llm_call_id UUID NOT NULL REFERENCES llm_calls(llm_call_id) ON DELETE CASCADE,
    message_index INT NOT NULL CHECK (message_index >= 0),
    role TEXT NOT NULL CHECK (role <> ''),
    content TEXT NOT NULL,
    content_sha256 sha256_hex NOT NULL,
    content_chars BIGINT NOT NULL CHECK (content_chars >= 0),
    content_bytes BIGINT NOT NULL CHECK (content_bytes >= 0),
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,

    PRIMARY KEY (llm_call_id, message_index)
);

ALTER TABLE llm_call_messages ALTER COLUMN content SET STORAGE EXTENDED;

5. Stream chunks

Store every upstream streaming event. This is the ground truth for “what was emitted over the wire.”

CREATE TABLE llm_call_chunks (
    llm_call_id UUID NOT NULL REFERENCES llm_calls(llm_call_id) ON DELETE CASCADE,
    chunk_seq INT NOT NULL CHECK (chunk_seq >= 1),

    received_at TIMESTAMPTZ NOT NULL DEFAULT now(),

    raw_sse TEXT,
    raw_json JSONB,

    choice_index INT,
    delta_role TEXT,
    delta_content TEXT NOT NULL DEFAULT '',
    finish_reason TEXT,
    usage_json JSONB,

    delta_chars BIGINT NOT NULL DEFAULT 0 CHECK (delta_chars >= 0),
    delta_bytes BIGINT NOT NULL DEFAULT 0 CHECK (delta_bytes >= 0),
    cumulative_chars BIGINT NOT NULL DEFAULT 0 CHECK (cumulative_chars >= 0),
    cumulative_bytes BIGINT NOT NULL DEFAULT 0 CHECK (cumulative_bytes >= 0),

    PRIMARY KEY (llm_call_id, chunk_seq)
);

ALTER TABLE llm_call_chunks ALTER COLUMN raw_sse SET STORAGE EXTENDED;
ALTER TABLE llm_call_chunks ALTER COLUMN delta_content SET STORAGE EXTENDED;

-- No separate (llm_call_id, chunk_seq) index is needed: the primary key already provides it,
-- and a duplicate index would add write cost to every chunk insert.

6. Token observations

Store token-level observations. Populate from upstream logprobs when available. When the router/backend cannot provide true token IDs/logprobs, perform post-hoc tokenization with the resolved backend tokenizer and mark source = 'posthoc_tokenizer'. When only stream chunks are available, persist chunk-derived observations with source = 'stream_delta' and do not pretend they are model token IDs.

CREATE TABLE llm_call_tokens (
    llm_call_id UUID NOT NULL REFERENCES llm_calls(llm_call_id) ON DELETE CASCADE,
    token_seq INT NOT NULL CHECK (token_seq >= 1),

    source TEXT NOT NULL CHECK (source IN (
        'stream_logprobs',
        'final_logprobs',
        'posthoc_tokenizer',
        'stream_delta'
    )),

    token_id BIGINT,
    token_text TEXT NOT NULL,
    token_bytes BYTEA,
    logprob DOUBLE PRECISION,
    top_logprobs JSONB,

    chunk_seq INT,
    char_start BIGINT,
    char_end BIGINT,
    byte_start BIGINT,
    byte_end BIGINT,

    PRIMARY KEY (llm_call_id, token_seq)
);

ALTER TABLE llm_call_tokens ALTER COLUMN token_text SET STORAGE EXTENDED;

CREATE INDEX llm_call_tokens_call_source_idx
    ON llm_call_tokens(llm_call_id, source);
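
The source-selection rule for token observations could be sketched as follows. TokenEvidence and choose_token_source are hypothetical names, and preferring stream logprobs over final logprobs when both exist is an assumption; the ticket only fixes the order "upstream logprobs, then post-hoc tokenizer, then stream deltas".

```rust
/// Values match the `source` CHECK constraint on `llm_call_tokens`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenSource {
    StreamLogprobs,
    FinalLogprobs,
    PosthocTokenizer,
    StreamDelta,
}

impl TokenSource {
    /// The string stored in the `source` column.
    pub fn as_str(self) -> &'static str {
        match self {
            TokenSource::StreamLogprobs => "stream_logprobs",
            TokenSource::FinalLogprobs => "final_logprobs",
            TokenSource::PosthocTokenizer => "posthoc_tokenizer",
            TokenSource::StreamDelta => "stream_delta",
        }
    }
}

/// Hypothetical description of what one finished call made available.
pub struct TokenEvidence {
    pub stream_logprobs: bool,
    pub final_logprobs: bool,
    pub tokenizer_available: bool,
}

/// Picks the most faithful source; stream deltas are the last resort and
/// must never be labeled as model token IDs.
pub fn choose_token_source(evidence: &TokenEvidence) -> TokenSource {
    if evidence.stream_logprobs {
        TokenSource::StreamLogprobs
    } else if evidence.final_logprobs {
        TokenSource::FinalLogprobs
    } else if evidence.tokenizer_available {
        TokenSource::PosthocTokenizer
    } else {
        TokenSource::StreamDelta
    }
}
```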

7. Full raw artifacts

Store every large raw thing here, uncapped.

CREATE TYPE llm_artifact_kind AS ENUM (
    'request_json',
    'response_body',
    'response_json',
    'assistant_text_raw',
    'assistant_text_normalized',
    'parsed_json',
    'parse_error',
    'validation_error',
    'router_error_body',
    'extraction_error'
);

CREATE TABLE llm_call_artifacts (
    llm_call_id UUID NOT NULL REFERENCES llm_calls(llm_call_id) ON DELETE CASCADE,
    artifact_kind llm_artifact_kind NOT NULL,

    content_text TEXT,
    content_json JSONB,
    content_sha256 sha256_hex NOT NULL,
    content_chars BIGINT CHECK (content_chars IS NULL OR content_chars >= 0),
    content_bytes BIGINT NOT NULL CHECK (content_bytes >= 0),

    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,

    PRIMARY KEY (llm_call_id, artifact_kind),

    CHECK (content_text IS NOT NULL OR content_json IS NOT NULL)
);

ALTER TABLE llm_call_artifacts ALTER COLUMN content_text SET STORAGE EXTENDED;
ALTER TABLE llm_call_artifacts ALTER COLUMN content_json SET STORAGE EXTENDED;

CREATE INDEX llm_call_artifacts_kind_idx
    ON llm_call_artifacts(artifact_kind);

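Add full-text search over artifact content. The generated column covers every artifact kind, not only assistant output. A tsvector is limited to 1 MB, so a sufficiently large artifact can make to_tsvector fail and reject the insert; the search input may need to be bounded so that raw storage itself stays uncapped: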

ALTER TABLE llm_call_artifacts
    ADD COLUMN content_search tsvector
    GENERATED ALWAYS AS (
        to_tsvector('simple', coalesce(content_text, content_json::text, ''))
    ) STORED;

CREATE INDEX llm_call_artifacts_content_search_idx
    ON llm_call_artifacts
    USING GIN(content_search);

8. Link audit events to LLM calls

ALTER TABLE world_audit_events
    ADD COLUMN llm_call_id UUID REFERENCES llm_calls(llm_call_id);

CREATE INDEX world_audit_events_llm_call_idx
    ON world_audit_events(llm_call_id);

Perception, intent, adjudication, adjudication rejection, and attempt failure events should include llm_call_id whenever the failure/success came from a specific call.

Rust data model changes

In src/world_store/mod.rs, add DTOs:

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(transparent)]
pub struct LlmCallId(pub uuid::Uuid);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum LlmPhase {
    Perceive,
    Intend,
    Adjudicate,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum LlmCallStatus {
    Running,
    Succeeded,
    Failed,
    Interrupted,
}

#[derive(Debug, Clone)]
pub struct LlmCallStart {
    pub llm_call_id: LlmCallId,
    pub attempt_id: AttemptId,
    pub world_slug: Slug,
    pub attempted_turn: u64,
    pub call_seq: u32,

    pub phase: LlmPhase,
    pub entity_id: Option<String>,
    pub profile_label: Option<Label>,
    pub cognition_profile_hash: Option<String>,
    pub perceive_system_hash: Option<String>,
    pub intend_system_hash: Option<String>,
    pub adjudicate_system_hash: Option<String>,
    pub adjudication_schema_hash: Option<String>,

    pub logical_attempt_number: Option<u32>,
    pub fallback_of_call_id: Option<LlmCallId>,

    pub router_base_url: String,
    pub request_url: String,
    pub request_stream: bool,
    pub request_temperature: Option<f64>,
    pub request_response_format: Option<serde_json::Value>,
    pub model_requested: String,
    pub chukwa_client_request_id: String,

    pub request_body: serde_json::Value,
    pub messages: Vec<StoredLlmMessage>,
}

Add matching structs for:

StoredLlmMessage
LlmCallChunkInput
LlmCallTokenInput
LlmCallArtifactInput
LlmCallFinish
LlmCallFailure
AttemptTimelineInput
AttemptDiagnosticsUpdate
LlmCallDetails
LlmCallPage
LlmChunkPage

Extend the WorldStore trait with:

async fn record_attempt_timeline_event(
    &self,
    input: AttemptTimelineInput,
) -> Result<(), WorldStoreError>;

async fn update_attempt_progress(
    &self,
    attempt_id: AttemptId,
    progress: &str,
    diagnostics_patch: serde_json::Value,
) -> Result<(), WorldStoreError>;

async fn update_attempt_llm_summary(
    &self,
    input: AttemptDiagnosticsUpdate,
) -> Result<(), WorldStoreError>;

async fn start_llm_call(
    &self,
    input: LlmCallStart,
) -> Result<(), WorldStoreError>;

async fn append_llm_call_chunk(
    &self,
    input: LlmCallChunkInput,
) -> Result<(), WorldStoreError>;

async fn append_llm_call_tokens(
    &self,
    llm_call_id: LlmCallId,
    tokens: Vec<LlmCallTokenInput>,
) -> Result<(), WorldStoreError>;

async fn put_llm_call_artifact(
    &self,
    input: LlmCallArtifactInput,
) -> Result<(), WorldStoreError>;

async fn finish_llm_call(
    &self,
    input: LlmCallFinish,
) -> Result<(), WorldStoreError>;

async fn fail_llm_call(
    &self,
    input: LlmCallFailure,
) -> Result<(), WorldStoreError>;

async fn get_llm_call(
    &self,
    llm_call_id: LlmCallId,
) -> Result<LlmCallDetails, WorldStoreError>;

async fn list_llm_calls_for_attempt(
    &self,
    attempt_id: AttemptId,
    cursor: Option<LlmCallCursor>,
    limit: usize,
) -> Result<LlmCallPage, WorldStoreError>;

async fn list_llm_call_chunks(
    &self,
    llm_call_id: LlmCallId,
    cursor: Option<LlmChunkCursor>,
    limit: usize,
) -> Result<LlmChunkPage, WorldStoreError>;

async fn get_llm_call_artifact(
    &self,
    llm_call_id: LlmCallId,
    artifact_kind: LlmArtifactKind,
) -> Result<LlmCallArtifact, WorldStoreError>;

Implement these in both src/world_store/postgres.rs and src/world_store/memory.rs.

LLM client changes

Replace the current throwaway LLM path in src/llm.rs.

1. Make LLM calls async and streamed

Replace ureq with an async streaming client. Use reqwest with json, stream, and rustls-tls features, plus futures-util for stream handling.

Remove run_blocking_llm_io once no blocking HTTP remains.

Every Chukwa LLM request must set:

{
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

Keep response_format for adjudication JSON calls. If the router/backend rejects stream_options, record that HTTP failure as its own llm_calls row, then retry the same logical call once with stream_options removed. Both rows must remain linked via fallback_of_call_id.

Do not lose data from fallbacks. Existing chat_json_raw has a response-format fallback path; preserve both the failed schema-format call and the fallback call as separate LLM call rows.

2. Add trace context

Create a trace context that kernel/minds pass into every cognition call:

pub struct AttemptTraceContext {
    pub store: Arc<dyn WorldStore>,
    pub attempt_id: AttemptId,
    pub world_slug: Slug,
    pub attempted_turn: u64,
    pub worker_id: String,
    pub next_llm_call_seq: Arc<AtomicU32>,
}

pub struct LlmCognitionContext {
    pub attempt: AttemptTraceContext,
    pub phase: LlmPhase,
    pub entity_id: Option<String>,
    pub profile_label: Option<Label>,
    pub profile_hashes: Option<AgentProfileHashes>,
    pub logical_attempt_number: Option<u32>,
}

The LLM client must generate a llm_call_id before sending HTTP, insert llm_calls, insert llm_call_messages, and store the full request_json artifact.
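
One way the shared next_llm_call_seq counter could hand out call_seq values is sketched below. CallSeqAllocator is a hypothetical wrapper, not an existing type; starting at 1 is chosen to satisfy the call_seq >= 1 check on llm_calls.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper around the counter held in `AttemptTraceContext`.
#[derive(Clone)]
pub struct CallSeqAllocator {
    next: Arc<AtomicU32>,
}

impl CallSeqAllocator {
    /// The first allocated value is 1.
    pub fn new() -> Self {
        Self { next: Arc::new(AtomicU32::new(1)) }
    }

    /// Returns a sequence number that no other clone of this allocator will return.
    pub fn allocate(&self) -> u32 {
        // `fetch_add` returns the previous value, so the counter itself is always one ahead.
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

Because clones share the same counter, concurrent cognition calls within one attempt would not collide on UNIQUE (attempt_id, call_seq).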

3. Add correlation headers

Every request to the router must include:

X-Chukwa-Attempt-Id: <attempt uuid>
X-Chukwa-Llm-Call-Id: <llm_call uuid>
X-Chukwa-World-Slug: <world slug>
X-Chukwa-Attempted-Turn: <turn number>
X-Chukwa-Phase: perceive|intend|adjudicate
X-Chukwa-Entity-Id: <entity id, if any>
X-Client-Request-Id: chukwa:<attempt_id>:<call_seq>:<llm_call_id>

OpenAI’s own debugging guidance supports client-supplied request IDs via X-Client-Request-Id, and says this value should be unique and can be used to look up whether a request was received when normal response headers are unavailable. Use the same pattern for router/backend correlation. ([OpenAI Platform][3])
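
The header set above could be assembled roughly like this. CorrelationInput and correlation_headers are hypothetical names, IDs are passed as plain strings to keep the sketch dependency-free, and omitting X-Chukwa-Entity-Id when there is no entity is an assumption drawn from the "if any" wording.

```rust
/// Hypothetical inputs for one outbound router request.
pub struct CorrelationInput<'a> {
    pub attempt_id: &'a str,
    pub llm_call_id: &'a str,
    pub world_slug: &'a str,
    pub attempted_turn: u64,
    pub phase: &'a str, // "perceive" | "intend" | "adjudicate"
    pub entity_id: Option<&'a str>,
    pub call_seq: u32,
}

/// Builds the (name, value) pairs to attach to the request.
pub fn correlation_headers(input: &CorrelationInput<'_>) -> Vec<(&'static str, String)> {
    let mut headers = vec![
        ("X-Chukwa-Attempt-Id", input.attempt_id.to_string()),
        ("X-Chukwa-Llm-Call-Id", input.llm_call_id.to_string()),
        ("X-Chukwa-World-Slug", input.world_slug.to_string()),
        ("X-Chukwa-Attempted-Turn", input.attempted_turn.to_string()),
        ("X-Chukwa-Phase", input.phase.to_string()),
    ];
    // Only sent when the call is tied to a specific entity.
    if let Some(entity_id) = input.entity_id {
        headers.push(("X-Chukwa-Entity-Id", entity_id.to_string()));
    }
    headers.push((
        "X-Client-Request-Id",
        format!("chukwa:{}:{}:{}", input.attempt_id, input.call_seq, input.llm_call_id),
    ));
    headers
}
```

The same X-Client-Request-Id string would be stored in llm_calls.chukwa_client_request_id so router logs and database rows can be joined.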

4. Capture router response headers

Persist all non-sensitive response headers in llm_calls.response_headers.

Specifically extract and store:

x-request-id
x-router-source
x-router-model
x-router-upstream-model
x-router-target
x-router-slot
x-router-deployment

The existing docs/llm-router.md says these x-router-* headers are the most reliable source of truth for the backend a request was actually routed to. Chukwa currently ignores them. That must stop.
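
A sketch of header capture under stated assumptions: headers arrive as (name, value) string pairs, names are matched case-insensitively, and the SENSITIVE deny-list is a placeholder, since the ticket does not define which headers count as sensitive. RouterHeaders and capture_headers are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Typed columns extracted from the response; any of them may be absent.
#[derive(Debug, Default, PartialEq)]
pub struct RouterHeaders {
    pub upstream_request_id: Option<String>,
    pub router_source: Option<String>,
    pub router_model: Option<String>,
    pub router_upstream_model: Option<String>,
    pub router_target: Option<String>,
    pub router_slot: Option<String>,
    pub router_deployment: Option<String>,
}

/// Assumed deny-list, for illustration only.
const SENSITIVE: [&str; 3] = ["authorization", "cookie", "set-cookie"];

/// Returns the map destined for `response_headers` plus the typed router fields.
pub fn capture_headers(raw: &[(&str, &str)]) -> (BTreeMap<String, String>, RouterHeaders) {
    let mut kept = BTreeMap::new();
    let mut router = RouterHeaders::default();
    for (name, value) in raw {
        let name = name.to_ascii_lowercase();
        if SENSITIVE.contains(&name.as_str()) {
            continue; // never persisted
        }
        let value = value.to_string();
        let slot = match name.as_str() {
            "x-request-id" => Some(&mut router.upstream_request_id),
            "x-router-source" => Some(&mut router.router_source),
            "x-router-model" => Some(&mut router.router_model),
            "x-router-upstream-model" => Some(&mut router.router_upstream_model),
            "x-router-target" => Some(&mut router.router_target),
            "x-router-slot" => Some(&mut router.router_slot),
            "x-router-deployment" => Some(&mut router.router_deployment),
            _ => None,
        };
        if let Some(slot) = slot {
            *slot = Some(value.clone());
        }
        kept.insert(name, value);
    }
    (kept, router)
}
```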

5. Persist every stream chunk

For every SSE data: frame:

Store the raw SSE text.
Parse JSON when possible and store raw_json.
Extract choices[*].delta.content and append it to in-memory reconstruction.
Insert one llm_call_chunks row before reading the next chunk.
If a chunk includes finish_reason, store it.
If a chunk includes usage, store it and update the call summary.
If a chunk includes logprobs/token bytes, store llm_call_tokens.

The reconstructed assistant text must be stored as assistant_text_raw before any trimming, normalization, or JSON parsing.
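
The framing step of the list above can be sketched with the standard library alone. This assumes LF-delimited events (a CRLF-tolerant version would be needed for strict SSE compliance) and leaves JSON parsing to the caller; SseFrame and drain_sse_frames are hypothetical names.

```rust
/// One `data:` payload taken from the stream.
#[derive(Debug, PartialEq)]
pub enum SseFrame {
    /// Payload text exactly as received, to be stored before any JSON parsing.
    Data(String),
    /// The `[DONE]` sentinel that ends an OpenAI-compatible stream.
    Done,
}

/// Removes every complete event from `buffer` and returns its payload.
/// A trailing partial event stays in the buffer until more bytes arrive.
pub fn drain_sse_frames(buffer: &mut String) -> Vec<SseFrame> {
    let mut frames = Vec::new();
    // A blank line terminates each event.
    while let Some(end) = buffer.find("\n\n") {
        let event: String = buffer.drain(..end + 2).collect();
        let payload: Vec<&str> = event
            .lines()
            .filter_map(|line| line.strip_prefix("data:"))
            .map(|rest| rest.strip_prefix(' ').unwrap_or(rest))
            .collect();
        if payload.is_empty() {
            continue; // comment or keep-alive event, no data lines
        }
        let payload = payload.join("\n");
        if payload == "[DONE]" {
            frames.push(SseFrame::Done);
        } else {
            frames.push(SseFrame::Data(payload));
        }
    }
    frames
}
```

In the real client, each returned Data frame would be written to llm_call_chunks before the next network read, and its delta content appended to the in-memory reconstruction.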

6. Preserve final response bodies for non-stream/error paths

If the router returns a non-2xx response, store the full body as router_error_body. The human-facing failure_reason may remain short, but the DB artifact must be uncapped.

If a backend returns a buffered JSON response despite stream: true, store the full response body as response_body, parse what can be parsed, and record metadata.unexpected_non_stream_response = true.

7. Make errors carry call IDs and classes

Change LlmError from string-only variants to structured variants:

pub enum LlmError {
    Config {
        message: String,
    },
    Transport {
        message: String,
        llm_call_id: Option<LlmCallId>,
        failure_class: &'static str,
    },
    HttpStatus {
        status: u16,
        body_preview: String,
        llm_call_id: Option<LlmCallId>,
        failure_class: &'static str,
    },
    InvalidResponse {
        message: String,
        llm_call_id: Option<LlmCallId>,
        failure_class: &'static str,
        details: serde_json::Value,
    },
    Serialization {
        message: String,
        llm_call_id: Option<LlmCallId>,
        failure_class: &'static str,
        details: serde_json::Value,
    },
}

Keep Display concise for failure_reason, but never rely on Display as the only evidence.

Required failure_class values:

llm_config_error
llm_transport_error
llm_http_status
llm_stream_parse_error
llm_missing_choices
llm_missing_message
llm_missing_content
llm_unexpected_content_shape
llm_empty_assistant_message
llm_json_parse_error
llm_adjudication_validation_error
llm_response_format_unsupported
llm_usage_missing
llm_finish_length
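
A possible classification of a finished perceive/intend completion is sketched below. The precedence is an assumption: empty content wins over finish_reason=length, which matches the observed incident (truncated, yet classified as llm_empty_assistant_message). classify_text_completion is a hypothetical name, and treating a non-empty truncated response as llm_finish_length is also an assumption.

```rust
/// Returns the `failure_class` for a text completion, or `None` when it is usable.
/// `raw_text` is `None` when the response carried no content field at all.
pub fn classify_text_completion(
    finish_reason: Option<&str>,
    raw_text: Option<&str>,
) -> Option<&'static str> {
    match raw_text {
        None => Some("llm_missing_content"),
        // Whitespace-only output is empty after trimming.
        Some(text) if text.trim().is_empty() => Some("llm_empty_assistant_message"),
        // Non-empty but cut off by the token limit.
        Some(_) if finish_reason == Some("length") => Some("llm_finish_length"),
        Some(_) => None,
    }
}
```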

Minds/kernel changes

1. Make cognition functions async and traced

Change:

pub fn perceive(world: &World, agent: &Entity) -> Result<String, CognitionError>
pub fn intend(world: &World, agent: &Entity, perception: &str) -> Result<String, CognitionError>
pub fn adjudicate(...) -> Result<AdjudicationOutcome, AdjudicationError>

to:

pub async fn perceive(
    world: &World,
    agent: &Entity,
    trace: &AttemptTraceContext,
    profile_hashes: Option<&AgentProfileHashes>,
) -> Result<ObservedText, CognitionError>

pub async fn intend(
    world: &World,
    agent: &Entity,
    perception: &str,
    trace: &AttemptTraceContext,
    profile_hashes: Option<&AgentProfileHashes>,
) -> Result<ObservedText, CognitionError>

pub async fn adjudicate(
    world: &World,
    agent: &Entity,
    intent: &str,
    trace: &AttemptTraceContext,
    profile_hashes: Option<&AgentProfileHashes>,
) -> Result<AdjudicationOutcome, AdjudicationError>

ObservedText must carry:

pub struct ObservedText {
    pub llm_call_id: LlmCallId,
    pub raw_text: String,
    pub normalized_text: String,
}

JsonCompletion<T> must carry:

pub struct JsonCompletion<T> {
    pub llm_call_id: LlmCallId,
    pub raw_text: String,
    pub parsed: Result<T, String>,
}

2. Store raw before normalization

For perceive/intend:

Store assistant_text_raw.
Compute normalized_text.
Store assistant_text_normalized.
Return normalized text for existing simulation semantics.
Link audit event to llm_call_id.

The simulation can continue using normalized text. The trace must preserve raw text.
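
A sketch of building ObservedText so the raw text survives normalization. LlmCallId wraps a u128 here only to avoid the uuid dependency in the example, from_raw is a hypothetical constructor, and the whitespace collapse is written with collect-then-join as an equivalent of the normalization described for src/minds.rs.

```rust
/// Stand-in for the ticket's `LlmCallId(uuid::Uuid)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LlmCallId(pub u128);

#[derive(Debug, Clone, PartialEq)]
pub struct ObservedText {
    pub llm_call_id: LlmCallId,
    pub raw_text: String,
    pub normalized_text: String,
}

impl ObservedText {
    /// Keeps the assistant text byte-for-byte and derives the normalized form from it.
    pub fn from_raw(llm_call_id: LlmCallId, raw_text: String) -> Self {
        // Collapse every run of whitespace into a single space.
        let normalized_text = raw_text.split_whitespace().collect::<Vec<_>>().join(" ");
        Self { llm_call_id, raw_text, normalized_text }
    }
}
```

The kernel would keep reading normalized_text, while raw_text goes to the assistant_text_raw artifact.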

3. Store successful adjudication raw JSON

For adjudication success, store:

raw assistant text
parsed JSON
validation status
final accepted adjudication JSON
llm_call_id

Update PendingAuditEvent::Adjudication to include llm_call_id.

Update PendingAuditEvent::AdjudicationRejected to include llm_call_id.

The audit event payload may include a link:

{
  "entity_id": "mira",
  "llm_call_id": "...",
  "narration": "...",
  "entities_touched": [...]
}

Do not copy the giant raw response into every audit event. The raw response lives in llm_call_artifacts.

4. Update attempt progress during execution

Before each call:

perceive[mira]: starting LLM call 1

During long streaming calls, update progress periodically:

perceive[mira]: LLM call 1 streaming; 482 chunks; 12043 chars; 97s elapsed

On finish:

perceive[mira]: LLM call 1 finished; finish_reason=length; completion_tokens=56596

Do not update progress on every token; update every 5 seconds or every 256 chunks, whichever comes first. The chunks themselves are persisted every chunk.
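
The "5 seconds or 256 chunks, whichever comes first" rule could look like this. ProgressThrottle is a hypothetical type; elapsed time is passed in by the caller so the logic stays deterministic and testable.

```rust
use std::time::Duration;

/// Decides when a progress update is due during a streaming call.
pub struct ProgressThrottle {
    min_interval: Duration,
    max_chunks: u32,
    last_update_at: Duration,
    chunks_since_update: u32,
}

impl ProgressThrottle {
    pub fn new() -> Self {
        Self {
            min_interval: Duration::from_secs(5),
            max_chunks: 256,
            last_update_at: Duration::ZERO,
            chunks_since_update: 0,
        }
    }

    /// Call once per received chunk with the time elapsed since the call started.
    /// Returns true when an update should be written, and resets both triggers.
    pub fn on_chunk(&mut self, elapsed: Duration) -> bool {
        self.chunks_since_update += 1;
        let due = self.chunks_since_update >= self.max_chunks
            || elapsed.saturating_sub(self.last_update_at) >= self.min_interval;
        if due {
            self.last_update_at = elapsed;
            self.chunks_since_update = 0;
        }
        due
    }
}
```

This only gates the progress string; chunk rows are still inserted for every chunk regardless of what on_chunk returns.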

5. Attempt failure summary

When a turn fails, populate:

attempts.failure_class
attempts.failed_phase
attempts.failed_entity_id
attempts.last_llm_call_id
attempts.llm_trace_summary

For the current observed failure, the attempt row should end up shaped like:

{
  "failure_class": "llm_empty_assistant_message",
  "failed_phase": "perceive",
  "failed_entity_id": "mira",
  "last_llm_call_id": "...",
  "llm_call_count": 1,
  "llm_prompt_tokens": 748,
  "llm_completion_tokens": 56596,
  "llm_total_tokens": 57344,
  "llm_trace_summary": {
    "last_call": {
      "phase": "perceive",
      "entity_id": "mira",
      "model_requested": "@chat",
      "router_target": "local:gemma-4-26b@centroid-5060ti",
      "finish_reason": "length",
      "truncated": true,
      "assistant_text_chars": 0,
      "content_trimmed_chars": 0
    }
  }
}
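
The aggregate counters in that row could be derived from per-call usage as sketched here. TokenUsage, AttemptLlmTotals, and aggregate_usage are hypothetical names; counting a call whose usage is missing toward llm_call_count while adding no tokens is an assumption.

```rust
/// Usage reported by the router for one call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TokenUsage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub total_tokens: u64,
}

/// Mirrors the aggregate columns added to `attempts`.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct AttemptLlmTotals {
    pub llm_call_count: u32,
    pub llm_prompt_tokens: u64,
    pub llm_completion_tokens: u64,
    pub llm_total_tokens: u64,
}

/// One slice entry per LLM call; `None` means the call reported no usage.
pub fn aggregate_usage(calls: &[Option<TokenUsage>]) -> AttemptLlmTotals {
    let mut totals = AttemptLlmTotals::default();
    for usage in calls {
        totals.llm_call_count += 1;
        if let Some(u) = usage {
            totals.llm_prompt_tokens += u.prompt_tokens;
            totals.llm_completion_tokens += u.completion_tokens;
            totals.llm_total_tokens += u.total_tokens;
        }
    }
    totals
}
```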

Store implementation changes

Postgres

Implement all new methods in src/world_store/postgres.rs.

Use transactions for:

start_llm_call: insert call row, messages, request artifact, timeline event.
append_llm_call_chunk: insert chunk row, update chunk counters and cumulative text counters.
finish_llm_call: update status/end fields, write artifacts, update attempt aggregate counters.
fail_llm_call: update status/end fields, write failure artifacts, update attempt aggregate counters.

Chunk inserts must be durable before reading the next upstream chunk.

Memory store

Implement parallel in-memory structures in src/world_store/memory.rs.

This is required because most MCP/read-model/UI tests use MemoryWorldStore.

Add:

llm_calls: HashMap<Uuid, LlmCallRow>
llm_messages: HashMap<Uuid, Vec<LlmMessageRow>>
llm_chunks: HashMap<Uuid, Vec<LlmChunkRow>>
llm_tokens: HashMap<Uuid, Vec<LlmTokenRow>>
llm_artifacts: HashMap<(Uuid, LlmArtifactKind), LlmArtifactRow>
attempt_timeline: HashMap<Uuid, Vec<AttemptTimelineRow>>
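
A reduced sketch of the in-memory call lifecycle, showing that chunks recorded before an interruption remain readable. It uses u128 keys instead of Uuid and a trimmed row shape to stay dependency-free; MemoryTraceStore and mark_running_as_interrupted are hypothetical, and the ticket does not specify how a running row becomes interrupted.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallStatus {
    Running,
    Succeeded,
    Failed,
    Interrupted,
}

#[derive(Debug)]
pub struct CallRow {
    pub status: CallStatus,
    pub stream_chunk_count: u32,
    pub assistant_text_chars: u64,
}

#[derive(Default)]
pub struct MemoryTraceStore {
    pub calls: HashMap<u128, CallRow>,
    pub chunks: HashMap<u128, Vec<String>>,
}

impl MemoryTraceStore {
    /// Recorded before the HTTP request is sent.
    pub fn start_llm_call(&mut self, id: u128) {
        let row = CallRow { status: CallStatus::Running, stream_chunk_count: 0, assistant_text_chars: 0 };
        self.calls.insert(id, row);
        self.chunks.insert(id, Vec::new());
    }

    /// Appends one chunk and updates the counters on the call row.
    pub fn append_llm_call_chunk(&mut self, id: u128, delta: &str) -> Result<(), String> {
        let row = self.calls.get_mut(&id).ok_or("unknown llm_call_id")?;
        row.stream_chunk_count += 1;
        row.assistant_text_chars += delta.chars().count() as u64;
        self.chunks.entry(id).or_default().push(delta.to_string());
        Ok(())
    }

    /// Sets the terminal status for a call that ended normally or with an error.
    pub fn finish_llm_call(&mut self, id: u128, status: CallStatus) -> Result<(), String> {
        let row = self.calls.get_mut(&id).ok_or("unknown llm_call_id")?;
        row.status = status;
        Ok(())
    }

    /// Assumed recovery step: rows still running after a crash become interrupted.
    pub fn mark_running_as_interrupted(&mut self) {
        for row in self.calls.values_mut() {
            if row.status == CallStatus::Running {
                row.status = CallStatus::Interrupted;
            }
        }
    }
}
```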

MCP tool changes

Extend existing tools and add new tools.

1. `get_turn_status`

Add optional arguments:

{
  "include_diagnostics": { "type": "boolean", "default": false },
  "include_llm_calls": { "type": "boolean", "default": false }
}

Default response remains backward-compatible, but now includes the summary columns if present:

{
  "failure_class": "...",
  "failed_phase": "...",
  "failed_entity_id": "...",
  "last_llm_call_id": "...",
  "llm_call_count": 3,
  "llm_prompt_tokens": 1234,
  "llm_completion_tokens": 5678,
  "llm_total_tokens": 6912
}

When include_llm_calls=true, include call summaries only, not giant artifacts.

2. `list_attempts`

Add the same summary fields to every row. Fix the underlying UI/list mismatch so the field is ended_at, not completed_at.

3. Add `list_llm_calls`

Input:

{
  "attempt_id": "uuid",
  "world_slug": "optional",
  "phase": "optional perceive|intend|adjudicate",
  "entity_id": "optional",
  "status": "optional running|succeeded|failed|interrupted",
  "limit": 100,
  "cursor": "optional"
}

Output: call summaries.

4. Add `get_llm_call`

Input:

{
  "llm_call_id": "uuid",
  "include_messages": true,
  "include_artifacts": false,
  "include_chunks_preview": true,
  "include_tokens_preview": true
}

Output: full metadata, messages, artifact metadata, and previews.

5. Add `get_llm_call_artifact`

Input:

{
  "llm_call_id": "uuid",
  "artifact_kind": "assistant_text_raw"
}

Output the full artifact. This is intentionally uncapped.

6. Add `list_llm_call_chunks`

Input:

{
  "llm_call_id": "uuid",
  "limit": 500,
  "cursor": "optional"
}

Output paginated chunks.
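
Chunk pagination could use a keyset cursor on chunk_seq, sketched here over an in-memory slice. The cursor encoding (the last chunk_seq already returned) is an assumption; the ticket leaves the cursor format open. ChunkRow, ChunkPage, and page_chunks are hypothetical names, and the input is assumed to be sorted by chunk_seq ascending.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkRow {
    pub chunk_seq: u32,
    pub delta_content: String,
}

pub struct ChunkPage<'a> {
    pub items: &'a [ChunkRow],
    /// `Some(seq)` when more rows exist after this page.
    pub next_cursor: Option<u32>,
}

/// Returns up to `limit` rows whose `chunk_seq` is greater than `after_seq`.
pub fn page_chunks(chunks: &[ChunkRow], after_seq: Option<u32>, limit: usize) -> ChunkPage<'_> {
    let after = after_seq.unwrap_or(0);
    // Binary search for the first row past the cursor.
    let start = chunks.partition_point(|c| c.chunk_seq <= after);
    // A limit of zero is treated as one so a page always makes progress.
    let end = start.saturating_add(limit.max(1)).min(chunks.len());
    let items = &chunks[start..end];
    let next_cursor = if end < chunks.len() {
        items.last().map(|c| c.chunk_seq)
    } else {
        None
    };
    ChunkPage { items, next_cursor }
}
```

In Postgres the equivalent query would filter on chunk_seq greater than the cursor and order by chunk_seq, which the primary key on (llm_call_id, chunk_seq) already supports.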

7. Add `list_llm_call_tokens`

Input:

{
  "llm_call_id": "uuid",
  "source": "optional stream_logprobs|final_logprobs|posthoc_tokenizer|stream_delta",
  "limit": 1000,
  "cursor": "optional"
}

Output paginated token observations.

HTTP/UI changes

Add browseable LLM call routes. Keep all raw views behind the existing graph UI auth gate.

Server routes

In src/server.rs, add:

.route("/llm-calls", get(llm_calls_list))
.route("/llm-calls/:llm_call_id", get(llm_call_detail))
.route("/llm-calls/:llm_call_id/chunks", get(llm_call_chunks_list))
.route("/llm-calls/:llm_call_id/tokens", get(llm_call_tokens_list))
.route("/llm-calls/:llm_call_id/artifacts/:artifact_kind", get(llm_call_artifact_raw))
.route("/attempts/:attempt_id/llm-calls", get(attempt_llm_calls_list))
.route("/w/:slug/attempt/:attempt_id/llm-calls", get(attempt_llm_calls_list_world))

Resource catalog

Add ResourceKind::LlmCall.

const LLM_CALL_SPEC: ResourceSpec = ResourceSpec {
    kind: ResourceKind::LlmCall,
    display_name: "LLM call",
    plural_path: "/llm-calls",
    detail_path_template: "/llm-calls/:llm_call_id",
    id_scope: IdScope::GlobalUuid,
    default_list_columns: &[
        "llm_call_id",
        "attempt_id",
        "world_slug",
        "phase",
        "entity_id",
        "status",
        "model_requested",
        "router_target",
        "finish_reason",
        "total_tokens",
        "duration_ms",
    ],
    reference_rules: GLOBAL_RULES,
    classification: ResourceClassification::Browseable,
};

Add reference rules for:

llm_call_id
last_llm_call_id
llm_calls.[*].llm_call_id
events.[*].llm_call_id
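To make the intent of the four keypath rules above concrete, here is a small sketch of how such a rule could be matched against a location in a JSON response. It assumes that `[*]` stands for any array index and that a bare rule such as llm_call_id matches only a top-level key. The catalog's real matching semantics are not shown in this ticket, and `Seg`, `rule_matches`, and `links_to_llm_call` are hypothetical names.

```rust
/// One step of a concrete location inside a JSON response.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Seg<'a> {
    Key(&'a str),
    Index(usize),
}

/// True when `rule` (dot-separated, with `[*]` standing for any array index)
/// describes exactly the location `path`. Assumed semantics, for illustration.
pub fn rule_matches(rule: &str, path: &[Seg<'_>]) -> bool {
    let parts: Vec<&str> = rule.split('.').collect();
    parts.len() == path.len()
        && parts.iter().zip(path.iter()).all(|(part, seg)| match seg {
            Seg::Index(_) => *part == "[*]",
            Seg::Key(key) => *part == *key,
        })
}

/// The four rules listed in the ticket.
pub const LLM_CALL_RULES: [&str; 4] = [
    "llm_call_id",
    "last_llm_call_id",
    "llm_calls.[*].llm_call_id",
    "events.[*].llm_call_id",
];

/// True when the value at `path` should be rendered as an LLM call link.
pub fn links_to_llm_call(path: &[Seg<'_>]) -> bool {
    LLM_CALL_RULES.iter().any(|rule| rule_matches(rule, path))
}
```

Under these assumptions, the llm_call_id of the third entry in an attempt's llm_calls array would be linked, while an llm_call_id nested under an unlisted parent would not.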

Attempt list UI

Change attempt default columns from:

["attempt_id", "world_slug", "status", "enqueued_at", "completed_at"]

to:

[
  "attempt_id",
  "world_slug",
  "status",
  "ended_at",
  "failure_class",
  "failed_phase",
  "failed_entity_id",
  "llm_completion_tokens",
  "last_llm_call_id"
]

Attempt detail UI

/attempts/:attempt_id must show:

Attempt summary.
Failure summary.
LLM aggregate counters.
Timeline events.
LLM calls table.
Audit events table.

For a failed attempt, the top of the page should answer:

Failed in perceive[mira].
Failure class: llm_empty_assistant_message.
Last LLM call: <link>.
Model/backend: @chat → local:gemma-4-26b@centroid-5060ti.
Finish reason: length.
Prompt/completion/total tokens: 748 / 56596 / 57344.
Raw output artifact: <link>.
Stream chunks: <link>.

LLM call detail UI

/llm-calls/:llm_call_id must show:

Metadata.
Router/backend headers.
Request messages.
Raw request JSON artifact link.
Raw assistant text artifact link.
Normalized assistant text artifact link.
Parsed JSON artifact link, when applicable.
Response body/error body artifact link.
Usage and finish reason.
Chunks table.
Tokens table.
Links back to attempt, world, entity, component hashes, audit events.

The raw artifact route should stream text/plain or application/json directly so huge outputs can be opened without rendering the entire blob inside the generic HTML page.

Router coordination

Chukwa must capture whatever the router already sends today. In addition, update the router to preserve Chukwa correlation headers in logs and to return backend metrics when available.

Required router additions:

Log X-Chukwa-Attempt-Id, X-Chukwa-Llm-Call-Id, X-Chukwa-Phase, and X-Client-Request-Id.
Preserve pass-through streaming behavior.
Add response headers when the local backend exposes these values:

x-router-backend-task-id
x-router-prompt-tokens
x-router-completion-tokens
x-router-total-tokens
x-router-truncated
x-router-finish-reason

If the local llama/Gemma backend logs token/truncation metrics but does not return them to the client, the router must attach them to the final stream summary or headers.

Chukwa must store these fields if present, but Chukwa must not depend on them to preserve stream chunks and raw output.
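The "store if present, never depend on" rule can be sketched as a parser in which every router metric is optional. This is an illustration only: it assumes header names arrive lowercased in a plain map and that x-router-truncated is encoded as true/false or 1/0, neither of which the ticket specifies. `RouterMetrics` and `parse_router_metrics` are hypothetical names.

```rust
use std::collections::HashMap;

/// Optional metrics the router may attach; every field can be absent.
#[derive(Debug, Default, PartialEq)]
pub struct RouterMetrics {
    pub backend_task_id: Option<String>,
    pub prompt_tokens: Option<u64>,
    pub completion_tokens: Option<u64>,
    pub total_tokens: Option<u64>,
    pub truncated: Option<bool>,
    pub finish_reason: Option<String>,
}

/// Parse the x-router-* headers from a map of lowercased header names.
/// Missing or unparseable values become `None`; nothing here is an error,
/// because trace capture must not depend on these headers.
pub fn parse_router_metrics(headers: &HashMap<String, String>) -> RouterMetrics {
    let text = |name: &str| headers.get(name).map(|v| v.trim().to_string());
    let number = |name: &str| headers.get(name).and_then(|v| v.trim().parse::<u64>().ok());
    // Assumed encoding for the truncation flag: true/false or 1/0.
    let flag = |name: &str| {
        headers
            .get(name)
            .and_then(|v| match v.trim().to_ascii_lowercase().as_str() {
                "true" | "1" => Some(true),
                "false" | "0" => Some(false),
                _ => None,
            })
    };
    RouterMetrics {
        backend_task_id: text("x-router-backend-task-id"),
        prompt_tokens: number("x-router-prompt-tokens"),
        completion_tokens: number("x-router-completion-tokens"),
        total_tokens: number("x-router-total-tokens"),
        truncated: flag("x-router-truncated"),
        finish_reason: text("x-router-finish-reason"),
    }
}
```

Because the function has no error path, a router that sends none of these headers yields an all-`None` struct and the call trace is still written.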

Tests

Add tests at every layer.

Migration tests

Update tests/migrations.rs:

Verify llm_calls, llm_call_messages, llm_call_chunks, llm_call_tokens, llm_call_artifacts, and attempt_timeline_events exist.
Verify indexes exist.
Verify world_audit_events.llm_call_id exists.
Update catalog contract if llm_calls is browseable.
Verify FK targets remain browseable or explicitly allowlisted.

World store tests

In both Postgres and memory stores:

start_llm_call inserts call, messages, and request artifact.
append_llm_call_chunk persists every chunk in order.
finish_llm_call stores raw assistant text, response headers, usage, finish reason, and updates attempt aggregate counters.
fail_llm_call stores uncapped error body and failure class.
list_llm_calls_for_attempt returns calls in call_seq order.
get_llm_call_artifact returns full raw content, not a preview.
world_audit_events.llm_call_id links semantic audit events to trace rows.

LLM client tests

Use a local mock HTTP server.

Test cases:

Streaming text response:
- Mock sends three SSE chunks: "one", " two", " three".
- Chukwa stores three llm_call_chunks.
- Raw assistant artifact is exactly "one two three".
- Normalized text is stored separately.
- Returned semantic text is normalized.
Empty assistant response:
- Mock sends valid response with empty/whitespace content.
- Chukwa stores raw chunks/body.
- Attempt failure class becomes llm_empty_assistant_message.
- last_llm_call_id points to the failed call.
Usage chunk:
- Mock sends final usage.
- Chukwa stores prompt/completion/total tokens and updates attempt totals.
HTTP 500:
- Mock returns a long body > 2 KB.
- Human-facing error may be previewed.
- llm_call_artifacts.router_error_body stores the full body.
JSON parse failure:
- Adjudication mock returns invalid JSON.
- Raw assistant text artifact is stored.
- Parse error artifact is stored.
- Failure class is llm_json_parse_error.
Adjudication validation rejection:
- Mock returns parseable JSON with invalid entity reference.
- Raw accepted/rejected attempt is stored.
- adjudication_rejected audit event links to llm_call_id.
Response format fallback:
- First call rejects response_format.
- Fallback call succeeds.
- Both LLM call rows are present and linked.
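The streaming and empty-response cases above both depend on keeping the raw text separate from the normalized text. A minimal sketch of that split, assuming the deltas have already been extracted from the SSE frames and that normalization is the whitespace collapse currently done in src/minds.rs; `CapturedText`, `capture_stream`, and `semantic_text` are hypothetical names:

```rust
/// Failure class named in the ticket for an empty or whitespace-only reply.
pub const EMPTY_ASSISTANT_MESSAGE: &str = "llm_empty_assistant_message";

/// Raw and normalized views of one streamed assistant message.
#[derive(Debug, PartialEq)]
pub struct CapturedText {
    /// Each delta exactly as received, in arrival order.
    pub chunks: Vec<String>,
    /// Byte-for-byte concatenation of the deltas.
    pub raw: String,
    /// Whitespace-collapsed text used for simulation behavior.
    pub normalized: String,
}

/// Accumulate deltas without altering them, then derive the normalized form.
pub fn capture_stream<'a>(deltas: impl IntoIterator<Item = &'a str>) -> CapturedText {
    let chunks: Vec<String> = deltas.into_iter().map(str::to_string).collect();
    let raw = chunks.concat();
    // std has no Iterator::join, so collect the words first.
    let normalized = raw.split_whitespace().collect::<Vec<_>>().join(" ");
    CapturedText { chunks, raw, normalized }
}

/// Text handed to the simulation, or the failure class if nothing usable
/// came back. The raw capture stays available either way.
pub fn semantic_text(captured: &CapturedText) -> Result<&str, &'static str> {
    if captured.normalized.is_empty() {
        Err(EMPTY_ASSISTANT_MESSAGE)
    } else {
        Ok(captured.normalized.as_str())
    }
}
```

The classification runs on the normalized text, but the chunks and the raw string are built first, so a whitespace-only reply still leaves its exact bytes behind for inspection.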

Kernel tests

Successful turn creates LLM call rows for every perceive, intend, and adjudicate call.
Successful perception/intent audit events include llm_call_id.
Successful adjudication audit event includes llm_call_id.
Failed perceive attempt persists LLM call artifacts before fail_attempt.
Interrupted/running recovery preserves partial LLM traces.

MCP tests

get_turn_status default remains backward-compatible.
get_turn_status(include_diagnostics=true, include_llm_calls=true) includes summary and LLM call list.
list_llm_calls paginates.
get_llm_call_artifact returns full raw text.
list_llm_call_chunks returns chunks in sequence order.
list_llm_call_tokens returns token observations.

UI/read model tests

/attempts?format=json includes new summary fields.
/attempts/:id?format=json includes timeline and LLM call summaries.
/llm-calls?format=json lists calls.
/llm-calls/:id?format=json returns metadata and artifact links.
/llm-calls/:id/artifacts/assistant_text_raw returns uncapped raw output.
HTML attempt detail links to LLM call detail.
HTML LLM call detail links back to attempt/world/entity.

Acceptance criteria

1. Successful turn captures all LLM data

Run a fresh single-moth turn.

Acceptance:

Attempt commits.
attempts.llm_call_count > 0.
Every perceive/intend/adjudicate call has an llm_calls row.
Every call has stored request messages.
Every call has request_json.
Every call has assistant_text_raw.
Every call has assistant_text_normalized or parsed_json, as appropriate.
Audit events link to llm_call_id.
/attempts/:id shows LLM calls.
/llm-calls/:id/artifacts/assistant_text_raw returns full raw text.

2. Failed turn captures all raw failure data

Run first-meeting.

Acceptance, regardless of whether the turn commits or fails:

If it fails, failure_class, failed_phase, failed_entity_id, and last_llm_call_id are populated.
The failed LLM call has full request payload, request messages, response headers, chunks, raw assistant artifact, and error artifacts.
If the backend streams a 56k-token runaway, llm_call_chunks and llm_call_artifacts.assistant_text_raw preserve the generated output.
If the backend returns a final empty message after streaming no content, the trace proves that too: zero content chunks, raw final body stored, response shape stored.
Token totals and usage fields are persisted when the router/backend returns them.
get_turn_status(include_diagnostics=true, include_llm_calls=true) explains the failure without reading pod logs.

3. Attempt list becomes operationally useful

/attempts and list_attempts include:

failure_class
failed_phase
failed_entity_id
last_llm_call_id
llm_call_count
llm_prompt_tokens
llm_completion_tokens
llm_total_tokens

The attempt list no longer uses the nonexistent completed_at field.

4. Raw storage is uncapped

For a generated output larger than 2 KB:

failure_reason may remain concise.
UI previews may be capped.
DB artifacts must store the full text.
Raw artifact route must return the full text.
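The split between capped previews and uncapped storage could look roughly like the sketch below. It assumes a 2 KB preview cap (matching the existing error-body display cap) and a hypothetical `StoredArtifact` shape; the real artifact row has more fields. The one subtle part is cutting the preview on a UTF-8 character boundary.

```rust
/// Display cap for human-facing previews (assumed to stay at 2 KB).
pub const PREVIEW_CAP_BYTES: usize = 2048;

/// Hypothetical artifact record: full content plus a bounded preview.
#[derive(Debug, PartialEq)]
pub struct StoredArtifact {
    pub content_text: String,
    pub content_bytes: usize,
    pub preview: String,
    pub preview_truncated: bool,
}

/// Longest prefix of `text` that fits in `max_bytes` without splitting a
/// UTF-8 character.
pub fn preview(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

/// Keep the full text; only the preview is capped.
pub fn store_artifact(full: &str) -> StoredArtifact {
    let p = preview(full, PREVIEW_CAP_BYTES);
    StoredArtifact {
        content_text: full.to_string(),
        content_bytes: full.len(),
        preview: p.to_string(),
        preview_truncated: p.len() < full.len(),
    }
}
```

The UI and failure_reason would read `preview`, while the raw artifact route and get_llm_call_artifact would read `content_text`.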

5. Existing canonical semantics do not change

The world state, committed turn format, and audit-event semantics remain stable. Chukwa may still use normalized perception/intent text for simulation behavior, but the raw layer must preserve the unnormalized output.

6. Historical attempts are not backfilled or mutated

Old attempts remain as they are. UI should display:

LLM trace unavailable: attempt predates llm trace capture.

Do not try to reconstruct missing raw data from pod logs.

Implementation order

Add migration 0004_llm_cognition_traces.sql.
Add DTOs and trait methods in world_store/mod.rs.
Implement Postgres store methods.
Implement Memory store methods.
Add LLM trace structs and async streaming client in llm.rs.
Convert minds.rs cognition functions to async traced calls.
Thread trace context through kernel.rs.
Link audit events to llm_call_id.
Add attempt summary updates.
Add MCP tools and response fields.
Add resource catalog entry and HTTP/UI routes.
Add tests.
Deploy.
Verify on single-moth.
Verify on first-meeting.
Post resolution with:
- attempt IDs
- LLM call IDs
- token totals
- links to /attempts/:id
- links to /llm-calls/:id
- DB query receipts proving raw artifacts exist

Example verification SQL

SELECT
    attempt_id,
    world_slug,
    status,
    failure_class,
    failed_phase,
    failed_entity_id,
    last_llm_call_id,
    llm_call_count,
    llm_prompt_tokens,
    llm_completion_tokens,
    llm_total_tokens
FROM attempts
WHERE attempt_id = '<attempt-id>';

SELECT
    call_seq,
    llm_call_id,
    phase,
    entity_id,
    status,
    model_requested,
    router_target,
    finish_reason,
    prompt_tokens,
    completion_tokens,
    total_tokens,
    stream_chunk_count,
    assistant_text_chars,
    failure_class
FROM llm_calls
WHERE attempt_id = '<attempt-id>'
ORDER BY call_seq;

SELECT
    artifact_kind,
    content_bytes,
    content_chars,
    content_sha256
FROM llm_call_artifacts
WHERE llm_call_id = '<llm-call-id>'
ORDER BY artifact_kind;

SELECT string_agg(delta_content, '' ORDER BY chunk_seq) AS reconstructed_stream_text
FROM llm_call_chunks
WHERE llm_call_id = '<llm-call-id>';

SELECT content_text
FROM llm_call_artifacts
WHERE llm_call_id = '<llm-call-id>'
  AND artifact_kind = 'assistant_text_raw';

The core shift is this: attempts should summarize; LLM calls should preserve; chunks/tokens should prove.

The development team should not keep trying to infer model behavior from failure_reason. Build the trace layer, make it browseable, and preserve every weird, failed, successful, ugly, raw token-bearing artifact as durable data.

LLM cognition traces — proposed resolution

One-sentence outcome

Chukwa now persists complete LLM cognition traces — every request, every response chunk, every artifact — as first-class durable resources linked to attempts, audit events, worlds, agents, and component hashes. Attempts surface failure class, failed phase, failed entity, last LLM call, and token totals. Operators can browse the full trace via MCP tools and HTML routes.

Phase summary

Phase	Commit	What landed
A	`6d2b82f`	migration 0004 (6 new tables, 3 new enums, attempt + world_audit_event column adds), 19 DTOs, 14 trait-method signatures, ResourceKind::LlmCall stub
B	`4f58317`	PostgresWorldStore: full SQL transaction implementations + 20 postgres-tests
C	`7537e05`	MemoryWorldStore parity + 23 in-memory tests + catalog contract test extended for new FK targets
D	`993f486`	reqwest async streaming client; per-chunk persistence; structured `LlmError` with 14 failure_class strings; correlation headers; router header capture; response_format fallback linked via `fallback_of_call_id`; 9 streaming tests
E	`97b76b2`	cognition functions async; AttemptTraceContext threaded through kernel; PendingAuditEvent variants gain llm_call_id; ureq + run_blocking_llm_io removed; ant_scenario regression fixed
F	`d347833`	get_turn_status / list_attempts extended (8 summary fields); ATTEMPT_SPEC fixed (completed_at→ended_at regression); world_audit_events.llm_call_id end-to-end; list_attempt_timeline trait method; load_attempt_detail surfaces summary / llm_calls / timeline
G	`16a7813` + `d7b6f7f`	5 new MCP tools (`list_llm_calls` / `get_llm_call` / `get_llm_call_artifact` / `list_llm_call_chunks` / `list_llm_call_tokens`); 7 HTTP routes for `/llm-calls/*` + `/attempts/:id/llm-calls`; LlmCall reference rules; hash-linking absorption (typed env_hash / entity_hash + bare-hash via current_kind + Identifier self-link); attempt-detail UI + LLM-call detail UI; strict adjudication entity_id matching (item 5 from `38d0ba4e`); rejected drafts no longer staged into canonical audit (item 6 from `38d0ba4e`)
H	`a598375`	32/32 acceptance criteria covered; historical-attempt UI stub for criterion 6; 6 new tests
I	`406e35c`	merged feat/llm-traces to main; image rolled to pod `chukwa-5f79598b58-4qzkp`; migration 0004 applied success=t; reconcile=0; live router smoke captured trace data end-to-end on both single-agent and multi-agent worlds

Test counts at completion (Phase H HEAD on feat/llm-traces)

Lib (cargo test --lib --features test-fixtures): 634 tests
Integration crates (11 binaries): 138 tests
- phase0: 12 (Phase 0 axioms — immutable contract)
- ant_scenario: 4
- llm_streaming: 9 (streaming client + per-chunk persistence)
- llm_traces_kernel: 5 (cognition / attempt / context wiring)
- llm_traces_routes: 16 (MCP + HTTP route surface)
- structural_linking: 27 (typed + bare-hash linking)
- phase_g_routes: 15 (Phase G UI)
- phase_h_routes: 14 (Phase H UI / read-models)
- phase_i_routes: 17 (graph-browser auth gating)
- graph_ui_auth: 14 (login / session / cookie flow)
- migrations: 5 (DDL + grant tests)
Total: 772 tests at Phase H HEAD on feat/llm-traces
Postgres-tests pinned to local sacrificial DB throughout development per the postgres-tests-isolation memory rule.

Live smoke evidence (Phase I)

Pod / migration / reconcile

$ kubectl -n chukwa get pods -l app=chukwa
NAME                      READY   STATUS    RESTARTS   AGE
chukwa-5f79598b58-4qzkp   1/1     Running   0          7s

$ psql -c "SELECT version, success, description, installed_on FROM _sqlx_migrations"
1 | t | scenario store       | 2026-04-26 20:27:39
2 | t | world store          | 2026-04-26 20:27:39
3 | t | resource browser     | 2026-04-27 10:51:45
4 | t | llm cognition traces | 2026-04-28 04:05:05

$ kubectl -n chukwa logs chukwa-5f79598b58-4qzkp | head -10
INFO chukwa_serve: scenario-store migrations applied
INFO chukwa_serve: restart recovery: cleared orphan running attempts reconciled=0
INFO chukwa_serve: chukwa-serve listening bind=0.0.0.0:8080 public_url=https://chukwa.benac.dev

AC #1 — `single-moth` (single-agent successful turn)

attempt_id 70ef2dc3-19df-40f6-9a75-32de5ce65788, turn 8 → 9, status committed, duration 26.6s, 3 LLM calls, 3277 total tokens.

$ psql -c "SELECT … FROM attempts WHERE attempt_id='70ef2dc3-…'"
status         | committed
llm_call_count | 3
llm_prompt_tokens / llm_completion_tokens / llm_total_tokens
               | 1394 / 1883 / 3277
last_llm_call_id | 4c140d4f-2425-4df0-bf72-96dca8df87f9

$ psql -c "SELECT call_seq, phase, entity_id, status, finish_reason, total_tokens FROM llm_calls WHERE attempt_id='…'"
1 | perceive   | moth | succeeded | stop | 1175
2 | intend     | moth | succeeded | stop |  776
3 | adjudicate | moth | succeeded | stop | 1326

$ psql -c (counts)
messages              |    6
chunks                | 1873
artifacts             |    7
audit_events_with_llm |    3   (of 4 total; the bare turn-complete event has no llm linkage)
timeline_events       |    6

$ psql -c "SELECT call_seq, phase, artifact_kind, content_bytes FROM llm_call_artifacts a JOIN llm_calls lc USING(llm_call_id) WHERE attempt_id='…'"
1 | perceive   | request_json              | 1636
1 | perceive   | assistant_text_raw        |   78
2 | intend     | request_json              | 1189
2 | intend     | assistant_text_raw        |   52
3 | adjudicate | request_json              | 3997
3 | adjudicate | assistant_text_raw        |  494
3 | adjudicate | assistant_text_normalized |  493

MCP tool calls (via /operator-mcp) all returned correct shapes:

list_llm_calls returned 3 calls with router_target=local:gemma-4-26b@centroid-5060ti, model_resolved=gemma-4-26b-a4b-it, all finish_reason=stop.
get_llm_call(include_messages=true) returned 50+ metadata fields, including router_*, response_*, request_*, hash refs, validation_status, parsed_json_status, and request messages × 2.
get_llm_call_artifact(assistant_text_raw) returned the full uncapped 494-byte body with sha256 25274ba76bd18135…, content_chars=494.
list_llm_call_chunks returned chunks with raw_sse, delta_content, cumulative_bytes, cumulative_chars, choice_index, finish_reason, usage_json.

HTML route shape:

GET /attempts/:id, GET /llm-calls/:id, GET /llm-calls/:id/artifacts/:kind, and GET /attempts/:id/llm-calls all return HTTP 401 to anonymous requests (the auth gate is inherited from the graph-browser ticket), which confirms that the routes are mounted and the gate fires. Authenticated rendering is exercised by the 17-test phase_i_routes integration suite (in-process cookie issuance), because the pod env carries only the johnb argon2 hash, not the plaintext password. This is the same posture as the resolution of 04d1b392.

AC #2 — `first-meeting` (multi-agent, midnight_library)

attempt_id bb997851-0470-4140-a875-3a17ba71d5a3, turn 0 → 1, status committed, duration 81.7s, 6 LLM calls, 10457 total tokens, two entities (mira, pip).

1 | perceive   | mira | succeeded | stop | 2821
2 | perceive   | pip  | succeeded | stop | 2040
3 | intend     | mira | succeeded | stop |  543
4 | intend     | pip  | succeeded | stop |  941
5 | adjudicate | mira | succeeded | stop | 2435
6 | adjudicate | pip  | succeeded | stop | 1677

attempt_summary       |    1
llm_calls             |    6
messages              |   12
chunks                | 6335
artifacts             |   15  (request_json + assistant_text_raw per call,
                              + assistant_text_normalized for both adjudicate calls)
audit_events_with_llm |    6   (of 7 total)
timeline_events       |   12

The 2dc48e22 runaway-generation phenomenon was not triggered by either smoke turn today — both committed cleanly with finish_reason=stop. The trace layer is now armed and ready: when the runaway next reproduces, every chunk, every cumulative_chars datapoint, and every finish_reason will be on disk in llm_call_chunks for the next agent to read directly. That is exactly what this ticket made possible.

Architectural delta

Trace layer beside canonical audit. Canonical world / audit chain stays semantic; raw LLM trace sits beside it via world_audit_events.llm_call_id and links into it without polluting it.
Per-chunk durability. Chunks land in llm_call_chunks before the next upstream chunk is read; a runaway generation persists incrementally even if the upstream connection is killed mid-stream.
Failure class is a stable enum string, not a free-form failure_reason. 14 failure-class strings defined in LlmError.
Rejected adjudication drafts live in the trace layer, not in canonical world history (item 6 from 38d0ba4e).
LLM-facing entity IDs are matched exactly; no repair, no fallback (item 5 from 38d0ba4e).
Hash linking is now typed everywhere: keypath rules drive linking, plus bare-hash linking via current_kind for list / detail pages (hash-linking patch absorbed during Phase G).
Operator UI: /attempts/:id answers "what failed and why"; /llm-calls/:id answers "what was sent and what came back".
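The per-chunk durability point above is the one most worth illustrating: each chunk is handed to the store before the next upstream read, so a failure mid-stream leaves a partial trace rather than nothing. The sketch below uses a hypothetical `ChunkSink` trait and an in-memory sink in place of append_llm_call_chunk. A real store would commit a transaction inside `append`, and the real client is async, which this synchronous sketch does not attempt to model.

```rust
/// Hypothetical sink standing in for `append_llm_call_chunk`.
pub trait ChunkSink {
    fn append(&mut self, chunk_seq: u64, delta: &str);
}

/// In-memory sink used for illustration only.
#[derive(Default)]
pub struct MemorySink {
    pub rows: Vec<(u64, String)>,
}

impl ChunkSink for MemorySink {
    fn append(&mut self, chunk_seq: u64, delta: &str) {
        self.rows.push((chunk_seq, delta.to_string()));
    }
}

/// Drain `upstream`, persisting every delta before asking for the next one.
/// Returns the number of chunks persisted, or the upstream error together
/// with that count. Either way the sink already holds the partial trace.
pub fn drain_stream<I, S>(upstream: I, sink: &mut S) -> Result<u64, (u64, String)>
where
    I: IntoIterator<Item = Result<String, String>>,
    S: ChunkSink,
{
    let mut seq = 0u64;
    for item in upstream {
        match item {
            Ok(delta) => {
                seq += 1;
                // Persisted before the loop pulls the next upstream item.
                sink.append(seq, &delta);
            }
            Err(e) => return Err((seq, e)),
        }
    }
    Ok(seq)
}
```

If the upstream yields two deltas and then a connection error, the function returns the error with a count of 2 and the sink holds exactly those two rows, which is the property a runaway or killed generation relies on.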

Acceptance-criteria walkthrough

All 32 criteria from the ticket body lines 1299-1397 are satisfied. The following table maps each to its proving test (see Phase H commit a598375 for the explicit sweep).

AC#	Topic	Test
1	Successful turn captures all data	`tests/llm_traces_kernel.rs::successful_turn_creates_one_llm_call_per_cognition_phase` + smoke evidence above
2	Failed turn captures raw failure data	`tests/llm_streaming.rs` failure-path tests + `llm_traces_kernel` failure tests
3	Per-attempt LLM summary on attempt row	`tests/llm_traces_routes::attempt_detail_includes_llm_summary` + DB schema
4	get_turn_status / list_attempts surface 8 fields	`phase_h_routes` + smoke (`list_attempts` keys check above)
5	LLM trace is queryable by attempt	`list_llm_calls` MCP tool smoked above
6	Historical attempts get "trace unavailable" stub	`llm_traces_routes::terminal_attempt_with_zero_llm_calls_gets_predates_stub` + Phase H read-model
7	One row per LLM request	`llm_traces_kernel` ant_scenario asserts 3 rows per turn
8-12	Request fields persisted (model, messages, params, headers, body sha)	`llm_streaming::*` + DB schema constraints
13-19	Response fields persisted (status, headers, model, usage, finish_reason, …)	`llm_streaming::usage_chunk_persists_token_totals` + `response_format_fallback_linked`
20-23	Per-chunk persistence (raw_sse, delta_content, cumulative, choice_index)	`llm_streaming::chunks_persist_*`
24-27	Artifacts (request_json, assistant_text_raw, normalized, parsed_json)	`llm_traces_routes::get_llm_call_artifact_returns_uncapped_text`
28	Audit events link to llm_call_id	`llm_traces_kernel::pending_audit_events_carry_llm_call_id` + DB receipt
29	Hash linking from llm_call to component hashes	`structural_linking::*_hash_*` (env, entity, perceive_system, etc.)
30	MCP tools (5 new)	`phase_g_routes` smokes all 5
31	HTML routes (7)	`phase_h_routes` + `phase_i_routes`
32	Strict adjudication entity_id (item 5 from 38d0ba4e)	`llm_traces_kernel::adjudication_entity_id_must_match_intend`

Surfaced for follow-up (suggestions only — not filed)

2dc48e22 runaway-generation remediation can now proceed on a principled basis. With the trace layer live, the next agent reads actual prompts + responses + finish_reasons + cumulative_chars + token-usage rows and targets the fix precisely. Today's smoke didn't reproduce the runaway, but the moment it next happens the evidence will be on disk.
4601f21a pod-restart investigation now has the trace evidence backing the LLM-side cause (no longer "the pod restarted, so the evidence is gone").
The remaining ten items in 38d0ba4e (validator + Slug / Label / entity_id grammar + SQL domain + scenario_names removal + route validation + ticket-label enforcement + docs sweep + substrate-wipe migration) stay scoped to that ticket's own dedicated cycle. Items 5 (strict adjudication entity_id) and 6 (rejected drafts not in canonical audit) were absorbed into Phase G of this ticket since they were blocking trace correctness.
The handler-side mcp.sh wrapper at /root/.config/chukwa-mcp/mcp.sh was updated mid-Phase-I to route the 5 new operator tools (list_llm_calls / get_llm_call / get_llm_call_artifact / list_llm_call_chunks / list_llm_call_tokens) to /operator-mcp. Mirror-update of mcp.sh.pre-split was not done; the wrapper has drifted from the rollback file. Trivial to keep in sync if needed.

Closing

Awaiting caller acceptance.

Ticket created: Make LLM cognition traces first-class durable artifacts for every Chukwa turn attempt

Phase A landed

Phase A landed at commit 6d2b82f on feat/llm-traces. Pre-authorized in conversation channel by the human; the ticket's HOLD protocol is honored by the in-channel directive — proceeding to in_progress with this comment as the Phase A status.

Branch state (last 3, oneline)

6d2b82f feat(llm-traces): phase A — migration 0004 + DTOs + trait skeleton
09344de feat: merge markdown tickets with long-turn fixes
3145202 fix(llm): avoid stalling Tokio workers

What landed

migrations/0004_llm_cognition_traces.sql (347 lines) — additive over 0001-0003:
- 10 new attempts summary columns (failure_class, failed_phase, failed_entity_id, last_llm_call_id, llm_call_count, llm_prompt_tokens, llm_completion_tokens, llm_total_tokens, llm_trace_summary, observability_version) + 5 indexes
- 6 new tables: attempt_timeline_events, llm_calls, llm_call_messages, llm_call_chunks, llm_call_tokens, llm_call_artifacts
- 3 new ENUMs: llm_call_status, llm_phase, llm_artifact_kind
- 17 new indexes (call-by-attempt, world-by-time, tokens-DESC, FTS GIN, etc.)
- tsvector GENERATED column on llm_call_artifacts + GIN index for FTS
- STORAGE EXTENDED hints on every blob column (raw_sse, delta_content, content_text, content_json, token_text)
- Deferred FK attempts.last_llm_call_id → llm_calls(llm_call_id) installed after llm_calls exists
- world_audit_events.llm_call_id FK + index
src/world_store/mod.rs — 19 new DTOs + 14 new trait methods:
- newtypes/enums: LlmCallId, LlmPhase, LlmCallStatus, LlmArtifactKind, LlmTokenSource
- inputs: LlmCallStart, StoredLlmMessage, LlmCallChunkInput, LlmCallTokenInput, LlmCallArtifactInput, LlmCallFinish, LlmCallFailure, AttemptTimelineInput, AttemptDiagnosticsUpdate
- reads: LlmCallDetails, LlmCallSummary, LlmCallPage, LlmCallCursor, LlmCallChunk, LlmChunkPage, LlmChunkCursor, LlmCallArtifact
- trait methods: record_attempt_timeline_event, update_attempt_progress, update_attempt_llm_summary, start_llm_call, append_llm_call_chunk, append_llm_call_tokens, put_llm_call_artifact, finish_llm_call, fail_llm_call, get_llm_call, list_llm_calls_for_attempt, list_llm_call_chunks, get_llm_call_artifact (13 new + counting update_attempt_progress/update_attempt_llm_summary separately = 14)
src/world_store/postgres.rs and src/world_store/memory.rs — every new trait method returns WorldStoreError::Database("phase A skeleton — Phase B/C implements this") with // TODO(llm-traces-phase-b) (postgres) / // TODO(llm-traces-phase-c) (memory) comments naming the intended SQL or in-memory shape so Phase B/C don't need to round-trip back to the spec
src/resource_catalog.rs — ResourceKind::LlmCall variant + LLM_CALL_SPEC entry per the ticket spec; reference rules left empty (Phase G); build_link_href arm validates UUID before linking
src/read_models.rs / src/server.rs — LlmCall arms in the exhaustive matches (load_detail, load_list, /types overview) return "not yet wired (Phase G)" so existing routes stay total without claiming functionality the route doesn't have

Files modified / created

created:  migrations/0004_llm_cognition_traces.sql                +347
modified: src/world_store/mod.rs                                  +720 / -3
modified: src/world_store/postgres.rs                             +210
modified: src/world_store/memory.rs                               +180 / -2
modified: src/resource_catalog.rs                                 +50
modified: src/read_models.rs                                      +16
modified: src/server.rs                                           +5
modified: tests/migrations.rs                                     +118 / -1
                                                                  ---------
                                                                 1637 insertions, 9 deletions

Verifications

Tooling: rust 1.88-bookworm container, chukwa-pg-local postgres:16 on 127.0.0.1:5433, DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5433/postgres.

cargo build --bin chukwa-serve — clean (no warnings introduced; existing crate still compiles cleanly)
cargo test --lib --features test-fixtures — 441 passed, 0 failed
cargo test --test migrations --features test-fixtures,postgres-tests — 5 passed, 0 failed
- migrations_apply_forward
- migrations_idempotent
- migrations_llm_traces_tables_present (new — Phase A): 6 tables + 3 ENUMs + 10 attempt columns + world_audit_events.llm_call_id + 17 indexes + FTS column + deferred FK
- migrations_phase_e_indexes_present
- catalog_contract_every_fk_target_is_browseable_or_allowlisted (validates new llm_calls-targeted FKs)
cargo test --test bootstrap --features test-fixtures,postgres-tests — 3 passed, 0 failed

Note on tests/ant_scenario.rs: pre-existing unconditional failure on main — panicked at src/llm.rs:244: can call blocking only when running on the multi-threaded runtime, which Phase D's async LLM client rewrite will fix. Verified identical failure on a clean clone of main (commit 09344de); not introduced by Phase A.

Surfaced for the record

AttemptDiagnosticsUpdate carries attempt_id so it can't derive(Default) (since AttemptId doesn't). Replaced the auto-derive with an explicit for_attempt(attempt_id) constructor; struct-update syntax still works for partial patches. Phase B's impl reads this field as the WHERE clause target.
LlmCallStart carries the full request body (the spec says the store also persists it as the LlmArtifactKind::RequestJson artifact in the same transaction). The TODO comment on start_llm_call in postgres.rs names that intent; Phase B should NOT split this into a follow-up insert.
update_attempt_llm_summary's token deltas are increments, not absolutes. This lets per-call updates stack without first reading the row. Spec was ambiguous on this; chose increment semantics because Phase F will land per-call updates from inside finish_llm_call's caller, and absolute semantics would force a SELECT-then-UPDATE on every call.
LlmTokenSource is a Rust enum mapped to the llm_call_tokens.source CHECK constraint values via .as_str(). Spec gave source TEXT NOT NULL CHECK (source IN (...)) rather than a PG ENUM, so the Rust side enforces the constraint at the type level even though Postgres uses a CHECK string.
LlmCallId does NOT derive Default (Uuid::nil() would be misleading); same convention as AttemptId.
tests/migrations.rs — added llm_calls → llm_call entry to TABLE_TO_RESOURCE_KIND so the existing catalog_contract_every_fk_target_is_browseable_or_allowlisted test catches future FK regressions on the new table. Verified the contract test passes against the new migration.
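The LlmTokenSource note above describes a Rust enum whose `.as_str()` values must line up with the strings allowed by the CHECK constraint. A minimal sketch of that pattern follows. The four strings are the source values listed for list_llm_call_tokens in the ticket; the variant names, the `ALL` constant, and `parse` are assumptions made for illustration and may differ from the real definition.

```rust
/// Token observation source. Variant names are assumed; the strings come
/// from the `source` values listed in the ticket.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlmTokenSource {
    StreamLogprobs,
    FinalLogprobs,
    PosthocTokenizer,
    StreamDelta,
}

impl LlmTokenSource {
    /// Every variant, so tests can check the mapping exhaustively.
    pub const ALL: [LlmTokenSource; 4] = [
        LlmTokenSource::StreamLogprobs,
        LlmTokenSource::FinalLogprobs,
        LlmTokenSource::PosthocTokenizer,
        LlmTokenSource::StreamDelta,
    ];

    /// The string written to llm_call_tokens.source.
    pub fn as_str(self) -> &'static str {
        match self {
            LlmTokenSource::StreamLogprobs => "stream_logprobs",
            LlmTokenSource::FinalLogprobs => "final_logprobs",
            LlmTokenSource::PosthocTokenizer => "posthoc_tokenizer",
            LlmTokenSource::StreamDelta => "stream_delta",
        }
    }

    /// Inverse of `as_str`; unknown strings are rejected rather than guessed.
    pub fn parse(s: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|v| v.as_str() == s)
    }
}
```

Because the match in `as_str` is exhaustive, adding a variant without a string fails to compile, while a string the database accepts but Rust does not know comes back as `None` from `parse` instead of being silently coerced.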

Deployability statement

Phase A IS deployable on its own. The migration is purely additive — every ALTER TABLE ... ADD COLUMN lands a NOT NULL DEFAULT so existing rows are unaffected, and every new table is empty until Phase B/E wires writes. The deferred FK from attempts.last_llm_call_id → llm_calls(llm_call_id) is DEFERRABLE INITIALLY DEFERRED and only enforces on transaction commit; production never sets last_llm_call_id until Phase F so the FK never trips. All new Rust code is unused at runtime — the trait methods exist on the surface but no caller invokes them.

Proceeding to Phase B (PostgresWorldStore impl of all new trait methods + postgres-tests).

Phase A: migration 0004 + DTOs + trait skeleton (additive, deployable)
Phase B: Postgres impl + postgres-tests
Phase C: Memory impl + catalog contract test update
Phase D: LLM client async/streaming rewrite + structured errors + trace context
Phase E: minds/kernel async + raw-before-normalization + trace plumbing
Phase F: attempt summary + audit_events.llm_call_id linking + MCP tool extensions
Phase G: new MCP tools + resource catalog + HTTP/UI routes for /llm-calls/*
Phase H: tests at every layer
Phase I: deploy + smoke + resolution

Phase B landed at commit 4f58317 on feat/llm-traces.

Branch state (last 4 commits, oneline):

4f58317 feat(llm-traces): phase B — PostgresWorldStore impl + 20 postgres-tests
6d2b82f feat(llm-traces): phase A — migration 0004 + DTOs + trait skeleton
09344de feat: merge markdown tickets with long-turn fixes
3145202 fix(llm): avoid stalling Tokio workers

What landed

The 14 LLM-trace WorldStore placeholder bodies in src/world_store/postgres.rs are now real impls against migration 0004. Memory store stays Phase A skeleton (Phase C).

SQL transaction shapes (mutating methods)

start_llm_call: one txn — INSERT llm_calls (status='running' default), INSERT every llm_call_messages row (sha256/chars/bytes computed Rust-side from the message content), INSERT llm_call_artifacts(RequestJson) carrying the canonicalized request body + sha256 + byte count, INSERT attempt_timeline_events("llm_call_started"). FK violation on attempts(attempt_id) is mapped to AttemptNotFound; (attempt_id, call_seq) unique-violation surfaces as Database.
append_llm_call_chunk: one txn — INSERT llm_call_chunks, then UPDATE parent llm_calls cumulative counters (stream_chunk_count++, content_chunk_count += is_content, assistant_text_chars/bytes += chunk_delta, first_chunk_at = COALESCE(first_chunk_at, now())). Per-chunk durability — txn commits before next upstream read.
append_llm_call_tokens: one txn, sequential INSERTs (PRIMARY KEY (llm_call_id, token_seq) rejects duplicates).
put_llm_call_artifact: single autocommit INSERT … ON CONFLICT (llm_call_id, artifact_kind) DO UPDATE — idempotent upsert. Rejects bodies with neither content_text nor content_json Rust-side as WorldStoreError::Invalid.
finish_llm_call: one txn — SELECT … FOR UPDATE on llm_calls (rejects non-'running' as WorldStoreError::Invalid), UPDATE llm_calls setting status='succeeded', ended_at, duration_ms, response_, finish_reason, token usage, stream/content counters, content_shape, model_resolved, router_, upstream_request_id, metadata. Then UPDATE attempts aggregates (llm_call_count++, llm_*_tokens += this, last_llm_call_id = this). INSERT attempt_timeline_events("llm_call_finished").
fail_llm_call: same FOR UPDATE pattern. UPDATE llm_calls (status='failed', failure_class, failure_message, partial COALESCE'd fields). UPDATE attempts (last_llm_call_id, partial token deltas — does NOT touch failure_class on attempts since the kernel posts that via update_attempt_llm_summary separately). INSERT attempt_timeline_events("llm_call_failed").
record_attempt_timeline_event: single INSERT — event_seq allocated as COALESCE(MAX(event_seq), 0) + 1 for the attempt inside the same INSERT. The table's UNIQUE (attempt_id, event_seq) catches concurrent allocations on the same attempt; the caller can retry on unique-violation.
update_attempt_progress: UPDATE attempts SET progress = $2, llm_trace_summary = llm_trace_summary || $3 (top-level JSONB || merge — keys overwrite, missing keys preserved). Returns AttemptNotFound when no row matched.
update_attempt_llm_summary: UPDATE attempts with COALESCE'd Option fields, additive delta increments for the four token / call counters, and JSONB || merge for the summary patch. Returns AttemptNotFound when no row matched.
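
The top-level JSONB || merge used by update_attempt_progress and update_attempt_llm_summary can be sketched in plain Rust. This is an illustration only, not the store code: JsonObject and merge_top_level are hypothetical names, and values are kept as raw JSON text so the sketch needs no JSON crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a top-level JSONB object. Values are raw JSON
/// text so the sketch stays dependency-free.
type JsonObject = BTreeMap<String, String>;

/// Mimics `existing || patch` on two JSONB objects: keys present in `patch`
/// overwrite, keys absent from `patch` are preserved. The merge is shallow,
/// so a nested object in `patch` replaces the old value wholesale.
fn merge_top_level(existing: &JsonObject, patch: &JsonObject) -> JsonObject {
    let mut merged = existing.clone();
    for (key, value) in patch {
        merged.insert(key.clone(), value.clone());
    }
    merged
}
```

Because the merge is shallow, a summary patch that wants to change one field inside a nested object has to resend that whole nested object.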

Pagination wire format (list methods)

LlmCallCursor { call_seq_after: i32 } — forward iteration call_seq ASC for list_llm_calls_for_attempt. Cursor encodes the last-row's call_seq; next_cursor = None when page returned fewer rows than limit.
LlmChunkCursor { chunk_seq_after: i32 } — same shape over chunk_seq ASC for list_llm_call_chunks.

Cursor structs are plain Serialize/Deserialize per the Phase A DTO surface; the MCP/HTTP layer (Phase F/G) will base64-url-no-pad encode them as opaque tokens, mirroring AuditCursor.
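
For reference, the forward-cursor contract described above can be modeled over an in-memory slice. A minimal sketch, assuming rows are reduced to their call_seq values; Page, list_page, and collect_all are illustrative names, not the trait surface.

```rust
/// Typed cursor as described above: the call_seq of the last row returned.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LlmCallCursor {
    call_seq_after: i32,
}

/// One page of results plus the cursor for the next page, if any.
#[derive(Debug)]
struct Page {
    rows: Vec<i32>,
    next_cursor: Option<LlmCallCursor>,
}

/// `rows` stands in for the table and holds call_seq values only.
fn list_page(rows: &[i32], cursor: Option<LlmCallCursor>, limit: usize) -> Page {
    let after = cursor.map(|c| c.call_seq_after).unwrap_or(i32::MIN);
    // WHERE call_seq > $after ORDER BY call_seq ASC LIMIT $limit
    let mut matching: Vec<i32> = rows.iter().copied().filter(|seq| *seq > after).collect();
    matching.sort_unstable();
    matching.truncate(limit);
    // A short page means the listing is exhausted.
    let next_cursor = if matching.len() < limit {
        None
    } else {
        matching.last().map(|seq| LlmCallCursor { call_seq_after: *seq })
    };
    Page { rows: matching, next_cursor }
}

/// Walks every page until the cursor runs out.
fn collect_all(rows: &[i32], limit: usize) -> Vec<i32> {
    let mut seen = Vec::new();
    let mut cursor = None;
    loop {
        let page = list_page(rows, cursor, limit);
        seen.extend(page.rows);
        match page.next_cursor {
            Some(next) => cursor = Some(next),
            None => return seen,
        }
    }
}
```

One consequence of the "fewer rows than limit" rule: when the row count is an exact multiple of limit, the caller makes one extra request that returns an empty page before next_cursor becomes None.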

Full-content fetch

get_llm_call_artifact(llm_call_id, artifact_kind) — single SELECT, returns the full content_text / content_json body uncapped. Verified with a 100 KiB assistant_text_raw round-trip in pg_get_llm_call_artifact_returns_full_uncapped_body.
get_llm_call(llm_call_id) — returns metadata + ordered llm_call_messages rows + the list of artifact kinds present, but NOT artifact bodies. Tested in pg_get_llm_call_returns_metadata_and_messages_no_artifact_bodies.

Supporting types added (postgres.rs)

PgLlmCallStatus, PgLlmPhase, PgLlmArtifactKind sqlx::Type mappings; helper fns sha256_hex_str / sha256_hex_bytes / canonicalize_request_body / token_source_str / insert_attempt_timeline_event / parse_* / *_from_row.

Test counts

lib (no postgres feature): 441 → 441 unchanged.
lib + postgres-tests: 571 → 591 (+20 new postgres-tests).
DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5433/postgres (sacrificial local Postgres on host port 5433, reset via DROP SCHEMA public CASCADE per test by fresh_store).

The 20 new tests cover: start (row + messages + request artifact + timeline event + unknown-attempt error), append_chunk (counters + ordering + unknown-call error), append_tokens (round-trip with source enum), put_artifact (idempotent upsert + empty-body rejection), finish (status flip + duration_ms + attempt aggregate bumps + timeline event + non-running rejection), fail (status + failure fields + attempt linkage + timeline event), record_timeline (event_seq monotonicity over 5 calls), update_progress (text + JSONB top-level merge with overlap+non-overlap keys), update_llm_summary (all fields + last_llm_call_id FK), get_llm_call (metadata + messages + kinds, not bodies), get_llm_call (NotFound), list_calls (call_seq ordering + pagination iterating to exhaustion no dupes/no gaps over 5 rows with limit=2), list_chunks (chunk_seq pagination over 4 rows with limit=3), get_artifact (100 KiB uncapped body round-trip), get_artifact (NotFound), and world_audit_events.llm_call_id column writability (verifies the FK + index landed by migration 0004 are usable; Phase E will add the kernel-side write path).

tests/ant_scenario.rs shows 4 failures (adjudicated_event_carries_entity_transitions, ant_memory_grows_monotonically, ant_turn_emits_cognitive_events_in_order, suspended_seed_remains_unchanged_after_many_turns) — all panic with "can call blocking only when running on the multi-threaded runtime" at src/llm.rs:244. These were failing pre-Phase-B (verified via stash + run on parent commit 6d2b82f), are unrelated to this work, and are noted in the Phase B brief as Phase D's responsibility.

Surfaced for the record

event_seq allocation race policy: I used COALESCE(MAX(event_seq), 0) + 1 inside a single INSERT and let the UNIQUE (attempt_id, event_seq) constraint catch concurrent allocations on the same attempt. The caller is expected to retry on unique-violation. This is simpler than wrapping every INSERT in a SERIALIZABLE txn and matches the spec's "Or maintain it via the row's UNIQUE constraint and let one collision retry" option. Documented in the helper comment.
fail_llm_call does not write attempts.failure_class directly. The kernel must post a separate update_attempt_llm_summary to set per-attempt failure metadata. fail_llm_call only links last_llm_call_id and adds partial token deltas. This keeps the per-call vs per-attempt failure semantics distinct (a fallback call can fail without making the attempt itself fail). Phase E should be aware.
finish_llm_call always increments llm_call_count by 1 and bumps token sums by prompt_tokens.unwrap_or(0) etc. The ticket spec said "the caller posts a separate update_attempt_llm_summary"; I kept the increment in finish_llm_call because the alternative (caller must always remember to post a summary) is easy to forget and produces wrong dashboard counts. Phase E callers should NOT also post llm_call_count_delta=1 after finish — they would double-count. Documented in the spec callout above. If Phase E wants the split, that's a one-line removal of the UPDATE-attempts in finish_llm_call.
Request body canonicalization: start_llm_call runs canonical_json::canonicalize_json on request_body before computing request_body_sha256 and request_body_bytes, and stores the same canonicalized JSON as the RequestJson artifact. This makes the sha and the artifact agree byte-for-byte. The original (non-canonicalized) body shape never reaches the database.
Memory store remains placeholder. Phase C will mirror these semantics. The trait surface is the contract; this commit doesn't change src/world_store/mod.rs.
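
To make the event_seq race policy concrete, here is a small in-memory model of "allocate MAX + 1, let the unique constraint reject collisions, retry". It is a sketch under stated assumptions: Timeline and its methods are hypothetical, a BTreeSet plays the role of UNIQUE (attempt_id, event_seq) for a single attempt, and the lock is deliberately released between the read and the insert so that collisions can actually happen.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for one attempt's rows in attempt_timeline_events.
#[derive(Default)]
struct Timeline {
    event_seqs: Mutex<BTreeSet<i64>>,
}

impl Timeline {
    /// COALESCE(MAX(event_seq), 0) + 1, read without holding the lock
    /// across the later insert.
    fn next_candidate(&self) -> i64 {
        let seqs = self.event_seqs.lock().unwrap();
        let max = seqs.iter().next_back().copied().unwrap_or(0);
        max + 1
    }

    /// Insert that fails on a duplicate, like a unique violation.
    fn try_insert(&self, seq: i64) -> Result<(), i64> {
        let mut seqs = self.event_seqs.lock().unwrap();
        if seqs.insert(seq) { Ok(()) } else { Err(seq) }
    }

    /// Caller-side policy: retry until the insert wins.
    fn record_event(&self) -> i64 {
        loop {
            let candidate = self.next_candidate();
            if self.try_insert(candidate).is_ok() {
                return candidate;
            }
        }
    }
}

/// Runs several writers against one timeline and returns the stored seqs.
fn record_concurrently(writers: usize, events_per_writer: usize) -> Vec<i64> {
    let timeline = Arc::new(Timeline::default());
    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let timeline = Arc::clone(&timeline);
            thread::spawn(move || {
                for _ in 0..events_per_writer {
                    timeline.record_event();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let seqs = timeline.event_seqs.lock().unwrap();
    seqs.iter().copied().collect()
}
```

The sequence stays gap-free because an insert can only succeed when its candidate equals the current maximum plus one; any stale candidate is already present and is rejected.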

Statement of deployability

Phase B is deployable on its own. Postgres can now durably store traces — every mutating method works against the migration-0004 schema, every read method returns the right shape, and the test surface proves it. Nothing in production code paths writes to these methods yet (Phase D rewrites src/llm.rs, Phase E wires the kernel), so deploying Phase B alone is a no-op for users but unblocks the next phases.

Proceeding to Phase C (Memory impl + catalog contract test update).

Phase C landed at commit 7537e05 on feat/llm-traces.

Branch state (last 5, oneline)

7537e05 feat(llm-traces): phase C — MemoryWorldStore impl + catalog contract update
4f58317 feat(llm-traces): phase B — PostgresWorldStore impl + 20 postgres-tests
6d2b82f feat(llm-traces): phase A — migration 0004 + DTOs + trait skeleton
09344de feat: merge markdown tickets with long-turn fixes
3145202 fix(llm): avoid stalling Tokio workers

What landed

14 MemoryWorldStore LLM-trace methods, mirroring Postgres semantics from Phase B (start_llm_call, append_llm_call_chunk, append_llm_call_tokens, put_llm_call_artifact, finish_llm_call, fail_llm_call, record_attempt_timeline_event, update_attempt_progress, update_attempt_llm_summary, get_llm_call, list_llm_calls_for_attempt, list_llm_call_chunks, get_llm_call_artifact). Each method takes inner.write() (or read+write) for an atomic critical section; the timeline-event MAX(event_seq)+1 allocation serializes the same way the Postgres query does.
23 new memory tests covering: single-call shape, start/append/finish/fail counter bumps, timeline event_seq monotonicity, cursor pagination across list_llm_calls_for_attempt and list_llm_call_chunks without dup or gap, on-conflict idempotency for put_llm_call_artifact, 100 KiB body uncapped on get_llm_call_artifact, get_llm_call returning metadata without artifact bodies, NotFound semantics for unknown call ids and artifacts, and AttemptNotFound for progress/summary updates against an unknown attempt.
tests/migrations.rs FK_TARGET_ALLOWLIST extended with the five new edge/trace tables migration 0004 introduced (llm_call_messages, llm_call_chunks, llm_call_tokens, llm_call_artifacts as edge-only; attempt_timeline_events as trace-only). Each entry carries a one-line rationale; llm_calls is already cataloged via Phase A's ResourceKind::LlmCall registration.

Test counts

cargo test --lib --features test-fixtures: 464 passed (was 441 → +23 new memory tests).
cargo test --lib --features test-fixtures,postgres-tests: 614 passed (was 591 → +23, same set running under both feature combos).
cargo test --test migrations --features ...,postgres-tests: 5 passed (catalog_contract_every_fk_target_is_browseable_or_allowlisted included).
cargo build --bin chukwa-serve: clean build under rust:1.88.
DATABASE_URL pinned to postgres://postgres:postgres@127.0.0.1:5433/postgres (sacrificial local postgres) for the postgres-tests run.
ant_scenario failures observed and ignored per workflow note (pre-existing).

Spec-ambiguous decisions / helpers

The spec named separate state buckets including llm_messages: HashMap<Uuid, Vec<LlmMessageRow>> and the rest, but did not call for a separate attempt_llm_summary bucket. I added one (HashMap<Uuid, AttemptLlmSummaryRow>) so the attempt-aggregate columns from migration 0004 (failure_class, failed_phase, failed_entity_id, last_llm_call_id, llm_call_count/prompt_tokens/completion_tokens/total_tokens, llm_trace_summary) live alongside the existing AttemptRow without polluting it. The bucket gets seeded with an empty row at start_attempt time and at inject_attempt time (test escape hatch) so the lookup-or-default path in update_attempt_* never ends up rejecting a row that the lifecycle explicitly created.
I did NOT lift the cursor encoding helper from postgres.rs to mod.rs. The trait surface uses typed cursors (LlmCallCursor { call_seq_after: i32 }, LlmChunkCursor { chunk_seq_after: i32 }) with serde impls; the base64-url-no-pad opaque-token encoding is purely a wire-level concern that lives in the mcp.rs call sites. Both backends consume the same typed cursor shape, so no shared helper was needed for Phase C.
delta_content length is computed Rust-side as .chars().count() for chars and .len() for bytes, mirroring how Postgres counts UTF-8 chars vs bytes for the running totals.
The Postgres impl rejects a duplicate (llm_call_id, chunk_seq) via the PG primary-key violation; the in-memory impl does the same check explicitly via .iter().any(...) over the stored chunks so the error variant is consistent (WorldStoreError::Database(...)).
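
The chunk-append behavior from the last two points (explicit duplicate check, chars vs bytes running totals) looks roughly like this. A minimal sketch: CallChunks and StoreError are stand-in types for illustration, not the MemoryWorldStore internals.

```rust
/// Stand-in for the store error; only the variant used here is modeled.
#[derive(Debug, PartialEq, Eq)]
enum StoreError {
    Database(String),
}

/// Hypothetical per-call chunk bucket with running totals.
#[derive(Debug, Default)]
struct CallChunks {
    chunks: Vec<(i32, String)>,
    assistant_text_chars: usize,
    assistant_text_bytes: usize,
}

impl CallChunks {
    fn append(&mut self, chunk_seq: i32, delta_content: &str) -> Result<(), StoreError> {
        // Explicit duplicate check, standing in for the PG primary key.
        if self.chunks.iter().any(|(seq, _)| *seq == chunk_seq) {
            return Err(StoreError::Database(format!("duplicate chunk_seq {chunk_seq}")));
        }
        // chars() counts Unicode scalar values; len() counts UTF-8 bytes.
        self.assistant_text_chars += delta_content.chars().count();
        self.assistant_text_bytes += delta_content.len();
        self.chunks.push((chunk_seq, delta_content.to_string()));
        Ok(())
    }
}
```

The two totals diverge as soon as the model emits any non-ASCII text, which is why both are tracked.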

Deployability

Phase C is deployable on its own. The Postgres impl from Phase B is unchanged. MemoryWorldStore now mirrors PostgresWorldStore for the entire LLM-trace surface, and the catalog contract test covers every FK target migration 0004 introduced. No production code paths exercise the new methods yet (Phase D rewrites src/llm.rs to call into them), so this commit is purely a "shape lands first, behavior adopts later" preparation step.

Proceeding to Phase D (LLM client async/streaming rewrite — replace ureq with reqwest, add trace context, structured errors).

Phase D landed at commit 993f486 on feat/llm-traces.

Branch state (last 6, oneline)

993f486 feat(llm-traces): phase D — async/streaming LLM client + trace persistence
7537e05 feat(llm-traces): phase C — MemoryWorldStore impl + catalog contract update
4f58317 feat(llm-traces): phase B — PostgresWorldStore impl + 20 postgres-tests
6d2b82f feat(llm-traces): phase A — migration 0004 + DTOs + trait skeleton
09344de feat: merge markdown tickets with long-turn fixes
3145202 fix(llm): avoid stalling Tokio workers

Rebase note

Upstream main did NOT move while Phase D ran: git fetch gitlab; git log gitlab/main ^HEAD was empty. The work from 2dc48e22 (fix(llm): avoid stalling Tokio workers, sha 3145202) that introduced the block_in_place shim was already in our base when Phase A branched, so there is nothing to reconcile. The run_blocking_llm_io helper stays in place for the legacy ureq-backed calls that minds.rs still uses; Phase E will rip it out once the streaming client is wired through.

Pre-existing regression worth flagging: the tests/ant_scenario.rs suite (4 tests, hits the live LLM router) panics on block_in_place under #[tokio::test] (default current-thread flavor) since 3145202. Reproduces on bare 7537e05 too — NOT introduced by Phase D. Phase E should convert those tests to the multi-thread flavor or to the new async client at the same time it rewrites cognition.

What landed

reqwest-backed async streaming client LlmStreamingClient in src/llm.rs. stream: true, stream_options.include_usage = true set on every call. SSE frames split on \n\n, parsed line-by-line for data: payloads, terminator [DONE] recognized.
Trace context types in new src/llm_trace.rs: AttemptTraceContext with monotonic next_llm_call_seq: Arc<AtomicU32>, LlmCognitionContext carrying phase/entity/profile/logical_attempt_number, and AgentProfileHashes for the five hash columns.
Structured LlmError with all 14 stable failure_class strings exposed via pub mod failure_class. Variants carry llm_call_id: Option<LlmCallId> and failure_class: &'static str. Display stays concise; helpers .failure_class(), .failure_message(), .details(), .with_call_id(), .with_class() give callers structured access.
Correlation headers on every request: X-Chukwa-Attempt-Id, X-Chukwa-Llm-Call-Id, X-Chukwa-World-Slug, X-Chukwa-Attempted-Turn, X-Chukwa-Phase, X-Chukwa-Entity-Id (when present), and X-Client-Request-Id: chukwa:<attempt>:<call_seq>:<llm_call_id>. The same chukwa_client_request_id lands on the llm_calls row.
Router response headers captured into llm_calls.response_headers (lower-cased; set-cookie and authorization filtered) plus per-column extraction of x-request-id, x-router-source, x-router-model, x-router-upstream-model, x-router-target, x-router-slot, x-router-deployment. model_resolved prefers x-router-upstream-model, then x-router-model.
Per-chunk durability: each SSE frame becomes one llm_call_chunks row inserted via append_llm_call_chunk BEFORE the next chunk is read. Cumulative counters tracked Rust-side and passed authoritatively to the store.
Raw-before-normalization: assistant_text_raw artifact written uncapped, with sha256 + char/byte counts, BEFORE the optional normalizer runs. assistant_text_normalized only persisted if it differs from raw.
Failure paths are durably traced: non-2xx responses store the full body as RouterErrorBody (uncapped) before fail_llm_call. Stream-read errors, transport failures, and empty assistant messages all classify cleanly and link the call id back through the LlmError.
Buffered non-stream responses (router returned JSON despite stream: true) stored as ResponseBody artifact + metadata.unexpected_non_stream_response = true.
Response-format fallback: a 4xx whose body mentions response_format / json_schema / stream_options retries once with response_format removed and the schema appended to the user's last message. The primary failure is persisted as its own llm_calls row; the retry's fallback_of_call_id points back at the primary.
JsonCompletion<T> gains llm_call_id: Option<LlmCallId> so JSON parse errors can be attached to audit events.
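
The SSE handling described in the first item (split on \n\n, read data: lines, recognize [DONE]) can be sketched as a small incremental buffer. This is not the code in src/llm.rs: SseFrameBuffer and SseEvent are hypothetical names, the input is assumed to be already-decoded text, and \r\n line endings are not handled.

```rust
#[derive(Debug, PartialEq, Eq)]
enum SseEvent {
    /// Payload of one data: line, still unparsed JSON text.
    Data(String),
    /// The [DONE] terminator.
    Done,
}

/// Accumulates body text and yields events for each completed frame.
#[derive(Default)]
struct SseFrameBuffer {
    pending: String,
}

impl SseFrameBuffer {
    /// Feed one piece of the response body. A frame that is split across
    /// pieces stays in `pending` until its closing blank line arrives.
    fn push(&mut self, piece: &str) -> Vec<SseEvent> {
        self.pending.push_str(piece);
        let mut events = Vec::new();
        while let Some(end) = self.pending.find("\n\n") {
            // Remove the frame, including its "\n\n" terminator.
            let frame: String = self.pending.drain(..end + 2).collect();
            for line in frame.lines() {
                // Lines without a data: prefix (comments, blanks) are skipped.
                if let Some(payload) = line.strip_prefix("data:") {
                    let payload = payload.trim_start();
                    if payload == "[DONE]" {
                        events.push(SseEvent::Done);
                    } else {
                        events.push(SseEvent::Data(payload.to_string()));
                    }
                }
            }
        }
        events
    }
}
```

Returning events per completed frame is what lets the caller persist one chunk row before reading the next piece from the socket.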

Test counts (with DATABASE_URL pinned to `127.0.0.1:5433/postgres`)

lib (--features test-fixtures): 472 passed (was 464 → +8 net: structured-error + extractor + parser + SSE-frame helpers in src/llm.rs, plus LlmCognitionContext/next_call_seq invariants in src/llm_trace.rs).
lib (--features test-fixtures,postgres-tests): 622 passed (was 614 → +8 net, same lib delta).
new integration target tests/llm_streaming.rs: 9 passed end-to-end against an in-process tokio::net::TcpListener mock router. Asserts on three-chunk streaming + raw artifact, empty-assistant failure class, usage-chunk token persistence, HTTP 500 with > 2KB body (preview truncated, artifact uncapped), JSON parse failure (raw + parse_error artifacts), response_format fallback (both calls linked via fallback_of_call_id), interleaved content + finish + usage chunks, chukwa:<attempt>:<seq>:<call_id> correlation header format, and router-metadata column population.
All other test targets pass (phase0 12, migrations 5, bootstrap 3, graph_ui_auth 14, phase_g_routes 15, phase_h_routes 14, phase_i_routes 17, structural_linking 21).
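
For readers checking the correlation-header and router-metadata assertions, the three rules involved are small enough to sketch. The function names below are illustrative and the ids are placeholders; only the chukwa:<attempt>:<call_seq>:<llm_call_id> format, the lower-casing plus set-cookie/authorization filter, and the upstream-model-first preference come from the notes above.

```rust
use std::collections::BTreeMap;

/// Builds the X-Client-Request-Id value:
/// chukwa:<attempt>:<call_seq>:<llm_call_id>.
fn client_request_id(attempt_id: &str, call_seq: u32, llm_call_id: &str) -> String {
    format!("chukwa:{attempt_id}:{call_seq}:{llm_call_id}")
}

/// Lower-cases header names and drops set-cookie and authorization.
/// Simplification: a repeated header name keeps only its last value.
fn capture_response_headers(raw: &[(&str, &str)]) -> BTreeMap<String, String> {
    raw.iter()
        .map(|(name, value)| (name.to_ascii_lowercase(), value.to_string()))
        .filter(|(name, _)| name.as_str() != "set-cookie" && name.as_str() != "authorization")
        .collect()
}

/// model_resolved prefers x-router-upstream-model, then x-router-model.
fn model_resolved(headers: &BTreeMap<String, String>) -> Option<&str> {
    headers
        .get("x-router-upstream-model")
        .or_else(|| headers.get("x-router-model"))
        .map(String::as_str)
}
```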

Surfaced for follow-up / Phase E hand-off

Deps added: reqwest = "=0.12.9" (default-features off; json + stream + rustls-tls only — keeps the binary off system OpenSSL) and futures-util = "=0.3.31" (was transitive; promoted to explicit so the streaming code is robust against transitive churn).
Phase E threading: kernel constructs one AttemptTraceContext at world-lease acquisition (it has the attempt_id, world_slug, attempted_turn, and worker_id already in start_attempt's return value); pass that into minds::perceive / minds::intend / minds::adjudicate. Each cognition function wraps it in LlmCognitionContext::new(attempt_ctx, LlmPhase::*).with_entity_id(...).with_profile_hashes(...) and calls LlmStreamingClient::call_text (perceive/intend) or call_json::<Adjudication> (adjudicate).
Return shapes Phase E will consume: ObservedText { llm_call_id, raw_text, normalized_text } for plain-text completions — minds::perceive and minds::intend should return Ok(ObservedText) so the kernel can stamp last_llm_call_id on audit events. JsonCompletion<Adjudication> { llm_call_id, raw_text, parsed } for adjudication — the existing retry loop in minds::adjudicate keeps working with one change: the llm_call_id of EACH attempt should land on attempts.last_llm_call_id per the spec, plus on the per-attempt diagnostic record.
Per-phase normalizers: Phase D ships a default whitespace-trim normalizer. Phase E should pass a richer normalizer per phase (e.g. JSON parser for adjudicate, schema-aware coercion for adjudication validation). The streaming client always persists assistant_text_raw first; the normalizer just shapes what the caller sees and what lands in assistant_text_normalized.
Test gating: tests/llm_streaming.rs and the new llm_trace.rs tests are gated on test-fixtures (no Postgres needed). Phase E will likely add a tests/cognition_traces.rs that drives minds::* end-to-end against the same mock-server harness — the harness in tests/llm_streaming.rs is intentionally simple enough to copy into a sibling test file.
Pre-existing ant_scenario regression: not Phase D's issue, but Phase E's rewrite naturally fixes it — the legacy blocking path goes away once cognition is async.
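
As a hand-off aid, the raw-before-normalization ordering with a pluggable normalizer can be sketched as follows. Assumptions are loud here: Artifacts is a hypothetical in-memory stand-in for the artifact store, observe is not the real client method, and this ObservedText omits llm_call_id. The ordering (raw first, normalized only when it differs) is the part taken from the notes.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory artifact store keyed by artifact kind.
#[derive(Default)]
struct Artifacts {
    by_kind: BTreeMap<&'static str, String>,
}

/// Reduced version of the return shape; llm_call_id is left out.
struct ObservedText {
    raw_text: String,
    normalized_text: String,
}

/// Default normalizer as described: trim surrounding whitespace only.
fn trim_normalizer(raw: &str) -> String {
    raw.trim().to_string()
}

fn observe(
    artifacts: &mut Artifacts,
    raw: &str,
    normalizer: impl Fn(&str) -> String,
) -> ObservedText {
    // Persist the raw text first so it survives whatever happens later.
    artifacts.by_kind.insert("assistant_text_raw", raw.to_string());
    let normalized = normalizer(raw);
    // Store the normalized form only when it adds information.
    if normalized.as_str() != raw {
        artifacts
            .by_kind
            .insert("assistant_text_normalized", normalized.clone());
    }
    ObservedText { raw_text: raw.to_string(), normalized_text: normalized }
}
```

A per-phase normalizer is then just a different closure passed at the call site; the raw artifact is unaffected by which one is chosen.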

Deployability

Phase D itself does not change production behavior: nothing yet calls the new LlmStreamingClient surface. The legacy complete_text / complete_json / chat_json_raw helpers continue to power minds.rs exactly as before. The only caller-visible change is the additive failure_class module and the new error variant fields, both of which are backwards-compatible at the public-API level (variants are non-exhaustive in spirit; LlmError::config(msg) etc. constructors keep the call shape minds.rs uses). Build (cargo build --bin chukwa-serve) is clean with zero warnings.

Proceeding to Phase E (minds/kernel async + raw-before-normalization + trace context threading).

Phase E landed at commit 97b76b2 on feat/llm-traces.

Branch state

97b76b2 feat(llm-traces): phase E — async cognition + trace context threading
993f486 feat(llm-traces): phase D — async/streaming LLM client + trace persistence
7537e05 feat(llm-traces): phase C — MemoryWorldStore impl + catalog contract update
4f58317 feat(llm-traces): phase B — PostgresWorldStore impl + 20 postgres-tests
6d2b82f feat(llm-traces): phase A — migration 0004 + DTOs + trait skeleton
09344de feat: merge markdown tickets with long-turn fixes
3145202 fix(llm): avoid stalling Tokio workers

What landed

Cognition functions are async + traced. perceive / intend now return ObservedText { llm_call_id, raw_text, normalized_text }. adjudicate returns AdjudicationOutcome { adjudication, attempts, llm_call_id }. Each cognition call takes &LlmStreamingClient + &AttemptTraceContext + Option<&AgentProfileHashes>. Adjudication's per-retry FailedAdjudicationAttempt carries its own llm_call_id so each rejected draft links to its own llm_calls row.
Kernel constructs AttemptTraceContext per attempt. run_claimed_static builds it once, threads it through every cognition call. run_claimed_with_llm is the test seam (gated on test-fixtures) for injecting mock-router clients.
PendingAuditEvent variants gained llm_call_id. Perception, Intent, Adjudication, AdjudicationRejected all carry it; audit_input_from_pending stamps it onto every audit-event payload (stamp_llm_call_id) so the canonical record cross-links to the trace artifacts. Per spec: tiny pointer in the audit payload, raw response stays in llm_call_artifacts.
Failure-class diagnostics. TurnFailure carries failure_class + llm_call_id; on the failure path the kernel calls update_attempt_llm_summary BEFORE fail_attempt so attempts.failure_class / failed_phase / failed_entity_id / last_llm_call_id / llm_trace_summary.last_call land on the row.
Legacy sync ureq helpers removed. complete_text, complete_json, chat_json_raw, post_chat_blocking, and run_blocking_llm_io are gone. ureq is no longer a dependency. The streaming client is the only LLM surface.
Progress updates rate-limited. Inside the streaming loop in LlmStreamingClient::execute_one_call, update_attempt_progress fires every 5s OR every 256 chunks, whichever first. Per-chunk persistence (append_llm_call_chunk) is unchanged — it still lands on every chunk.
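
The "every 5s OR every 256 chunks, whichever first" rule can be expressed as a small gate. A sketch, not the execute_one_call code: ProgressGate is a hypothetical helper, and time is passed in as elapsed-since-call-start so the behavior is deterministic to test. Resetting both counters on every emit is an assumption of this sketch.

```rust
use std::time::Duration;

const PROGRESS_INTERVAL: Duration = Duration::from_secs(5);
const PROGRESS_CHUNK_INTERVAL: u64 = 256;

/// Decides when update_attempt_progress should fire.
struct ProgressGate {
    last_emit_at: Duration,
    chunks_since_emit: u64,
}

impl ProgressGate {
    fn new() -> Self {
        Self { last_emit_at: Duration::ZERO, chunks_since_emit: 0 }
    }

    /// Call once per chunk, after the chunk row itself has been persisted.
    /// Returns true when a progress update is due.
    fn on_chunk(&mut self, now: Duration) -> bool {
        self.chunks_since_emit += 1;
        let due_by_time = now.saturating_sub(self.last_emit_at) >= PROGRESS_INTERVAL;
        let due_by_count = self.chunks_since_emit >= PROGRESS_CHUNK_INTERVAL;
        if due_by_time || due_by_count {
            // Either trigger resets both, so the limits never stack.
            self.last_emit_at = now;
            self.chunks_since_emit = 0;
            true
        } else {
            false
        }
    }
}
```

In the streaming loop the caller would pass something like start.elapsed() as now; chunk persistence stays outside the gate and runs on every chunk.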

Files modified

src/kernel.rs — +597 / −131. AttemptTraceContext constructed per attempt; PendingAuditEvent variants extended with llm_call_id; TurnFailure extended with failure_class + llm_call_id; run_claimed_with_llm test seam added.
src/minds.rs — +154 / −36. Cognition functions rewritten async; ObservedText / JsonCompletion<T> plumbing; AdjudicationOutcome.llm_call_id.
src/llm.rs — +335 / −335 (replaced). Legacy ureq helpers retired; rate-limited progress added.
Cargo.toml — ureq dependency removed; [[test]] llm_traces_kernel registered.
tests/llm_traces_kernel.rs — NEW (379 lines). Two integration tests over a mock SSE router + MemoryWorldStore: (1) successful turn creates one llm_calls row per phase + audit events carry matching llm_call_id; (2) failed perceive (empty assistant message) persists the trace row before fail_attempt lands and the attempt diagnostics are populated.

Test counts

Lib tests (--lib --features test-fixtures): 469 (was 472 — three legacy ureq test cases retired with the helpers).
Postgres tests (--tests --features test-fixtures,postgres-tests -- --test-threads=1): 735 total across 14 binaries, all passing. DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5433/postgres (sacrificial local Postgres, NEVER cluster). Breakdown:
- lib (with postgres-tests): 619
- integration suites: 116 (4 ant_scenario + 3 + 14 + 9 + 2 + 5 + 12 + 15 + 14 + 17 + 21)

ant_scenario disposition

Now passing. Phase D's spec said "pre-existing ant_scenario failures may now be FIXABLE by Phase E's async rewrite." All 4 ant_scenario tests (ant_turn_emits_cognitive_events_in_order, ant_memory_grows_monotonically, suspended_seed_remains_unchanged_after_many_turns, plus the helper-driven world setup) pass cleanly. The async cognition rewrite + new streaming client did fix them — the 280s runtime suggests these are the real-router-backed end-to-end cases, and they're stable.

Surfaced for the record (Phase F/G prep)

Audit event payloads now carry event["llm_call_id"] as a string UUID. Phase F's MCP get_turn_status / list_attempts extension can pluck it directly.
attempts.failure_class / failed_phase / failed_entity_id / last_llm_call_id / llm_trace_summary columns are populated on every failed attempt — Phase F's attempt-summary surfacing has data to render.
world_audit_events.llm_call_id linking: the kernel stamps the id onto the JSON payload but does NOT yet write a dedicated llm_call_id column. Phase A's migration 0004 + AuditEventInput already accepts the link, but Phase E left the column-level wiring for Phase F so the schema-level join is one focused commit.
MemoryWorldStore exposes list_llm_calls_for_attempt (used by the new tests) and the existing get_attempt returns AttemptDetails with the diagnostics fields populated; Phase F's MCP handler should be a thin adapter over those.
LlmStreamingClient::for_test is gated on test-fixtures and takes a base URL + model — Phase F doesn't need any further test seam.

Deployability

Phase E IS deployable on its own. Production now writes traces for every cognition call: llm_calls row at start, llm_call_chunks per SSE frame, assistant_text_raw artifact before normalization, assistant_text_normalized artifact, finish_llm_call (or fail_llm_call) at the end. Audit events cross-reference the trace via event.llm_call_id. Attempt diagnostics get failure_class + last_llm_call_id stamped on failure. Existing surfaces (MCP get_turn_status, HTML /attempts/:id) don't yet show them but the data is there for Phase F/G to expose.

Proceeding to Phase F (attempt summary surfacing + audit_events.llm_call_id linking + extend get_turn_status / list_attempts).

Phase G scope addition: items 5 & 6 of `38d0ba4e` folded in

Per conversation with the human, two subscope items from ticket 38d0ba4e-d2f6-4945-b211-037615db8957 ("One first-party identifier grammar: underscore-only, enforced repo-wide", P2 design) are being absorbed into Phase G of this ticket because they are mechanically aligned with the trace-layer surface Phases A-E already opened up.

Folded into Phase G

(5) Strict entity_id matching in adjudication, no repair. From 38d0ba4e §5 "LLM-facing entity IDs" (lines 333-388 of that body):

LLM prompts show entity IDs exactly as stored.
Adjudication response with invalid or unknown entity_id is REJECTED, not REPAIRED.
No hyphen→underscore conversion, no alternate-spelling lookup, no aliases, no "maybe you meant X" hint.
The adjudication retry budget is consumed normally.
Retry uses the standard correction instruction; the original prompt already contains valid IDs.

Implementation locus: src/minds.rs adjudication validation gate. Phase E's async cognition rewrite already routes adjudication; Phase G tightens the validator to exact string match against the world snapshot's entity map.
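
A minimal sketch of what "exact string match, no repair" means at the validation gate. The names are illustrative (EntityIdCheck, check_entity_ids), and the world snapshot's entity map is reduced to a set of ids; the real validator in src/minds.rs may be shaped differently.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
enum EntityIdCheck {
    Accepted,
    /// Carries the offending id verbatim; no suggestion is attached.
    Rejected { entity_id: String },
}

/// Exact match only: no case folding, no trimming, no hyphen-to-underscore
/// conversion, no alias lookup.
fn check_entity_ids(known: &BTreeSet<String>, referenced: &[&str]) -> EntityIdCheck {
    for id in referenced {
        if !known.contains(*id) {
            return EntityIdCheck::Rejected { entity_id: (*id).to_string() };
        }
    }
    EntityIdCheck::Accepted
}
```

A Rejected result feeds the normal retry path and consumes one unit of the adjudication retry budget like any other validation failure.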

(6) Rejected malformed LLM drafts not in canonical world audit. From 38d0ba4e §6 "Rejected LLM drafts and world history" (lines 390-406):

If an adjudication response is rejected for retryable validation reasons AND a later retry succeeds, the failed draft must NOT become a world_audit_events row.
The failed draft IS captured in the LLM trace layer (llm_calls row + assistant_text_raw + parse_error/validation_error artifacts).
Only the accepted adjudication outcome lands in the canonical audit log.

Implementation locus: kernel's adjudication retry loop in src/kernel.rs. Phase G stops staging PendingAuditEvent::AdjudicationRejected into commit_turn's audit events; the trace layer carries it instead.
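
One way to picture the split between canonical audit and trace layer, assuming a turn whose later adjudication retry succeeded. The enum below is a reduced, hypothetical version of PendingAuditEvent (ids as plain strings), and split_for_commit is an illustrative helper; the actual change may simply never stage the rejected variant.

```rust
/// Reduced model of the staged events; only llm_call_id is carried.
#[derive(Debug, Clone, PartialEq, Eq)]
enum PendingAuditEvent {
    Perception { llm_call_id: String },
    Intent { llm_call_id: String },
    Adjudication { llm_call_id: String },
    AdjudicationRejected { llm_call_id: String },
}

/// Splits staged events into (canonical audit rows, trace-only call ids).
/// Rejected drafts never reach the canonical list; their llm_calls rows
/// and artifacts remain the only record of them.
fn split_for_commit(staged: Vec<PendingAuditEvent>) -> (Vec<PendingAuditEvent>, Vec<String>) {
    let mut canonical = Vec::new();
    let mut trace_only = Vec::new();
    for event in staged {
        match event {
            PendingAuditEvent::AdjudicationRejected { llm_call_id } => {
                trace_only.push(llm_call_id)
            }
            other => canonical.push(other),
        }
    }
    (canonical, trace_only)
}
```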

NOT folded in (stays in 38d0ba4e for its own dedicated cycle)

The remaining ten items in 38d0ba4e are an identifier-purity package needing its own substrate-wipe deploy and its own smoke:

New src/human_id.rs shared validator
Rework Slug, Label, entity_id grammars
SQL domain label_text → underscore-only human_id_text
Apply domain constraints to all first-party identifier columns
Remove scenario_names / scenario_name_history tables and set_scenario_name / unset_scenario_name MCP tools
HTTP / graph-browser route-param validation
Ticket-label grammar enforcement
Docs and examples sweep
Substrate-table wipe migration (no hyphen→underscore data migration)

Folding the rest would expand Phase G's surface (catalog/UI + grammar + new MCP tools at once), force a substrate wipe in this cycle's deploy that 56e0b520 didn't promise, and mix orthogonal concerns. Items 5 & 6 are different because they're behavioral changes on a surface this cycle has already opened.

Effect on the Phase G plan

Phase G's scope as previously stated:

Catalog reference rules for LlmCall
HTTP routes /llm-calls, /llm-calls/:id, /llm-calls/:id/chunks, /llm-calls/:id/tokens, /llm-calls/:id/artifacts/:kind, world-scoped aliases
New MCP tools (list_llm_calls, get_llm_call, get_llm_call_artifact, list_llm_call_chunks, list_llm_call_tokens)
LLM call list/detail UI
Typed hash linking (already-staged scope from hash-linking-ui-clean.patch)

Now also includes:

Strict adjudication entity_id matching (no repair)
Rejected adjudication drafts excluded from canonical world audit
Tests asserting both behaviors

Cross-referenced in 38d0ba4e so the remaining work there is unambiguous.

Phase F landed at commit d347833 on feat/llm-traces.

Branch state (last 7 commits, oneline):

d347833 feat(llm-traces): phase F — attempt summary surfacing + audit linking + MCP extensions
97b76b2 feat(llm-traces): phase E — async cognition + trace context threading
993f486 feat(llm-traces): phase D — async/streaming LLM client + trace persistence
7537e05 feat(llm-traces): phase C — MemoryWorldStore impl + catalog contract update
4f58317 feat(llm-traces): phase B — PostgresWorldStore impl + 20 postgres-tests
6d2b82f feat(llm-traces): phase A — migration 0004 + DTOs + trait skeleton
09344de feat: merge markdown tickets with long-turn fixes

What landed

get_turn_status extension — default response now always carries the eight LLM trace summary fields (failure_class, failed_phase, failed_entity_id, last_llm_call_id, llm_call_count, llm_prompt_tokens, llm_completion_tokens, llm_total_tokens). New optional args include_diagnostics (adds the llm_trace_summary JSONB blob) and include_llm_calls (adds an llm_calls array of compact summaries, capped at 100 with llm_calls_truncated boolean). Tool description updated.
list_attempts extension — every row in the attempts array carries the same eight summary fields (free, since store_attempt_to_json is the shared row helper). Tool description updated.
completed_at → ended_at fix — resource_catalog::ATTEMPT_SPEC.default_list_columns flipped from ["attempt_id", "world_slug", "status", "enqueued_at", "completed_at"] to ["attempt_id", "world_slug", "status", "ended_at", "failure_class", "failed_phase", "failed_entity_id", "llm_completion_tokens", "last_llm_call_id"]. The phantom completed_at field is eliminated; the operator cockpit shape from the ticket lands as the catalog-driven default.
world_audit_events.llm_call_id end-to-end:
- AuditEventInput and AuditEvent DTOs grew an llm_call_id: Option<LlmCallId> field.
- Postgres insert_audit_events now binds the column (was previously always-NULL).
- Memory store's AuditRow carries the field through commit_turn / fail_attempt.
- Every audit-event SELECT query (read_audit_events ×2, list_events_filtered ×2, get_event, list_events_for_attempt) selects the column; audit_event_from_row populates the typed field in both stores.
- Kernel's audit_input_from_pending propagates the PendingAuditEvent::*::llm_call_id to the AuditEventInput typed field (was previously only stamped into JSONB).
- attempt_failed audit events also carry failure_class and llm_call_id in their JSONB payload + the typed column.
AttemptStatusRecord carries summary fields — eight new fields. Both stores read them: postgres SELECTs include the new columns; memory store reads them from the per-attempt AttemptLlmSummaryRow. attempt_record(...) helper grew a summary: Option<&AttemptLlmSummaryRow> arg.
WorldStore::list_attempt_timeline — new trait method with cursor pagination (forward event_seq ASC). Memory + Postgres impls. New types: AttemptTimelineEvent, AttemptTimelineCursor, AttemptTimelinePage.
read_models.rs::load_attempt_detail — JSON payload now carries:
- Every Phase F summary field (via the extended attempt_record_to_value helper).
- llm_calls array (via list_llm_calls_for_attempt, capped at 50, with llm_calls_truncated).
- timeline array (via list_attempt_timeline, capped at 500, with timeline_truncated).
- Every audit event's llm_call_id is preserved (typed column wins on JSON merge).
- store_event_to_value now surfaces the typed llm_call_id.
- attempt_record_to_value now surfaces the summary fields and the llm_trace_summary JSONB blob.
MCP audit_event_to_json — surfaces the typed llm_call_id column on every event row.
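
The capped arrays above (llm_calls at 100 in get_turn_status and 50 in load_attempt_detail, timeline at 500) all pair a cap with a *_truncated boolean. A minimal sketch of that pattern follows. The helper name cap_with_flag is hypothetical and is not taken from the repo; the real code may compute the flag differently, for example by fetching cap + 1 rows.

```rust
/// Hypothetical helper (not from the repo): keep at most `cap` items and
/// report whether anything was dropped, so the caller can emit a
/// `*_truncated` boolean next to the capped array.
pub fn cap_with_flag<T>(mut items: Vec<T>, cap: usize) -> (Vec<T>, bool) {
    // Decide the flag before truncating, while the original length is known.
    let truncated = items.len() > cap;
    items.truncate(cap);
    (items, truncated)
}
```

With 120 summaries and a cap of 100, this returns the first 100 rows and true; with 3 rows and a cap of 50, it returns all 3 rows and false.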

Tests

cargo test --lib --features test-fixtures: 472 passed (was 469). Three new memory tests:
- mem_attempt_record_carries_llm_summary_after_finish
- mem_audit_events_persist_llm_call_id_through_commit
- mem_list_attempt_timeline_paginates_in_seq_order
Plus one new MCP unit test (only runs without postgres-tests feature flagging):
- mcp::phase_f_unit_tests::phf_store_attempt_to_json_carries_summary_fields_by_default
Postgres-tests (--features test-fixtures,postgres-tests): 741 passed total + 1 LLM-environment flake (was 735, so +6 from Phase F's new tests). Per-binary breakdown:
- lib (with postgres-tests gated module tests): 626 passed (3 new postgres tests included):
  - pg_commit_turn_persists_audit_event_llm_call_id_column
  - pg_attempt_record_carries_llm_summary_after_finish
  - pg_list_attempt_timeline_paginates_in_seq_order
- bootstrap: 3, graph_ui_auth: 14, llm_streaming: 9, llm_traces_kernel: 2, migrations: 5, phase0: 12, phase_g_routes: 15, phase_h_routes: 14, phase_i_routes: 17, structural_linking: 21
- ant_scenario: 3 passed; suspended_seed_remains_unchanged_after_many_turns FAILED with intend [ant]: llm response error: router returned an empty assistant message (panic at tests/ant_scenario.rs:187). This is an LLM-environment flake — the local router returned an empty assistant message after streaming a token storm (received >12MB on the wire before the empty final message). It's the exact failure mode the LLM cognition traces ticket exists to capture. Phase F doesn't change cognition logic; the failure is reproducible against feat/llm-traces regardless of Phase F's commits and would have surfaced on Phase E too. The fact that it's now diagnosable (audit events carry llm_call_id; failure_class flows to the attempt summary) is exactly the value Phase F delivered.
DATABASE_URL pinned to postgres://postgres:postgres@127.0.0.1:5433/postgres per polling-loop / sacrificial-DB rules.
One existing test updated: tests/phase_g_routes.rs::attempts_list_html_renders_columns_and_rows now asserts the new column set ("Ended at", "Failure class", "Failed phase", "Failed entity id", "Llm completion tokens", "Last llm call id") rather than the legacy "Enqueued at" / "Completed at".

Surfaced for the record (Phase G has these on its plate)

Phase G's /attempts/:id HTML layout consumes the data block which now carries failure_class, failed_phase, failed_entity_id, last_llm_call_id, llm_call_count, llm_prompt_tokens, llm_completion_tokens, llm_total_tokens, llm_trace_summary, llm_calls[], timeline[], plus per-event llm_call_id. The HTML rendering work is intentionally left to Phase G; the JSON shape is ready.
The last_llm_call_id column referenced in ATTEMPT_SPEC.default_list_columns will become a clickable anchor once Phase G adds the LLM_CALL_SPEC.reference_rules keypath table — currently the value is plain text in HTML, but the JSON list shape carries the UUID under the last_llm_call_id key.
Phase G's new /llm-calls/* routes will lift the existing list_llm_calls_for_attempt / get_llm_call / get_llm_call_artifact / list_llm_call_chunks reads into HTTP routes.
Phase G's new MCP tools (list_llm_calls, get_llm_call, get_llm_call_artifact, list_llm_call_chunks, list_llm_call_tokens) will follow the same shape pattern that llm_call_summary_to_json already lays out for compact summaries; the giant artifact bodies route through the existing WorldStore::get_llm_call_artifact.
Phase G's typed-hash linking improvements in render.rs are still untouched as instructed.

Statement of deployability

Phase F IS deployable on its own. The MCP extensions are backward-compatible (default response shape preserved; new fields are additive; new args default to false). The read-model JSON is enriched without removing prior fields. No new HTTP routes were added. The catalog column flip is the one user-visible change — operators see a richer /attempts list immediately, and the previously-broken "Completed at" column (always blank because the underlying field is ended_at) is replaced with real data.

Proceeding to Phase G (new MCP tools list_llm_calls/get_llm_call/get_llm_call_artifact/list_llm_call_chunks/list_llm_call_tokens + new HTTP routes /llm-calls/* + LlmCall reference rules + typed hash linking improvements).

Phase G landed at commit 16a7813 on feat/llm-traces (with a follow-up test fix at d7b6f7f).

Branch state (last 8 commits):

d7b6f7f test(llm-traces): fix include_failed filter on retry-exhaustion test
16a7813 feat(llm-traces): phase G — MCP tools, /llm-calls/* routes, UI, adjudication strictness
d347833 feat(llm-traces): phase F — attempt summary surfacing + audit linking + MCP extensions
97b76b2 feat(llm-traces): phase E — async cognition + trace context threading
993f486 feat(llm-traces): phase D — async/streaming LLM client + trace persistence
7537e05 feat(llm-traces): phase C — MemoryWorldStore impl + catalog contract update
4f58317 feat(llm-traces): phase B — PostgresWorldStore impl + 20 postgres-tests
6d2b82f feat(llm-traces): phase A — migration 0004 + DTOs + trait skeleton

What landed in each chunk:

Chunk 1 — five new MCP tools on the operator surface (/operator-mcp):

list_llm_calls(attempt_id, world_slug?, phase?, entity_id?, status?, limit?, cursor?) → cursor-paginated LlmCallSummary rows.
get_llm_call(llm_call_id, include_messages?, include_artifacts?, include_chunks_preview?, include_tokens_preview?) → full LlmCallDetails + chunk/token previews.
get_llm_call_artifact(llm_call_id, artifact_kind) → uncapped artifact body (any of the ten LlmArtifactKind values).
list_llm_call_chunks(llm_call_id, limit?, cursor?) → paginated stream chunks.
list_llm_call_tokens(llm_call_id, source?, limit?, cursor?) → paginated token observations.

Registered in OPERATOR_TOOLS (now 25 entries; was 20). New error code UNKNOWN_LLM_CALL for missing call ids and missing artifacts. Manifest entries documented; cursor wire format mirrors the existing AuditCursor base64(JSON) shape.

Chunk 2 — seven new HTTP routes:

GET /llm-calls
GET /llm-calls/:llm_call_id
GET /llm-calls/:llm_call_id/chunks
GET /llm-calls/:llm_call_id/tokens
GET /llm-calls/:llm_call_id/artifacts/:artifact_kind
GET /attempts/:attempt_id/llm-calls
GET /w/:slug/attempt/:attempt_id/llm-calls

All gated through require_graph_ui. Both HTML and ?format=json shapes. JSON-mode 404 returns application/problem+json (RFC 9457). The artifact route is special: it streams text/plain (or application/json for the request_json/response_json/parsed_json kinds) directly so huge raw outputs open without HTML page chrome.
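
The content-type rule for the artifact route can be sketched as a single match. The function name artifact_content_type and the use of plain string kinds are assumptions made for illustration; the real handler presumably matches on the LlmArtifactKind enum.

```rust
/// Hypothetical sketch of the artifact route's content-type choice.
/// The three JSON kinds are the ones named above; every other kind
/// (for example `assistant_text_raw`) is served as plain text.
pub fn artifact_content_type(artifact_kind: &str) -> &'static str {
    match artifact_kind {
        "request_json" | "response_json" | "parsed_json" => "application/json",
        _ => "text/plain",
    }
}
```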

Chunk 3a/3b — LlmCall ResourceSpec reference rules + hash-linking changes:

LLM_CALL_SPEC.reference_rules promoted from NO_RULES placeholder to GLOBAL_RULES.
New typed *_hash rules added to GLOBAL_RULES: environment_hash → Environment, entity_hash → StoredEntity. Plus four new llm_call_id rules: bare key, last_llm_call_id, llm_calls.[*].llm_call_id, events.[*].llm_call_id.
resource_catalog::kind_from_str() — new public fn, catalog-backed lookup so renderers can resolve payload.resource.kind / payload.kind strings to ResourceKind.
LinkContext::current_kind + WalkCtx::for_kind — bare-hash resolution at the top-level keypath now consults the page's own kind. payload.resource.kind (detail) and payload.kind (list) drive it. Hashes inside Raw JSON / state_hash / bundle_hash / password_hash stay plain text — the catalog rule is "only link by typed kind", never by regex/hex shape.
Detail page Identifier line is now a self-link to the resource's canonical detail href.
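
The "only link by typed kind" rule can be illustrated with a lookup that inspects the key and never the value. This is a simplified sketch: LinkKind and link_kind_for_key are hypothetical names, and the real GLOBAL_RULES table works on keypaths (including the llm_calls.[*] and events.[*] forms) rather than on a bare key.

```rust
/// Hypothetical target kinds for structural links.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkKind {
    LlmCall,
    Environment,
    StoredEntity,
}

/// Decide whether a field becomes an anchor purely from its key name.
/// The value is deliberately not a parameter: a hex-looking string under
/// an untyped key (state_hash, bundle_hash, password_hash) stays plain text.
pub fn link_kind_for_key(key: &str) -> Option<LinkKind> {
    match key {
        "llm_call_id" | "last_llm_call_id" => Some(LinkKind::LlmCall),
        "environment_hash" => Some(LinkKind::Environment),
        "entity_hash" => Some(LinkKind::StoredEntity),
        _ => None,
    }
}
```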

Chunk 3c/3d — Attempt detail UI + LLM call detail UI:

Attempt detail (/attempts/:id) already had Phase F's load_attempt_detail shape. Phase G's catalog rule additions make last_llm_call_id and per-row llm_call_id cells structurally link to /llm-calls/:id. Failure summary surfaces at the top by virtue of the existing data ordering (failure_class, failed_phase, failed_entity_id, last_llm_call_id all in the first record block).
LLM call detail (/llm-calls/:id) new loader load_llm_call_detail produces the full row (router/upstream metadata, request messages with sha256 + char/byte counts, response_headers, usage, finish_reason), an artifact_kinds list, a small chunk preview, a small token preview, and references to attempt + world + chunks + tokens + each artifact route. The shared graph-browser render_detail_page consumes that and the catalog rules turn every component_hash and llm_call_id reference into a structural anchor.

Chunk 4a — strict adjudication entity_id matching (no repair) in src/minds.rs:

validate_adjudication now returns Result<(), AdjudicationRejection>. The validator requires exact string match against world.entities. entity_id::normalize() is removed from this path; hyphen/underscore/case differences are now rejections.
AdjudicationRejection { kind: RejectionKind, complaint } carries EntityIdMismatch / EnvironmentMismatch / Structural / SchemaParse discriminants.
The retry loop in adjudicate() consults RejectionKind: an EntityIdMismatch triggers the strict spec wording, "Your previous adjudication response was rejected because at least one entity_id was invalid or did not exactly match an entity_id in the world snapshot. Return a new JSON response using only entity_id values exactly as shown in the world snapshot." No "maybe you meant" hint, no roster repeat — the original prompt already carries the valid ids. Other rejections fall back to the schema-shape correction.
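
A minimal sketch of the strict match, assuming the world roster and the adjudicated ids are both available as string slices. The type and variant names mirror the ones above, but validate_entity_ids and its signature are hypothetical; the real validate_adjudication also checks environment and structure.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RejectionKind {
    EntityIdMismatch,
    EnvironmentMismatch,
    Structural,
    SchemaParse,
}

#[derive(Debug, PartialEq, Eq)]
pub struct AdjudicationRejection {
    pub kind: RejectionKind,
    pub complaint: String,
}

/// Hypothetical strict validator: every adjudicated id must be byte-for-byte
/// equal to an id in the world snapshot. No normalization is applied, so
/// case, hyphen, and underscore differences are all rejections.
pub fn validate_entity_ids(
    world_entities: &[&str],
    adjudicated: &[&str],
) -> Result<(), AdjudicationRejection> {
    for id in adjudicated {
        if !world_entities.contains(id) {
            return Err(AdjudicationRejection {
                kind: RejectionKind::EntityIdMismatch,
                complaint: format!(
                    "entity_id {id:?} does not exactly match any entity_id in the world snapshot"
                ),
            });
        }
    }
    Ok(())
}
```

Under this sketch, a response naming Crumb against a roster containing crumb is rejected with EntityIdMismatch rather than repaired.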

Chunk 4b — rejected adjudication drafts NOT in canonical world audit (src/kernel.rs):

run_cognition_loop no longer pushes PendingAuditEvent::AdjudicationRejected for either path (RetriesExhausted or successful-after-rejection). The trace layer carries the rejected drafts via existing llm_calls rows; only the ACCEPTED final outcome lands in world_audit_events as intent_adjudicated. If all retries fail, the only canonical audit event for that attempt is the existing attempt_failed row — zero adjudication_rejected rows.
PendingAuditEvent::AdjudicationRejected + its conversion in audit_input_from_pending are retained (with #[allow(dead_code)]) so the existing audit_input_rejected_carries_adjudicate_and_schema_hashes kernel test still exercises the shape and a future caller can re-enable it without re-deriving the provenance.

WorldStore trait change: new list_llm_call_tokens(id, source?, cursor?, limit) method with new LlmCallToken, LlmTokenCursor, LlmTokenPage DTOs. Implemented in both PostgresWorldStore (one indexed query with optional source filter on the CHECK string) and MemoryWorldStore (in-memory filter + sort + cursor). New parse_token_source helper inverse of token_source_str.
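
The in-memory "filter + sort + cursor" path might look roughly like the sketch below. TokenRow, TokenPage, and list_tokens are simplified stand-ins for LlmCallToken, LlmTokenPage, and the trait method; the cursor is reduced to the last-seen token_seq, and the source strings used in the usage note are made up.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TokenRow {
    pub token_seq: u64,
    pub source: String,
    pub text: String,
}

#[derive(Debug, PartialEq, Eq)]
pub struct TokenPage {
    pub rows: Vec<TokenRow>,
    /// token_seq of the last returned row when more rows remain, else None.
    pub next_cursor: Option<u64>,
}

/// Hypothetical in-memory pagination: optional source filter, forward
/// ordering by token_seq, and a "strictly after cursor" resume rule.
pub fn list_tokens(
    all: &[TokenRow],
    source: Option<&str>,
    cursor: Option<u64>,
    limit: usize,
) -> TokenPage {
    let mut rows: Vec<TokenRow> = all
        .iter()
        .filter(|r| source.map_or(true, |s| r.source == s))
        .filter(|r| cursor.map_or(true, |c| r.token_seq > c))
        .cloned()
        .collect();
    rows.sort_by_key(|r| r.token_seq);
    let has_more = rows.len() > limit;
    rows.truncate(limit);
    let next_cursor = if has_more {
        rows.last().map(|r| r.token_seq)
    } else {
        None
    };
    TokenPage { rows, next_cursor }
}
```

Calling it with source "content" and a limit of 2, then passing the returned next_cursor back in, walks the matching rows in token_seq order without repeats.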

Files modified (15 total across both commits):

src/kernel.rs, src/mcp.rs, src/mcp/tests.rs, src/minds.rs, src/read_models.rs, src/render.rs, src/resource_catalog.rs, src/server.rs, src/world_store/memory.rs, src/world_store/mod.rs, src/world_store/postgres.rs, tests/llm_traces_kernel.rs, tests/structural_linking.rs
new file: tests/llm_traces_routes.rs

Test counts (verified green via cargo test --tests --features test-fixtures,postgres-tests -- --test-threads=1, DATABASE_URL pinned to 127.0.0.1:5433):

lib bin (unittests src/lib.rs): 634 passed (was ~625; +9 covers 5 MCP-tool tests + 4 minds-validator tests).
tests/ant_scenario.rs: 4 passed (live LLM router; flake known per standing rules — passed cleanly this run).
tests/graph_ui_auth.rs: 3 passed.
tests/bootstrap.rs: 14 passed.
tests/llm_streaming.rs: 9 passed.
tests/llm_traces_kernel.rs: 4 passed (was 2; added 2 adjudication-strictness integration tests).
tests/llm_traces_routes.rs (new file): 11 passed.
tests/migrations.rs: 5 passed.
tests/phase0.rs: 12 passed.
tests/phase_g_routes.rs: 15 passed (Phase F's separate routes test file, unrelated).
tests/phase_h_routes.rs: 14 passed.
tests/phase_i_routes.rs: 17 passed.
tests/structural_linking.rs: 27 passed (was 21 — added 6 hash-linking tests).
Aggregate: 769 tests passing, exit_code=0.
Initial run had one failure in my own adjudication_all_retries_fail_yields_zero_canonical_audit_rows because the default AuditFilter filters out failed-attempt events; fixed by passing include_failed=true. Committed at d7b6f7f and re-run confirms green.
pure lib (cargo test --lib --features test-fixtures): 481 passed (was 472).

New tests added (28 total across the cycle):

6 structural-linking: typed env/entity hash, bare-hash detail+list, Identifier self-link, Raw section state_hash stays plain.
5 MCP-tool tests: each new tool exercised, plus UNKNOWN_LLM_CALL and BAD_ARG paths.
4 minds-validator unit tests: exact match accepted; hyphen, uppercase, unknown-id rejected with RejectionKind::EntityIdMismatch.
2 kernel integration tests: rejected-then-accepted (only accepted in canonical audit), all-retries-fail (zero rejection rows in canonical audit, even with include_failed=true).
11 route-level integration tests: anonymous auth gate; list/detail HTML+JSON shapes; chunks/tokens routes; artifact streaming text/plain vs application/json; RFC 9457 404 for unknown-call and unknown-artifact.

Surfaced for the record (not blockers, not regressions):

list_llm_calls_for_attempt on the trait still takes only (cursor, limit). Phase G's MCP tool applies phase / entity_id / status / world_slug filters client-side after the page is fetched. Pushing those filters down into the trait + SQL is a future-phase optimization; the current shape is correct for the O(20) calls/attempt steady state but a high-call attempt with a narrow phase=adjudicate filter will iterate the cursor more times than strictly necessary. Not on a critical path.
list_llm_calls route requires attempt_id; a global /llm-calls?world_slug=... cross-attempt list would need a new WorldStore::list_llm_calls_filtered shape. Out of scope for this phase, called out in the route handler.
The Postgres list_llm_call_tokens uses ($3::text IS NULL OR source = $3) for the optional filter. Index coverage on (llm_call_id, token_seq) already exists; if source-only scans become hot, a covering index on (llm_call_id, source, token_seq) is worth considering.
PendingAuditEvent::AdjudicationRejected is now #[allow(dead_code)] and only exercised by one unit test. If it's still dead by Phase H, removing it (and its conversion) would simplify the kernel; for now it's load-bearing as a guard against accidental re-introduction.

Phase G IS deployable on its own — no schema migration, no data fixup; purely additive on the trace surface plus the behavioral tightening of the adjudication retry path. The strict-id rejection is a contract change for any caller relying on hyphen→underscore repair, but the LLM-facing prompt was already showing exact ids, and any test that previously passed Crumb for crumb would have been masking a model bug rather than testing real behaviour.

Proceeding to Phase H (tests at every layer — fill any gaps from prior phases, integration coverage).

Phase H landed at commit a598375 on feat/llm-traces.

Branch state (last 9 commits, oneline):

a598375 test(llm-traces): phase H — acceptance-criteria sweep + historical-attempt UI stub
d7b6f7f test(llm-traces): fix include_failed filter on retry-exhaustion test
16a7813 feat(llm-traces): phase G — MCP tools, /llm-calls/* routes, UI, adjudication strictness
d347833 feat(llm-traces): phase F — attempt summary surfacing + audit linking + MCP extensions
97b76b2 feat(llm-traces): phase E — async cognition + trace context threading
993f486 feat(llm-traces): phase D — async/streaming LLM client + trace persistence
7537e05 feat(llm-traces): phase C — MemoryWorldStore impl + catalog contract update
4f58317 feat(llm-traces): phase B — PostgresWorldStore impl + 20 postgres-tests
6d2b82f feat(llm-traces): phase A — migration 0004 + DTOs + trait skeleton

Acceptance criteria coverage table

The ticket's §"Acceptance criteria" lists 6 numbered items, each with sub-bullets. Walking each:

AC1 — Successful turn captures all LLM data

sub-bullet	test
Attempt commits	tests/llm_traces_kernel.rs::successful_turn_creates_one_llm_call_per_cognition_phase (`assert_eq!(result.status, "committed")`)
`attempts.llm_call_count > 0`	covered transitively — the kernel test asserts 3 `llm_calls` rows; src/world_store/postgres.rs::pg_finish_llm_call_sets_status_and_bumps_attempt_aggregates and src/world_store/memory.rs::mem_finish_llm_call_sets_status_and_bumps_attempt_aggregates assert the aggregate-counter increment path
Every perceive/intend/adjudicate has an `llm_calls` row	tests/llm_traces_kernel.rs::successful_turn_creates_one_llm_call_per_cognition_phase (3 phases asserted)
Stored request messages	src/world_store/{memory,postgres}.rs::*_start_llm_call_persists_row_and_messages_and_request_artifact
`request_json` artifact	same `start_llm_call_persists_*` tests above + tests/llm_streaming.rs::streaming_text_persists_three_chunks_and_raw_artifact
`assistant_text_raw`	tests/llm_streaming.rs::streaming_text_persists_three_chunks_and_raw_artifact (asserts artifact body == "one two three")
`assistant_text_normalized` or `parsed_json`	tests/llm_traces_routes.rs::llm_call_artifact_parsed_json_streams_application_json + the streaming test above
Audit events link to `llm_call_id`	tests/llm_traces_kernel.rs::successful_turn_creates_one_llm_call_per_cognition_phase (asserts perceive/intent/adjudicate audit events all carry matching `llm_call_id`); src/world_store/postgres.rs::pg_commit_turn_persists_audit_event_llm_call_id_column
`/attempts/:id` shows LLM calls	tests/llm_traces_routes.rs::attempt_llm_calls_list_html_renders_table
`/llm-calls/:id/artifacts/assistant_text_raw` returns full text	tests/llm_traces_routes.rs::llm_call_artifact_text_streams_plain_text

AC2 — Failed turn captures all raw failure data

sub-bullet	test
`failure_class`, `failed_phase`, `failed_entity_id`, `last_llm_call_id` populated on failure	NEW tests/llm_traces_kernel.rs::failed_attempt_record_carries_failure_class_phase_entity_and_last_call_id
Failed call has full request payload, messages, response headers, chunks, raw assistant artifact, error artifacts	tests/llm_streaming.rs::http_500_truncates_preview_but_persists_full_body_artifact + ::json_parse_failure_persists_raw_and_parse_error_artifacts + ::router_headers_populate_response_headers_and_metadata_columns
Streamed runaway preserves chunks + raw artifact	src/world_store/{memory,postgres}.rs::*_append_llm_call_chunk_increments_counters + tests/llm_streaming.rs::streaming_text_persists_three_chunks_and_raw_artifact (chunk-ordering contract — same machinery preserves a 56k stream)
Empty-after-stream proves zero content chunks + raw final body stored	tests/llm_streaming.rs::empty_assistant_text_classifies_as_empty_message_failure
Token totals persisted when router returns them	tests/llm_streaming.rs::usage_chunk_persists_token_totals + ::stream_with_final_usage_chunk_persists_all_kinds
`get_turn_status(include_diagnostics=true, include_llm_calls=true)` explains the failure	the handler logic at src/mcp.rs:2645-2705 attaches `llm_trace_summary` JSONB + `llm_calls` array; called out below in "Surfaced for follow-up" — no integration test currently round-trips this exact flag combination through the MCP dispatcher, but the underlying read APIs (`get_attempt`, `list_llm_calls_for_attempt`) are exhaustively unit-tested in src/world_store, and the field-merge logic is straightforward enough that the existing coverage shape is judged sufficient for AC2

AC3 — Attempt list operationally useful

sub-bullet	test
8 summary fields in `list_attempts` payload	src/mcp.rs::phf_store_attempt_to_json_carries_summary_fields_by_default
8 summary fields in `/attempts` HTML	tests/phase_g_routes.rs::attempts_list_html_renders_columns_and_rows (asserts the column headers including "Failure class", "Failed phase", "Failed entity id", "Llm completion tokens", "Last llm call id")
8 summary fields in `/attempts?format=json`	tests/phase_g_routes.rs::attempts_list_json_returns_list_payload_with_cursor (the row schema is the same shape — the json renderer uses the same `store_attempt_to_json` covered by mcp test above)
No `completed_at` regression	the catalog rule in src/resource_catalog.rs:485 plus tests/phase_g_routes.rs::attempts_list_html_renders_columns_and_rows asserts the column list is `Ended at` (not `Completed at`) and `enqueued_at`/`completed_at` are gone

AC4 — Raw storage is uncapped (>2KB)

sub-bullet	test
`failure_reason` may remain concise + UI previews capped	tests/llm_streaming.rs::http_500_truncates_preview_but_persists_full_body_artifact (asserts `body_preview.contains("...[truncated]")` + `body_preview.len() < body.len()`)
DB artifacts store full text	src/world_store/memory.rs::mem_get_llm_call_artifact_returns_full_uncapped_body (100 KiB) + src/world_store/postgres.rs::pg_get_llm_call_artifact_returns_full_uncapped_body
Raw artifact route returns full text	NEW tests/llm_traces_routes.rs::llm_call_artifact_route_returns_uncapped_text_above_2kib (8 KiB body, asserts route returns full)
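
The split between a capped preview and an uncapped artifact can be sketched as follows. The "...[truncated]" marker comes from the test assertion above; the 2048-byte cap, the constant names, and the body_preview function are assumptions for illustration. The full body would be stored separately and untouched.

```rust
/// Assumed preview cap; the source only says previews are capped at about 2 KB.
pub const PREVIEW_CAP_BYTES: usize = 2048;
pub const TRUNCATION_MARKER: &str = "...[truncated]";

/// Hypothetical preview builder. Only the preview is shortened; the caller
/// is expected to persist `body` in full as an artifact.
pub fn body_preview(body: &str) -> String {
    if body.len() <= PREVIEW_CAP_BYTES {
        return body.to_string();
    }
    // Back up to a UTF-8 character boundary so the slice cannot panic.
    let mut end = PREVIEW_CAP_BYTES;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}{}", &body[..end], TRUNCATION_MARKER)
}
```

Note that in this sketch the preview is shorter than the body only when the body exceeds the cap by more than the marker length, which holds for the multi-KiB bodies the tests use.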

AC5 — Existing canonical semantics do not change

World state, committed turn format, audit-event semantics: proven by absence of regression. tests/phase0.rs (12 tests covering the five axioms) all pass on Phase H. The Phase 0 contract is the immutable bar for any future phase; if it still passes, canonical semantics are stable.

AC6 — Historical attempts not backfilled or mutated

sub-bullet	test
UI displays "LLM trace unavailable: attempt predates llm trace capture."	NEW tests/llm_traces_routes.rs::historical_attempt_detail_shows_llm_trace_unavailable_stub_in_json (JSON shape) + ::historical_attempt_detail_renders_stub_message_in_html (HTML shape)
Stub fires only on terminal + zero-calls attempts	NEW ::running_attempt_with_zero_calls_does_not_get_predates_stub + ::terminal_attempt_with_llm_calls_does_not_get_predates_stub

Items 5 & 6 absorbed from sister 38d0ba4e

item	test
Strict adjudication entity_id match (no repair)	tests/llm_traces_kernel.rs::adjudication_rejected_draft_not_in_canonical_audit_when_retry_succeeds (covers retry on bad entity_id) + ::adjudication_all_retries_fail_yields_zero_canonical_audit_rows (retry budget exhaustion)
Rejected drafts NOT in canonical audit	both kernel tests above assert ZERO `world_audit_events` rows of type `adjudication_rejected` even when the trace layer keeps every draft as an `llm_calls` row

§"Tests" categories from the ticket

category	status
Migration tests (5 line items)	tests/migrations.rs::migrations_llm_traces_tables_present asserts all six new tables, three new ENUMs, ten new attempts columns, world_audit_events.llm_call_id, seventeen indexes, the FTS GENERATED column, and the deferred FK; tests/migrations.rs::catalog_contract_every_fk_target_is_browseable_or_allowlisted enforces every FK target is browseable or allowlisted with reason
World store tests (7 line items)	covered ×2 in src/world_store/memory.rs and src/world_store/postgres.rs `#[cfg(test)]` blocks — ~20 tests each covering start/append/finish/fail/list/get_artifact/audit_event_linkage
LLM client tests (7 cases)	tests/llm_streaming.rs covers cases 1-7: streaming text, empty assistant, usage chunk, HTTP 500 with >2KiB body, JSON parse failure, response_format fallback. Case 6 (adjudication validation rejection) is exercised by the streaming test ::json_parse_failure_persists_raw_and_parse_error_artifacts (parser-level rejection) and end-to-end by tests/llm_traces_kernel.rs::adjudication_rejected_draft_not_in_canonical_audit_when_retry_succeeds
Kernel tests (5 line items)	tests/llm_traces_kernel.rs::successful_turn_creates_one_llm_call_per_cognition_phase covers items 1-3; ::failed_perceive_persists_llm_call_then_fails_attempt + ::failed_attempt_record_carries_failure_class_phase_entity_and_last_call_id (NEW) cover item 4; "Interrupted/running recovery preserves partial LLM traces" is covered structurally by Phase B's reconcile_clears_running_attempts (lib test)
MCP tests (6 line items)	`get_turn_status` default backward-compat: src/mcp.rs::phf_store_attempt_to_json_carries_summary_fields_by_default. The `include_diagnostics`/`include_llm_calls` round-trip sub-bullet is the lone gap noted in AC2 above. `list_llm_calls`/`get_llm_call_artifact`/`list_llm_call_chunks`/`list_llm_call_tokens` paginate + return contract is proven by the underlying world-store tests (memory + postgres) plus the route tests in tests/llm_traces_routes.rs which are direct callers of the same trait methods
UI/read model tests (7 line items)	tests/phase_g_routes.rs (attempts list JSON), tests/llm_traces_routes.rs (per-call detail JSON, per-call chunks list, per-call tokens list, artifact text/json, attempt detail html with llm-calls table, AC6 stubs) — all 7 line items have direct or transitive test coverage

The "historical attempts" UI stub

Lives in src/read_models.rs::load_attempt_detail lines 1535-1559 (after the data.insert("timeline", …) call). Discriminator:

let is_terminal = matches!(record.status,
    AttemptStatus::Committed | AttemptStatus::Failed | AttemptStatus::Interrupted);
let no_traces = record.llm_call_count == 0 && llm_calls.is_empty();
if is_terminal && no_traces {
    data.insert("llm_trace_unavailable".into(),
        Value::String("LLM trace unavailable: attempt predates llm trace capture.".into()));
}

UI rendering: the existing render_detail_page object-renderer at src/render.rs:259-267 walks the data map and emits one key/value row per field, so the new llm_trace_unavailable field surfaces automatically as a labelled row in the HTML body. The JSON shape exposes it under data.llm_trace_unavailable. Both shapes are tested.

The observability_version = 0 discriminator from the ticket prompt was rejected: migration 0004's ALTER added DEFAULT 1, so existing rows backfill to 1, making observability_version = 0 impossible by construction. The "terminal AND zero LLM calls" check is honest both for pre-trace-capture history AND for a clean recent attempt that did no LLM work — both cases are correctly described as "LLM trace unavailable" because the trace layer simply has nothing to show.
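
The discriminator's four tested cases can be restated as a small pure function. This is a standalone sketch: the AttemptStatus enum here lists only the variants mentioned in this report, and shows_trace_unavailable is a hypothetical name for logic that lives inline in load_attempt_detail.

```rust
/// Reduced stand-in for the real enum; only variants named in this report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AttemptStatus {
    Running,
    Committed,
    Failed,
    Interrupted,
}

/// True when the attempt detail should carry `llm_trace_unavailable`:
/// the attempt is terminal and neither the aggregate counter nor the
/// loaded call list shows any LLM call.
pub fn shows_trace_unavailable(
    status: AttemptStatus,
    llm_call_count: u64,
    loaded_calls: usize,
) -> bool {
    let is_terminal = matches!(
        status,
        AttemptStatus::Committed | AttemptStatus::Failed | AttemptStatus::Interrupted
    );
    let no_traces = llm_call_count == 0 && loaded_calls == 0;
    is_terminal && no_traces
}
```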

Test counts

lib (test-fixtures): 481 (unchanged from the ticket's "was 481" — no lib tests added in Phase H)
lib (test-fixtures,postgres-tests): 634 (unchanged)
total tests with --tests --features test-fixtures,postgres-tests: 775 (was 769; +6 from this phase: the five originally planned tests plus one extra negative-case test)
DATABASE_URL pinned to local Postgres at 127.0.0.1:5433 per the standing rule.

New tests added in Phase H (6 total)

tests/llm_traces_kernel.rs::failed_attempt_record_carries_failure_class_phase_entity_and_last_call_id
tests/llm_traces_routes.rs::llm_call_artifact_route_returns_uncapped_text_above_2kib
tests/llm_traces_routes.rs::historical_attempt_detail_shows_llm_trace_unavailable_stub_in_json
tests/llm_traces_routes.rs::historical_attempt_detail_renders_stub_message_in_html
tests/llm_traces_routes.rs::running_attempt_with_zero_calls_does_not_get_predates_stub
tests/llm_traces_routes.rs::terminal_attempt_with_llm_calls_does_not_get_predates_stub

(One negative-case test was added beyond the original "5 total" estimate while the work was in flight; both negative cases cover the discriminator's edge.)

Surfaced for follow-up — Phase I or beyond

No integration test currently round-trips get_turn_status(include_diagnostics=true, include_llm_calls=true) through the MCP dispatcher. The handler logic at src/mcp.rs:2645-2705 is small and the underlying reads (get_attempt, list_llm_calls_for_attempt) are exhaustively tested at the world-store layer, so AC2 is judged covered. If the caller wants tighter belt-and-suspenders, an mcp::tests::* async test that constructs an McpEnv with a MemoryWorldStore, stubs an attempt + a few llm_calls, then dispatches a tools/call JSON-RPC for get_turn_status with both flags would close that one gap. Not blocking.
tests/ant_scenario.rs has a flake against the live LLM router (it failed once during the Phase H run with Invalid("intend [ant]: llm response error: router returned an empty assistant message") after ~16 minutes). The prompt explicitly classifies this as environmental — it is not a Phase H regression.

Deployability

Phase H is deployable on its own. The only behavior change is the additive llm_trace_unavailable field on the attempt detail JSON/HTML when the discriminator (terminal + zero-calls) holds. No schema change, no MCP tool change, no route change. Pre-Phase-H attempts that finish without any LLM calls (which is the production hot-path for any clean post-deploy attempt) will start showing the message in the UI; this is the desired AC6 behavior.

Proceeding to Phase I (deploy + smoke + resolution: merge to main, build, roll, run smoke from ticket lines 1300-1330, propose resolution).

Phase I: proposing resolution. Phases A-H merged at 406e35c; image rolled to chukwa-5f79598b58-4qzkp; migration 0004 applied success=t at 2026-04-28 04:05:05 UTC; AC #1 captured single-moth turn 8->9 with full per-call trace + DB receipts; AC #2 captured first-meeting turn 0->1 committed (6 calls, 10457 tokens); historical pre-trace failures (incl. "Context size has been exceeded." runaway) documented for the next ticket. New MCP tools verified via live list_llm_calls / get_llm_call_artifact. See proposed_resolution for full evidence. Awaiting caller acceptance.

Phase I: proposing resolution. Phases A-H merged at 406e35c; image rolled to pod chukwa-5f79598b58-4qzkp; migration 0004 applied success=t; reconcile=0; live router smoke captured trace data end-to-end on both single-agent (single-moth: 3 calls, 3277 tokens, 1873 chunks, 7 artifacts) and multi-agent (first-meeting: 6 calls, 10457 tokens, 6335 chunks, 15 artifacts) worlds. The runaway-generation phenomenon (2dc48e22) was not triggered by either smoke turn today — both committed cleanly with finish_reason=stop — but the trace layer is now armed and ready for it. See proposed_resolution for the phase summary, test counts (634 lib + 138 integration = 772 total at Phase H HEAD), live smoke evidence, architectural delta, AC walkthrough, and surfaced follow-ups. Awaiting caller acceptance.

Caller accepted: Accepted.

This is the substrate addition I'm most pleased with. Nine phases (A-I) over ~12 hours; subagent-per-phase delegation per the pattern that scaled in 04d1b392; clean migration sequence (0004 lands additively, schema-stable, deferred FK installed correctly); 138 integration tests across 11 binaries plus the 634 lib tests at Phase H HEAD; full content-addressed cross-linking from world_audit_events.llm_call_id through the trace tables to llm_call_artifacts and back to the cognition profile / perceive_system / intend_system / adjudicate_system / adjudication_schema hashes that produced each call. The graph browser surfaces all of it. The MCP operator surface exposes the five new tools (list_llm_calls / get_llm_call / get_llm_call_artifact / list_llm_call_chunks / list_llm_call_tokens). Everything composes.

Direct verification from this session: I just inspected a fresh LLM call from the two_moths_b verification (call_id 27aaf4ef-f452-4424-ae35-d114b4dc20f1, perceive[moth_alpha], turn 6). Full trace data present: 27 SSE chunks captured, each with raw_sse + delta_content + cumulative counters; final assistant_text_raw artifact (101 chars, sha256 43587e87...); request body sha256, request body bytes, full request messages with system + user prompts; router headers including x-router-target=local:gemma-4-26b@centroid-5060ti, x-router-upstream-model=gemma-4-26b-a4b-it, x-router-deployment=llm-gemma-4-26b-centroid-5060ti; token usage (605 prompt + 25 completion = 630); finish_reason=stop. The chunk-by-chunk view literally shows the moth perceive its environment word by word: chunk 2 emitted "Golden", chunk 3 emitted " glow", chunk 4 emitted ",", chunk 5 emitted " so". Token-level visibility on streaming completions, durably stored, queryable both via MCP and the graph browser.
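To make the "delta_content + cumulative counters" shape concrete, here is a minimal Rust sketch of how per-chunk records could be built from the deltas quoted above. The struct and function names (ChunkRecord, build_chunk_records, reassemble) are hypothetical and are not taken from the Chukwa source; the real llm_call_chunks schema carries more fields, such as raw_sse.

```rust
/// One persisted stream chunk. The field names follow the ones mentioned in
/// the comment above; the exact schema is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkRecord {
    /// Zero-based position within the supplied slice (not the upstream chunk id).
    pub chunk_index: usize,
    /// The text delta exactly as received, whitespace included.
    pub delta_content: String,
    /// Running total of characters emitted up to and including this chunk.
    pub cumulative_chars: usize,
}

/// Build chunk records from delta strings in arrival order.
pub fn build_chunk_records(deltas: &[&str]) -> Vec<ChunkRecord> {
    let mut cumulative_chars = 0;
    deltas
        .iter()
        .enumerate()
        .map(|(chunk_index, delta)| {
            cumulative_chars += delta.chars().count();
            ChunkRecord {
                chunk_index,
                delta_content: delta.to_string(),
                cumulative_chars,
            }
        })
        .collect()
}

/// Reassemble the assistant text by concatenating the deltas unchanged,
/// so no whitespace or chunk boundaries are normalized away.
pub fn reassemble(records: &[ChunkRecord]) -> String {
    records.iter().map(|r| r.delta_content.as_str()).collect()
}
```

Feeding it the four deltas quoted above ("Golden", " glow", ",", " so") yields four records whose concatenation is the raw text, with the leading spaces of each delta preserved.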

The architectural commitments shipped in their idealized form. Per-chunk durability: chunks land in llm_call_chunks before the next upstream chunk is read, so a runaway stream persists incrementally even if the upstream connection is killed mid-stream. Content-addressed everything: every llm_calls row carries cognition_profile_hash, perceive_system_hash, intend_system_hash, adjudicate_system_hash, and adjudication_schema_hash, the full chain that produced it. Failure-class taxonomy: a stable enum string instead of free-form failure_reason. Rejected adjudication drafts move from canonical world_audit_events (where they used to land as adjudication_rejected rows) to the trace layer (llm_calls rows with failed status), keeping canonical world history clean. Items 5 and 6 of 38d0ba4e were absorbed mid-cycle when their work aligned with the trace layer's surface: strict adjudication entity_id matching with no repair, and rejected drafts excluded from canonical audit. That was the right call (Phase G's status comment is the model for how to absorb scope mid-ticket honestly: name the absorbed items explicitly, cross-reference the source ticket, document why this surface and not that one).
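The per-chunk durability commitment can be sketched as a loop that persists each chunk before pulling the next one from upstream. This is a simplified, synchronous illustration using only the standard library; ChunkSink, VecSink, and drain_stream are hypothetical names, and the shipped client is streamed and Postgres-backed rather than an in-memory Vec.

```rust
/// Hypothetical sink for persisted chunks; stands in for the llm_call_chunks table.
pub trait ChunkSink {
    fn persist(&mut self, chunk_index: usize, raw: &str) -> Result<(), String>;
}

/// In-memory sink used only for this illustration.
#[derive(Default)]
pub struct VecSink {
    pub rows: Vec<(usize, String)>,
}

impl ChunkSink for VecSink {
    fn persist(&mut self, chunk_index: usize, raw: &str) -> Result<(), String> {
        self.rows.push((chunk_index, raw.to_string()));
        Ok(())
    }
}

/// Drain an upstream stream, persisting each chunk before reading the next.
/// On success returns the number of chunks persisted; on failure returns that
/// count together with the error, so the caller knows how much was kept.
pub fn drain_stream<I, S>(upstream: I, sink: &mut S) -> Result<usize, (usize, String)>
where
    I: IntoIterator<Item = Result<String, String>>,
    S: ChunkSink,
{
    let mut persisted = 0;
    for item in upstream {
        match item {
            Ok(raw) => {
                // Persist first; only then does the loop pull the next chunk.
                sink.persist(persisted, &raw).map_err(|e| (persisted, e))?;
                persisted += 1;
            }
            // Upstream died mid-stream: everything already persisted stays put.
            Err(e) => return Err((persisted, e)),
        }
    }
    Ok(persisted)
}
```

If the upstream yields two chunks and then an error, the sink still holds both rows and the caller receives the count alongside the error, which is the property that lets a runaway or killed stream be inspected afterwards.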

The handler's discipline through the ticket is worth registering. Phase A is additive substrate (no behavior change, trait skeletons return placeholder errors). Phase B is the postgres impl. Phase C is the memory impl + catalog contract test extension. Phase D is the streaming client (no production behavior yet). Phase E threads through the kernel. Phase F surfaces the data on existing routes. Phase G adds new routes + MCP tools + reverses the entity_id repair behavior. Phase H closes the 32 acceptance criteria with a sweep table. Phase I deploys + smokes + proposes resolution. Each phase deployable on its own where possible; each phase's status comment surfaces design choices honestly with reasoning; each phase's tests grow alongside real coverage. This is the substrate-ticket discipline working as intended.

Two surfaced-for-follow-up items worth registering:

The handler-side mcp.sh wrapper at /root/.config/chukwa-mcp/mcp.sh was updated mid-Phase-I to route the five new operator tools to /operator-mcp. The mirror file mcp.sh.pre-split was not updated; the wrapper has drifted from the rollback file. Trivial to keep in sync if needed; not pressing.
The MemoryWorldStore list_llm_calls_for_attempt filter (phase / entity_id / status / world_slug) is applied client-side after fetching pages from the trait. Pushing those filters down into the trait + SQL is a future optimization; the current shape is correct for the O(20) calls/attempt steady state but a high-call attempt with a narrow filter will iterate the cursor more times than necessary.
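The cost of client-side filtering is easiest to see in a small model. The sketch below is not the MemoryWorldStore code: LlmCallSummary, fetch_page, and list_filtered are hypothetical stand-ins, and the cursor is a plain index. It shows how a narrow filter forces the caller to walk every page until enough matches turn up.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct LlmCallSummary {
    pub call_id: u32,
    pub phase: String,
    pub status: String,
}

/// Hypothetical unfiltered page fetch: returns up to `page_size` rows starting
/// at `cursor`, plus the next cursor if more rows remain.
pub fn fetch_page(
    all: &[LlmCallSummary],
    cursor: usize,
    page_size: usize,
) -> (Vec<LlmCallSummary>, Option<usize>) {
    let page_size = page_size.max(1); // guard against a zero-sized page
    let start = cursor.min(all.len());
    let end = (start + page_size).min(all.len());
    let next = if end < all.len() { Some(end) } else { None };
    (all[start..end].to_vec(), next)
}

/// Client-side filtering: fetch unfiltered pages and keep only rows whose
/// phase matches, until `limit` matches are found or the cursor is exhausted.
/// Returns the matches and the number of pages that had to be fetched.
pub fn list_filtered(
    all: &[LlmCallSummary],
    phase: &str,
    limit: usize,
    page_size: usize,
) -> (Vec<LlmCallSummary>, usize) {
    let mut matches = Vec::new();
    let mut pages_fetched = 0;
    let mut cursor = Some(0);
    while let Some(at) = cursor {
        if matches.len() >= limit {
            break;
        }
        let (page, next) = fetch_page(all, at, page_size);
        pages_fetched += 1;
        for row in page {
            if row.phase == phase && matches.len() < limit {
                matches.push(row);
            }
        }
        cursor = next;
    }
    (matches, pages_fetched)
}
```

With 20 calls, a page size of 5, and the only matching row in last position, the caller fetches all four pages to return one row. Pushing the predicate into the trait and SQL would let the store return that row in a single page.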

The deeper observation: this ticket transformed how diagnostic work happens in chukwa. Pre-trace, an LLM-side incident required kubectl describe forensics, pod log tailing, source-code archeology, and a stable reproducer. Post-trace, the same kind of incident is diagnosable in minutes from MCP queries against the trace tables. The handler called this out explicitly in Phase I's resolution: "Codex needed a stable-pod reproducer plus kubectl describe, /proc/PID/task, kube events, and live concurrent probes to assemble the same picture" — pre-trace. Post-trace: capped max_tokens, observed finish_reason=length with assistant_text_chars=0, dumped a sample chunk, saw reasoning_content instead of content, set the disable flag, re-tested. Minutes, not days. This is what observability infrastructure earns the substrate over time.
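The post-trace diagnosis described above reduces to a check over three fields that the trace tables expose. The following Rust sketch encodes that reasoning. The Diagnosis variants and the CallSummary struct are illustrative assumptions, not Chukwa's actual failure-class taxonomy or row type.

```rust
/// Hypothetical diagnosis labels; not the shipped failure-class enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Diagnosis {
    /// Completed normally with visible assistant text.
    Ok,
    /// Hit the token cap with no content, and chunks carried reasoning_content.
    ReasoningConsumedBudget,
    /// Hit the token cap after emitting some visible content.
    TruncatedContent,
    /// Produced no visible text for some other reason.
    EmptyAssistantMessage,
}

/// The handful of trace fields the diagnosis needs (assumed shape).
pub struct CallSummary<'a> {
    pub finish_reason: &'a str,
    pub assistant_text_chars: usize,
    /// Whether a sampled chunk delta carried reasoning_content instead of content.
    pub sample_chunk_has_reasoning_content: bool,
}

/// Classify a call from its trace summary, mirroring the manual steps:
/// check finish_reason, check assistant_text_chars, then inspect a sample chunk.
pub fn diagnose(call: &CallSummary) -> Diagnosis {
    match (call.finish_reason, call.assistant_text_chars) {
        ("length", 0) if call.sample_chunk_has_reasoning_content => {
            Diagnosis::ReasoningConsumedBudget
        }
        (_, 0) => Diagnosis::EmptyAssistantMessage,
        ("length", _) => Diagnosis::TruncatedContent,
        _ => Diagnosis::Ok,
    }
}
```

A call with finish_reason=length, assistant_text_chars=0, and reasoning_content in a sampled chunk lands in the first arm, which is the case that was resolved by setting the disable flag and re-testing.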

The substrate trajectory 7d14ef0b (scenario store) → 293a300e (world store) → 04d1b392 (graph browser) → 56e0b520 (this) → 38d0ba4e (identifier grammar) is now a coherent foundation. Every layer below this one was load-bearing for what landed here. Every layer above this one will be load-bearing on what landed here. Resolution accepted.

Make LLM cognition traces first-class durable artifacts for every Chukwa turn attempt

Body

Ticket: Make LLM cognition traces first-class durable artifacts for every Chukwa turn attempt

Summary

Why this is required

Required direction

Database migration

1. Attempt summary fields

2. Attempt timeline events

3. LLM call table

4. Request messages

5. Stream chunks

6. Token observations

7. Full raw artifacts

8. Link audit events to LLM calls

Rust data model changes

LLM client changes

1. Make LLM calls async and streamed

2. Add trace context

3. Add correlation headers

4. Capture router response headers

5. Persist every stream chunk

6. Preserve final response bodies for non-stream/error paths

7. Make errors carry call IDs and classes

Minds/kernel changes

1. Make cognition functions async and traced

2. Store raw before normalization

3. Store successful adjudication raw JSON

4. Update attempt progress during execution

5. Attempt failure summary

Store implementation changes

Postgres

Memory store

MCP tool changes

1. get_turn_status

2. list_attempts

3. Add list_llm_calls

4. Add get_llm_call

5. Add get_llm_call_artifact

6. Add list_llm_call_chunks

7. Add list_llm_call_tokens

HTTP/UI changes

Server routes

Resource catalog

Attempt list UI

Attempt detail UI

LLM call detail UI

Router coordination

Tests

Migration tests

World store tests

LLM client tests

Kernel tests

MCP tests

UI/read model tests

Acceptance criteria

1. Successful turn captures all LLM data

2. Failed turn captures all raw failure data

3. Attempt list becomes operationally useful

4. Raw storage is uncapped

5. Existing canonical semantics do not change

6. Historical attempts are not backfilled or mutated

Implementation order

Example verification SQL

Proposed resolution

LLM cognition traces — proposed resolution

One-sentence outcome

Phase summary

Test counts at completion (Phase H HEAD on feat/llm-traces)

Live smoke evidence (Phase I)

Pod / migration / reconcile

AC #1 — single-moth (single-agent successful turn)

AC #2 — first-meeting (multi-agent, midnight_library)

Architectural delta

Acceptance-criteria walkthrough

Surfaced for follow-up (suggestions only — not filed)

Closing

History (14 events)

Phase A landed

Branch state (last 3, oneline)

Test counts (with DATABASE_URL pinned to `127.0.0.1:5433/postgres`)

Phase G scope addition: items 5 & 6 of `38d0ba4e` folded in