The key design decision: keep one attempt per attempted turn. That preserves the current lifecycle model, LLM trace visibility, failure records, list_attempts, get_turn_status, and canonical turn history. Add a higher-level “turn run” / “series” wrapper that sequences those attempts.

The relevant pieces are:

src/mcp.rs
- handle_run_turn currently calls world_store.start_attempt(...), spawns one Runtime::run_claimed(...), and returns one attempt_id.
- Tool schema/manifest is built in consumer_tool_contract.
- Unknown-key validation currently only allows world_slug for run_turn.
src/kernel.rs
- Runtime::run_turn and Runtime::run_claimed are strictly one-attempt / one-turn paths.
- This should remain true.
src/world_store/mod.rs
- WorldStore trait has start_attempt, commit_turn, fail_attempt, reconcile_running_attempts, get_attempt_status, and list_attempts.
src/world_store/postgres.rs
- start_attempt enforces the world lease with worlds.active_attempt_id.
- commit_turn and fail_attempt clear the attempt lease.
src/world_store/memory.rs
- In-memory mirror of the same lifecycle.
migrations/0002_world_store.sql
- attempts has the invariant attempted_turn = turn_before + 1.
- There is a partial unique index allowing only one running attempt per world.
- worlds.active_attempt_id is the current lease.
src/mcp/tests.rs
- Tool manifest contracts and examples must be updated.

Ticket: Add multi-turn `run_turn` support with durable turn-run lifecycle

Goal

Extend the MCP run_turn tool so callers can request more than one committed turn in a single tool call by passing an optional integer parameter named turn_count.

The implementation must preserve the existing per-turn attempt lifecycle. A multi-turn run must create and execute one normal attempt per attempted turn. The system must expose durable visibility into the multi-turn run, must allow cancellation between attempts, and must keep every individual attempt observable through the existing attempt lifecycle tools.

Non-goals

Do not change cognition workflow semantics.

Do not change WorldPatch.

Do not change how a single turn is committed or failed.

Do not create a single attempt that commits multiple turns.

Do not remove or weaken the existing attempts lifecycle.

Do not make multi-turn execution parallel. Multi-turn runs are strictly sequential per world.

Required public tool behavior

`run_turn`

Update the existing run_turn MCP tool to accept these arguments:

{
  "world_slug": "string",
  "turn_count": "optional integer",
  "max_attempts": "optional integer"
}

turn_count rules:

Name: turn_count
Type: integer
Minimum: 1
Maximum: 100000
Default: 1
Meaning: the number of committed turns the caller wants the world to advance.

max_attempts rules:

Name: max_attempts
Type: integer
Minimum: 1
Maximum: 1000000
Default: equal to turn_count
Must be greater than or equal to turn_count
Meaning: the maximum number of per-turn attempts the turn run may start before it fails.

The tool must reject unknown keys. The accepted keys for run_turn are exactly:

world_slug
turn_count
max_attempts
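Exact-key validation can be sketched as a pure check over the argument names. This is a minimal sketch that assumes the handler can list the keys of the incoming JSON object. `RUN_TURN_KEYS` and `reject_unknown_keys` are hypothetical names; the real check presumably lives in validate_tool_args.

```rust
/// Keys accepted by run_turn, exactly as listed in the ticket.
const RUN_TURN_KEYS: &[&str] = &["world_slug", "turn_count", "max_attempts"];

/// Hypothetical helper: returns every key not present in `allowed`,
/// sorted so the resulting error message is deterministic.
fn reject_unknown_keys<'a>(keys: &[&'a str], allowed: &[&str]) -> Result<(), Vec<&'a str>> {
    let mut unknown: Vec<&'a str> = keys
        .iter()
        .copied()
        .filter(|k| !allowed.iter().any(|a| *a == *k))
        .collect();
    if unknown.is_empty() {
        return Ok(());
    }
    unknown.sort_unstable();
    Err(unknown)
}
```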

Single-attempt compatibility mode

When turn_count is omitted and max_attempts is omitted, preserve the current behavior:

Create one attempt with WorldStore::start_attempt.
Spawn one Runtime::run_claimed.
Return the existing attempt_id.
Do not create a turn-run row.

The response must add these fields:

{
  "run_mode": "single_attempt",
  "turn_count": 1,
  "turn_count_source": "default",
  "turn_count_hint": "No turn_count was supplied; run_turn defaulted to turn_count=1 and started one single-turn attempt.",
  "max_attempts": 1,
  "max_attempts_source": "default",
  "max_attempts_hint": "No max_attempts was supplied; max_attempts defaulted to turn_count (1)."
}

The existing fields must remain:

{
  "world_slug": "...",
  "attempt_id": "...",
  "status": "running",
  "turn_before": 123,
  "attempted_turn": 124,
  "poll_with": {
    "tool": "get_turn_status",
    "args": {
      "world_slug": "...",
      "attempt_id": "..."
    }
  }
}

Explicit single-turn mode

When turn_count is explicitly supplied as 1 and max_attempts is omitted or explicitly 1, use the same single-attempt path.

Set:

{
  "run_mode": "single_attempt",
  "turn_count": 1,
  "turn_count_source": "explicit",
  "turn_count_hint": "turn_count was supplied as 1; run_turn started one single-turn attempt."
}

Multi-turn mode

When turn_count > 1, or when max_attempts > 1, run_turn must create a durable turn-run row and spawn a background coordinator task.

The response must include:

{
  "run_mode": "turn_run",
  "world_slug": "...",
  "turn_run_id": "...",
  "status": "running",
  "turn_count": 40,
  "turn_count_source": "explicit",
  "turn_count_hint": "turn_count was supplied as 40; run_turn started a turn run targeting 40 committed turn(s).",
  "max_attempts": 40,
  "max_attempts_source": "default",
  "max_attempts_hint": "No max_attempts was supplied; max_attempts defaulted to turn_count (40).",
  "start_turn": 123,
  "target_turn": 163,
  "poll_with": {
    "tool": "get_turn_run_status",
    "args": {
      "world_slug": "...",
      "turn_run_id": "..."
    }
  },
  "list_attempts_with": {
    "tool": "list_attempts",
    "args": {
      "world_slug": "...",
      "turn_run_id": "..."
    }
  }
}

For explicit max_attempts, the hint must be:

max_attempts was supplied as {max_attempts}; the turn run will stop after at most {max_attempts} attempt(s).
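The hint strings are deterministic, so plain formatting can produce them. This is a minimal sketch; `Source`, `turn_count_hint`, and `max_attempts_hint` are hypothetical names. The ticket gives no wording for a turn run started with a defaulted turn_count (for example, `max_attempts: 3` on its own), so that match arm is an assumption.

```rust
/// Whether a parameter was supplied by the caller or defaulted.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Source {
    Default,
    Explicit,
}

/// Builds turn_count_hint. `turn_run` is false on the single-attempt path.
fn turn_count_hint(turn_count: u32, source: Source, turn_run: bool) -> String {
    match (source, turn_run) {
        (Source::Default, false) => "No turn_count was supplied; run_turn defaulted to turn_count=1 and started one single-turn attempt.".to_string(),
        (Source::Explicit, false) => format!("turn_count was supplied as {}; run_turn started one single-turn attempt.", turn_count),
        // Assumption: wording for a defaulted turn_count combined with max_attempts > 1.
        (Source::Default, true) => format!("No turn_count was supplied; run_turn defaulted to turn_count={0} and started a turn run targeting {0} committed turn(s).", turn_count),
        (Source::Explicit, true) => format!("turn_count was supplied as {0}; run_turn started a turn run targeting {0} committed turn(s).", turn_count),
    }
}

/// Builds max_attempts_hint. When defaulted, max_attempts equals turn_count.
fn max_attempts_hint(max_attempts: u32, turn_count: u32, source: Source) -> String {
    match source {
        Source::Default => format!("No max_attempts was supplied; max_attempts defaulted to turn_count ({}).", turn_count),
        Source::Explicit => format!("max_attempts was supplied as {0}; the turn run will stop after at most {0} attempt(s).", max_attempts),
    }
}
```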

Attempt semantics

A turn run must not pre-create attempts.

A turn run must create attempts lazily, one at a time.

Each attempt must still represent exactly one attempted turn.

A successful attempt increments world current_turn by exactly one.

A failed attempt does not increment world current_turn.

The turn run must continue creating attempts until one of these terminal conditions is reached:

committed_turn_count == requested_turn_count
- Turn run status becomes completed.
attempt_count == max_attempts and committed_turn_count < requested_turn_count
- Turn run status becomes failed.
- Failure reason must be exactly:
```
max_attempts exhausted before requested turn_count committed
```
Cancellation was requested and the current active attempt has finished.
- Turn run status becomes cancelled.
The process restarts before the turn run completes.
- Startup reconciliation marks the turn run interrupted.

A failed individual attempt must remain visible as a normal failed attempt through get_turn_status and list_attempts.

A failed individual attempt must not automatically fail the whole turn run unless max_attempts has been exhausted.

New public tools

Add two consumer MCP tools.

`get_turn_run_status`

Input:

{
  "world_slug": "string",
  "turn_run_id": "uuid",
  "include_attempts": "optional boolean",
  "attempt_limit": "optional integer"
}

Rules:

include_attempts default: false
attempt_limit default: 20
attempt_limit min: 1
attempt_limit max: 100
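One way to resolve these options, assuming out-of-range values are rejected rather than clamped (the ticket gives the bounds but not the policy). `resolve_attempt_options` is a hypothetical helper.

```rust
const ATTEMPT_LIMIT_DEFAULT: u32 = 20;
const ATTEMPT_LIMIT_MIN: i64 = 1;
const ATTEMPT_LIMIT_MAX: i64 = 100;

/// Resolves (include_attempts, attempt_limit) from optional raw inputs.
/// Assumption: out-of-range limits are an error, not silently clamped.
fn resolve_attempt_options(
    include_attempts: Option<bool>,
    attempt_limit: Option<i64>,
) -> Result<(bool, u32), String> {
    let include = include_attempts.unwrap_or(false);
    let limit = match attempt_limit {
        None => ATTEMPT_LIMIT_DEFAULT,
        Some(n) if (ATTEMPT_LIMIT_MIN..=ATTEMPT_LIMIT_MAX).contains(&n) => n as u32,
        Some(n) => {
            return Err(format!(
                "attempt_limit must be between {} and {}, got {}",
                ATTEMPT_LIMIT_MIN, ATTEMPT_LIMIT_MAX, n
            ));
        }
    };
    Ok((include, limit))
}
```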

Return:

{
  "message": "...",
  "world_slug": "...",
  "turn_run_id": "...",
  "status": "failed",
  "requested_turn_count": 1000,
  "max_attempts": 3000,
  "start_turn": 20,
  "target_turn": 1020,
  "current_turn": 25,
  "committed_turn_count": 5,
  "remaining_committed_turns": 995,
  "attempt_count": 3000,
  "failed_attempt_count": 2995,
  "interrupted_attempt_count": 0,
  "active_attempt_id": null,
  "last_attempt_id": "...",
  "last_attempt_status": "failed",
  "progress": "...",
  "cancel_requested_at": null,
  "cancel_reason": null,
  "enqueued_at": "...",
  "started_at": "...",
  "ended_at": "...",
  "poll_active_attempt_with": null,
  "list_attempts_with": {
    "tool": "list_attempts",
    "args": {
      "world_slug": "...",
      "turn_run_id": "..."
    }
  }
}

When include_attempts is true, include:

{
  "recent_attempts": [
    {
      "attempt_id": "...",
      "turn_run_id": "...",
      "turn_run_seq": 1,
      "status": "committed",
      "turn_before": 20,
      "attempted_turn": 21,
      "produced_turn": 21
    }
  ]
}

The recent attempts must be ordered newest first.
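Because turn_run_seq is unique within a turn run, newest-first ordering can key on it. Below is a minimal in-memory sketch with a hypothetical `AttemptSummary` type. A Postgres store would presumably express the same thing as `ORDER BY turn_run_seq DESC LIMIT` with the attempt_limit.

```rust
/// Hypothetical, trimmed-down attempt summary.
#[derive(Clone, Debug, PartialEq)]
struct AttemptSummary {
    turn_run_seq: u32,
    status: &'static str,
}

/// Orders attempts newest first by turn_run_seq and keeps at most `limit`.
fn recent_attempts(mut attempts: Vec<AttemptSummary>, limit: usize) -> Vec<AttemptSummary> {
    attempts.sort_by(|a, b| b.turn_run_seq.cmp(&a.turn_run_seq));
    attempts.truncate(limit);
    attempts
}
```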

`cancel_turn_run`

Input:

{
  "world_slug": "string",
  "turn_run_id": "uuid",
  "reason": "optional string"
}

Behavior:

If the turn run is running, set status to cancel_requested.
If there is no active attempt, immediately mark the turn run cancelled, set ended_at, and clear worlds.active_turn_run_id.
If there is an active attempt, do not interrupt it. The active attempt must finish through the normal attempt lifecycle. The coordinator must stop before starting the next attempt, mark the turn run cancelled, set ended_at, and clear worlds.active_turn_run_id.
If the turn run is already terminal, do not mutate it. Return the current status and a message saying it was already terminal.
If reason is omitted, store:
```
cancellation requested by caller
```

Return the same status shape as get_turn_run_status.

Existing public tool changes

`get_turn_status`

Augment the existing attempt response with:

{
  "turn_run_id": null,
  "turn_run_seq": null
}

For attempts that belong to a turn run, these fields must be populated.

`list_attempts`

Accept optional turn_run_id.

The accepted keys for list_attempts become:

world_slug
turn_run_id

When turn_run_id is provided, list only attempts belonging to that turn run.

Every attempt summary must include:

{
  "turn_run_id": null,
  "turn_run_seq": null
}
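The optional filter composes with the existing world filter. This sketch works over an in-memory slice and assumes a simplified `Attempt` row, with integer ids standing in for UUIDs. `list_attempts` here is a hypothetical free function, not the store method.

```rust
/// Hypothetical simplified attempt row.
#[derive(Clone, Debug, PartialEq)]
struct Attempt {
    attempt_id: u32,
    world_slug: String,
    turn_run_id: Option<u32>,
    turn_run_seq: Option<u32>,
}

/// Lists a world's attempts, optionally restricted to one turn run.
fn list_attempts<'a>(attempts: &'a [Attempt], world_slug: &str, turn_run_id: Option<u32>) -> Vec<&'a Attempt> {
    attempts
        .iter()
        .filter(|a| a.world_slug == world_slug)
        // No turn_run_id: keep everything; otherwise keep only that run's attempts.
        .filter(|a| turn_run_id.map_or(true, |id| a.turn_run_id == Some(id)))
        .collect()
}
```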

Database migration

Add migration:

migrations/0008_turn_runs.sql

Create enum:

CREATE TYPE turn_run_status AS ENUM (
    'running',
    'cancel_requested',
    'completed',
    'failed',
    'cancelled',
    'interrupted'
);

Alter worlds:

ALTER TABLE worlds
    ADD COLUMN active_turn_run_id UUID;

Create table:

CREATE TABLE turn_runs (
    turn_run_id UUID PRIMARY KEY,
    world_slug label_text NOT NULL REFERENCES worlds(slug),

    status turn_run_status NOT NULL,

    enqueued_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    ended_at TIMESTAMPTZ,

    worker_id TEXT NOT NULL,

    start_turn BIGINT NOT NULL CHECK (start_turn >= 0),
    target_turn BIGINT NOT NULL CHECK (target_turn >= 1),

    requested_turn_count INT NOT NULL CHECK (requested_turn_count >= 1),
    max_attempts INT NOT NULL CHECK (max_attempts >= 1),

    turn_count_source TEXT NOT NULL CHECK (turn_count_source IN ('default', 'explicit')),
    max_attempts_source TEXT NOT NULL CHECK (max_attempts_source IN ('default', 'explicit')),

    committed_turn_count INT NOT NULL DEFAULT 0 CHECK (committed_turn_count >= 0),
    attempt_count INT NOT NULL DEFAULT 0 CHECK (attempt_count >= 0),
    failed_attempt_count INT NOT NULL DEFAULT 0 CHECK (failed_attempt_count >= 0),
    interrupted_attempt_count INT NOT NULL DEFAULT 0 CHECK (interrupted_attempt_count >= 0),

    active_attempt_id UUID,
    last_attempt_id UUID,

    progress TEXT,
    cancel_requested_at TIMESTAMPTZ,
    cancel_reason TEXT,
    failure_reason TEXT,

    CONSTRAINT turn_runs_target_check
        CHECK (target_turn = start_turn + requested_turn_count),

    CONSTRAINT turn_runs_max_attempts_check
        CHECK (max_attempts >= requested_turn_count),

    CONSTRAINT turn_runs_terminal_time_check
        CHECK (
            (status IN ('running', 'cancel_requested') AND ended_at IS NULL)
            OR
            (status IN ('completed', 'failed', 'cancelled', 'interrupted') AND ended_at IS NOT NULL)
        ),

    CONSTRAINT turn_runs_cancel_check
        CHECK (
            (status IN ('cancel_requested', 'cancelled') AND cancel_requested_at IS NOT NULL)
            OR
            (status NOT IN ('cancel_requested', 'cancelled'))
        ),

    CONSTRAINT turn_runs_failure_check
        CHECK (
            (status IN ('failed', 'interrupted') AND failure_reason IS NOT NULL)
            OR
            (status NOT IN ('failed', 'interrupted'))
        ),

    CONSTRAINT turn_runs_world_run_unique UNIQUE (world_slug, turn_run_id)
);
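Because src/world_store/memory.rs mirrors the Postgres lifecycle, it may help to check the same row invariants in Rust so both stores reject the same states. This is a hypothetical helper (`check_row`, `Row`) with timestamps reduced to `u64`. It reports the constraint names used in the migration above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status { Running, CancelRequested, Completed, Failed, Cancelled, Interrupted }

/// Hypothetical in-memory mirror of a turn_runs row.
#[derive(Clone, Debug)]
struct Row {
    status: Status,
    start_turn: i64,
    target_turn: i64,
    requested_turn_count: i32,
    max_attempts: i32,
    ended_at: Option<u64>,
    cancel_requested_at: Option<u64>,
    failure_reason: Option<String>,
}

/// Returns the first violated constraint, mirroring the migration's CHECKs.
fn check_row(r: &Row) -> Result<(), &'static str> {
    use Status::*;
    if r.start_turn < 0 || r.target_turn < 1 || r.requested_turn_count < 1 || r.max_attempts < 1 {
        return Err("column CHECK");
    }
    if r.target_turn != r.start_turn + i64::from(r.requested_turn_count) {
        return Err("turn_runs_target_check");
    }
    if r.max_attempts < r.requested_turn_count {
        return Err("turn_runs_max_attempts_check");
    }
    // Non-terminal rows must have no ended_at; terminal rows must have one.
    let active = matches!(r.status, Running | CancelRequested);
    if active != r.ended_at.is_none() {
        return Err("turn_runs_terminal_time_check");
    }
    if matches!(r.status, CancelRequested | Cancelled) && r.cancel_requested_at.is_none() {
        return Err("turn_runs_cancel_check");
    }
    if matches!(r.status, Failed | Interrupted) && r.failure_reason.is_none() {
        return Err("turn_runs_failure_check");
    }
    Ok(())
}
```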

Indexes:

CREATE INDEX turn_runs_world_enqueued_idx
    ON turn_runs(world_slug, enqueued_at DESC);

CREATE INDEX turn_runs_world_status_idx
    ON turn_runs(world_slug, status);

CREATE UNIQUE INDEX turn_runs_one_active_per_world_idx
    ON turn_runs(world_slug)
    WHERE status IN ('running', 'cancel_requested');

Alter attempts:

ALTER TABLE attempts
    ADD COLUMN turn_run_id UUID,
    ADD COLUMN turn_run_seq INT;

ALTER TABLE attempts
    ADD CONSTRAINT attempts_turn_run_pair_check
    CHECK (
        (turn_run_id IS NULL AND turn_run_seq IS NULL)
        OR
        (turn_run_id IS NOT NULL AND turn_run_seq IS NOT NULL AND turn_run_seq >= 1)
    );

ALTER TABLE attempts
    ADD CONSTRAINT attempts_turn_run_fk
    FOREIGN KEY (world_slug, turn_run_id)
    REFERENCES turn_runs(world_slug, turn_run_id);

CREATE UNIQUE INDEX attempts_turn_run_seq_idx
    ON attempts(turn_run_id, turn_run_seq)
    WHERE turn_run_id IS NOT NULL;

CREATE INDEX attempts_turn_run_idx
    ON attempts(turn_run_id, turn_run_seq);

Add deferred FKs after both tables exist:

ALTER TABLE turn_runs
    ADD CONSTRAINT turn_runs_active_attempt_fk
    FOREIGN KEY (world_slug, active_attempt_id)
    REFERENCES attempts(world_slug, attempt_id)
    DEFERRABLE INITIALLY DEFERRED;

ALTER TABLE turn_runs
    ADD CONSTRAINT turn_runs_last_attempt_fk
    FOREIGN KEY (world_slug, last_attempt_id)
    REFERENCES attempts(world_slug, attempt_id)
    DEFERRABLE INITIALLY DEFERRED;

ALTER TABLE worlds
    ADD CONSTRAINT worlds_active_turn_run_fk
    FOREIGN KEY (slug, active_turn_run_id)
    REFERENCES turn_runs(world_slug, turn_run_id)
    DEFERRABLE INITIALLY DEFERRED;

Store-layer changes

Add these public DTOs to src/world_store/mod.rs:

pub struct TurnRunId(pub Uuid);

pub enum TurnRunStatus {
    Running,
    CancelRequested,
    Completed,
    Failed,
    Cancelled,
    Interrupted,
}

pub struct StartTurnRunInput {
    pub world_slug: Slug,
    pub worker_id: String,
    pub requested_turn_count: u32,
    pub max_attempts: u32,
    pub turn_count_source: String,
    pub max_attempts_source: String,
}

pub struct TurnRunRecord {
    pub turn_run_id: TurnRunId,
    pub world_slug: Slug,
    pub status: TurnRunStatus,
    pub requested_turn_count: u32,
    pub max_attempts: u32,
    pub start_turn: u64,
    pub target_turn: u64,
    pub committed_turn_count: u32,
    pub attempt_count: u32,
    pub failed_attempt_count: u32,
    pub interrupted_attempt_count: u32,
    pub active_attempt_id: Option<AttemptId>,
    pub last_attempt_id: Option<AttemptId>,
    pub progress: Option<String>,
    pub cancel_requested_at: Option<DateTime<Utc>>,
    pub cancel_reason: Option<String>,
    pub failure_reason: Option<String>,
    pub enqueued_at: DateTime<Utc>,
    pub started_at: DateTime<Utc>,
    pub ended_at: Option<DateTime<Utc>>,
}
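The status enum needs a stable mapping to the `turn_run_status` labels from the migration, and handler code already calls `status.is_terminal()`. Here is a sketch of those helpers; `ALL`, `as_str`, and `parse` are assumed names, not confirmed parts of the store API.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TurnRunStatus {
    Running,
    CancelRequested,
    Completed,
    Failed,
    Cancelled,
    Interrupted,
}

impl TurnRunStatus {
    pub const ALL: [TurnRunStatus; 6] = [
        TurnRunStatus::Running,
        TurnRunStatus::CancelRequested,
        TurnRunStatus::Completed,
        TurnRunStatus::Failed,
        TurnRunStatus::Cancelled,
        TurnRunStatus::Interrupted,
    ];

    /// Label used by the Postgres turn_run_status enum.
    pub fn as_str(self) -> &'static str {
        match self {
            TurnRunStatus::Running => "running",
            TurnRunStatus::CancelRequested => "cancel_requested",
            TurnRunStatus::Completed => "completed",
            TurnRunStatus::Failed => "failed",
            TurnRunStatus::Cancelled => "cancelled",
            TurnRunStatus::Interrupted => "interrupted",
        }
    }

    /// Inverse of as_str; None for labels the migration does not define.
    pub fn parse(label: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|s| s.as_str() == label)
    }

    /// Terminal statuses are exactly the ones the migration requires ended_at for.
    pub fn is_terminal(self) -> bool {
        matches!(
            self,
            TurnRunStatus::Completed | TurnRunStatus::Failed | TurnRunStatus::Cancelled | TurnRunStatus::Interrupted
        )
    }
}
```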

Add these methods to WorldStore:

async fn start_turn_run(
    &self,
    input: StartTurnRunInput,
) -> Result<TurnRunRecord, WorldStoreError>;

async fn start_attempt_for_turn_run(
    &self,
    turn_run_id: TurnRunId,
    worker_id: &str,
) -> Result<ClaimedAttempt, WorldStoreError>;

async fn finish_turn_run_attempt(
    &self,
    turn_run_id: TurnRunId,
    attempt_id: AttemptId,
) -> Result<TurnRunRecord, WorldStoreError>;

async fn get_turn_run_status(
    &self,
    turn_run_id: TurnRunId,
) -> Result<TurnRunRecord, WorldStoreError>;

async fn request_turn_run_cancel(
    &self,
    turn_run_id: TurnRunId,
    reason: String,
) -> Result<TurnRunRecord, WorldStoreError>;

async fn finalize_cancelled_turn_run_if_idle(
    &self,
    turn_run_id: TurnRunId,
) -> Result<TurnRunRecord, WorldStoreError>;

The exact method names above must be used.

`start_turn_run`

This method must:

Lock the world row.
Reject deleted or unknown worlds using existing errors.
Reject if active_attempt_id is not null.
Reject if active_turn_run_id is not null.
Insert a turn_runs row with status running.
Set worlds.active_turn_run_id.
Return the TurnRunRecord.

Existing `start_attempt`

Update existing start_attempt so it rejects when worlds.active_turn_run_id is not null.

The error must be WorldStoreError::Busy.

When no active attempt exists, the attempt id string carried by the Busy error should be the active_turn_run_id string.
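The lease rules for start_turn_run, together with the Busy id selection for start_attempt, can be sketched against an in-memory world row. This sketch makes several assumptions. Ids are plain strings. Row locking is implied by exclusive `&mut` access, as a single mutex would provide in the memory store. start_turn_run reuses a Busy-style error when either lease is held. `busy_lease_id` and `NewTurnRun` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum StoreError {
    UnknownWorld,
    Busy(String),
}

/// Hypothetical in-memory world row.
struct World {
    current_turn: u64,
    deleted: bool,
    active_attempt_id: Option<String>,
    active_turn_run_id: Option<String>,
}

#[derive(Debug)]
struct NewTurnRun {
    start_turn: u64,
    target_turn: u64,
}

/// Id to report in a Busy error: the active attempt if any, else the turn run.
fn busy_lease_id(world: &World) -> Option<&str> {
    world.active_attempt_id.as_deref().or(world.active_turn_run_id.as_deref())
}

/// Claims the world-level turn-run lease, following the start_turn_run steps.
fn start_turn_run(world: Option<&mut World>, turn_run_id: &str, requested_turn_count: u32) -> Result<NewTurnRun, StoreError> {
    let world = match world {
        Some(w) if !w.deleted => w,
        _ => return Err(StoreError::UnknownWorld),
    };
    // Reject if either an attempt or another turn run holds the world.
    if let Some(id) = busy_lease_id(world) {
        return Err(StoreError::Busy(id.to_string()));
    }
    world.active_turn_run_id = Some(turn_run_id.to_string());
    Ok(NewTurnRun {
        start_turn: world.current_turn,
        target_turn: world.current_turn + u64::from(requested_turn_count),
    })
}
```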

`start_attempt_for_turn_run`

This method must:

Lock the turn run row.
Lock the world row.
Verify the turn run status is running.
Verify worlds.active_turn_run_id == turn_run_id.
Verify worlds.active_attempt_id IS NULL.
Verify attempt_count < max_attempts.
Load the current turn state and scenario exactly like start_attempt.
Insert a normal attempts row.
Populate attempts.turn_run_id.
Populate attempts.turn_run_seq = turn_runs.attempt_count + 1.
Set worlds.active_attempt_id.
Set turn_runs.active_attempt_id.
Set turn_runs.last_attempt_id.
Increment turn_runs.attempt_count.
Return ClaimedAttempt.
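The claim steps reduce to a guarded state transition. This minimal in-memory sketch takes the new attempt id as a parameter and omits scenario loading. The types are hypothetical simplifications of the store rows. Clearing `worlds.active_attempt_id` afterward is left to commit_turn and fail_attempt, as in the existing lifecycle.

```rust
#[derive(Debug, PartialEq)]
enum ClaimError {
    NotRunning,
    LeaseMismatch,
    AttemptActive,
    MaxAttemptsReached,
}

/// Hypothetical in-memory rows with integer ids.
struct TurnRun {
    turn_run_id: u32,
    running: bool,
    attempt_count: u32,
    max_attempts: u32,
    active_attempt_id: Option<u32>,
    last_attempt_id: Option<u32>,
}

struct World {
    current_turn: u64,
    active_turn_run_id: Option<u32>,
    active_attempt_id: Option<u32>,
}

#[derive(Debug, PartialEq)]
struct Claimed {
    attempt_id: u32,
    turn_run_seq: u32,
    turn_before: u64,
    attempted_turn: u64,
}

fn start_attempt_for_turn_run(run: &mut TurnRun, world: &mut World, attempt_id: u32) -> Result<Claimed, ClaimError> {
    // Verification steps, in the order the ticket lists them.
    if !run.running {
        return Err(ClaimError::NotRunning);
    }
    if world.active_turn_run_id != Some(run.turn_run_id) {
        return Err(ClaimError::LeaseMismatch);
    }
    if world.active_attempt_id.is_some() {
        return Err(ClaimError::AttemptActive);
    }
    if run.attempt_count >= run.max_attempts {
        return Err(ClaimError::MaxAttemptsReached);
    }
    // One attempt for exactly one turn: attempted_turn = turn_before + 1.
    let turn_run_seq = run.attempt_count + 1;
    world.active_attempt_id = Some(attempt_id);
    run.active_attempt_id = Some(attempt_id);
    run.last_attempt_id = Some(attempt_id);
    run.attempt_count = turn_run_seq;
    Ok(Claimed { attempt_id, turn_run_seq, turn_before: world.current_turn, attempted_turn: world.current_turn + 1 })
}
```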

`finish_turn_run_attempt`

This method must:

Lock the turn run row.
Lock the attempt row.
Verify the attempt belongs to the turn run.
Verify the attempt is terminal: committed, failed, or interrupted.
Clear turn_runs.active_attempt_id if it equals the attempt id.
Increment counters:
- committed_turn_count += 1 if attempt status is committed
- failed_attempt_count += 1 if attempt status is failed
- interrupted_attempt_count += 1 if attempt status is interrupted
If committed_turn_count == requested_turn_count, set status completed, set ended_at, clear worlds.active_turn_run_id.
Else if status is cancel_requested, set status cancelled, set ended_at, clear worlds.active_turn_run_id.
Else if attempt_count >= max_attempts, set status failed, set ended_at, set failure reason exactly:
```
max_attempts exhausted before requested turn_count committed
```
and clear worlds.active_turn_run_id.
Else leave status running.
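The transition can be written as a pure function over the turn run row. In this sketch, the first check follows the idempotency rule adopted later in the cleanup: counters move only while the finishing attempt is still the active attempt. `ended` stands in for ended_at, and the boolean result says whether to clear `worlds.active_turn_run_id`. All names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AttemptOutcome { Committed, Failed, Interrupted }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RunStatus { Running, CancelRequested, Completed, Failed, Cancelled }

const MAX_ATTEMPTS_EXHAUSTED: &str = "max_attempts exhausted before requested turn_count committed";

/// Hypothetical in-memory turn run row.
#[derive(Debug)]
struct TurnRun {
    status: RunStatus,
    requested_turn_count: u32,
    max_attempts: u32,
    committed_turn_count: u32,
    attempt_count: u32,
    failed_attempt_count: u32,
    interrupted_attempt_count: u32,
    active_attempt_id: Option<u32>,
    failure_reason: Option<String>,
    ended: bool,
}

/// Applies one finished attempt. Returns true when the world lease should be cleared.
fn finish_turn_run_attempt(run: &mut TurnRun, attempt_id: u32, outcome: AttemptOutcome) -> bool {
    // Idempotency: only the still-active attempt may move counters.
    if run.active_attempt_id != Some(attempt_id) {
        return false;
    }
    run.active_attempt_id = None;
    match outcome {
        AttemptOutcome::Committed => run.committed_turn_count += 1,
        AttemptOutcome::Failed => run.failed_attempt_count += 1,
        AttemptOutcome::Interrupted => run.interrupted_attempt_count += 1,
    }
    let next = if run.committed_turn_count == run.requested_turn_count {
        RunStatus::Completed
    } else if run.status == RunStatus::CancelRequested {
        RunStatus::Cancelled
    } else if run.attempt_count >= run.max_attempts {
        run.failure_reason = Some(MAX_ATTEMPTS_EXHAUSTED.to_string());
        RunStatus::Failed
    } else {
        return false; // still running: keep the lease
    };
    run.status = next;
    run.ended = true;
    true
}
```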

`request_turn_run_cancel`

This method must:

Lock the turn run row.
If status is terminal, return the row without mutation.
Set cancel_requested_at and cancel_reason.
If active_attempt_id is null:
- set status cancelled
- set ended_at
- clear worlds.active_turn_run_id
If active_attempt_id is not null:
- set status cancel_requested
- do not mutate the active attempt
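Here is a sketch of the cancel transition. The ticket does not say what a second request should do while the run is already cancel_requested. This version assumes it keeps the first cancel_requested_at and cancel_reason, which is one way to satisfy request_turn_run_cancel_is_idempotent. The types are hypothetical, and `now` stands in for a timestamp.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RunStatus { Running, CancelRequested, Completed, Failed, Cancelled, Interrupted }

const DEFAULT_CANCEL_REASON: &str = "cancellation requested by caller";

/// Hypothetical in-memory turn run row.
struct TurnRun {
    status: RunStatus,
    active_attempt_id: Option<u32>,
    cancel_requested_at: Option<u64>,
    cancel_reason: Option<String>,
    ended_at: Option<u64>,
}

/// Returns true when worlds.active_turn_run_id should be cleared.
fn request_turn_run_cancel(run: &mut TurnRun, reason: Option<&str>, now: u64) -> bool {
    match run.status {
        // Terminal runs are returned without mutation.
        RunStatus::Completed | RunStatus::Failed | RunStatus::Cancelled | RunStatus::Interrupted => return false,
        // Assumption: a repeated request keeps the first timestamp and reason.
        RunStatus::CancelRequested => return false,
        RunStatus::Running => {}
    }
    run.cancel_requested_at = Some(now);
    run.cancel_reason = Some(reason.unwrap_or(DEFAULT_CANCEL_REASON).to_string());
    if run.active_attempt_id.is_none() {
        run.status = RunStatus::Cancelled;
        run.ended_at = Some(now);
        true
    } else {
        // Let the active attempt finish through the normal lifecycle.
        run.status = RunStatus::CancelRequested;
        false
    }
}
```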

`reconcile_running_attempts`

Update startup reconciliation so it also handles turn runs.

After existing running attempts are marked interrupted, mark every turn_runs row with status running or cancel_requested as interrupted.

Set failure reason exactly:

process restart before turn run completed

Set ended_at.

Clear worlds.active_turn_run_id for affected worlds.

Also clear turn_runs.active_attempt_id.
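Here is the turn-run part of reconciliation, sketched over in-memory rows. It assumes the attempt-level pass has already run, and it clears a world's lease only when that lease still points at the interrupted run. `reconcile_turn_runs` and the row types are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RunStatus { Running, CancelRequested, Completed, Interrupted }

const RESTART_REASON: &str = "process restart before turn run completed";

/// Hypothetical in-memory rows; timestamps are reduced to u64.
struct TurnRun {
    turn_run_id: u32,
    world_slug: &'static str,
    status: RunStatus,
    active_attempt_id: Option<u32>,
    failure_reason: Option<String>,
    ended_at: Option<u64>,
}

struct World {
    slug: &'static str,
    active_turn_run_id: Option<u32>,
}

/// Marks every non-terminal turn run interrupted and releases its world lease.
/// Returns how many turn runs were interrupted.
fn reconcile_turn_runs(runs: &mut [TurnRun], worlds: &mut [World], now: u64) -> usize {
    let mut interrupted = 0;
    for run in runs
        .iter_mut()
        .filter(|r| matches!(r.status, RunStatus::Running | RunStatus::CancelRequested))
    {
        run.status = RunStatus::Interrupted;
        run.failure_reason = Some(RESTART_REASON.to_string());
        run.ended_at = Some(now);
        run.active_attempt_id = None;
        for world in worlds.iter_mut().filter(|w| w.slug == run.world_slug) {
            if world.active_turn_run_id == Some(run.turn_run_id) {
                world.active_turn_run_id = None;
            }
        }
        interrupted += 1;
    }
    interrupted
}
```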

Runtime changes

Add a background coordinator in src/kernel.rs or a new module src/turn_runs.rs.

The coordinator must:

Load the turn run record.
Stop if status is terminal.
Stop if cancellation is requested.
Stop if committed count reached target.
Stop if max attempts reached.
Claim one attempt through start_attempt_for_turn_run.
Run that attempt through Runtime::run_claimed.
Regardless of success or failure, call finish_turn_run_attempt.
Loop.

The coordinator must not call Runtime::run_turn, because Runtime::run_turn uses start_attempt, and start_attempt must reject while a turn run holds the world.

The coordinator must call Runtime::run_claimed with the ClaimedAttempt returned by start_attempt_for_turn_run.

When Runtime::run_claimed returns an error caused by a normal failed attempt, the coordinator must still call finish_turn_run_attempt, then keep looping if the turn run is still eligible for another attempt.

When the coordinator itself encounters a store error that prevents progress, mark the turn run failed, set failure_reason to the error string, set ended_at, and clear worlds.active_turn_run_id.
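The coordinator loop can be sketched synchronously against a hypothetical `TurnRunBackend` trait that stands in for the async WorldStore methods and Runtime::run_claimed. The real coordinator would be a spawned async task. Two choices here are assumptions not spelled out above. First, when the loop observes cancel_requested, it calls finalize_cancelled_turn_run_if_idle. Second, store errors are routed to a `fail_turn_run` call; a method of that name appears in the cleanup notes, but the signature here is invented. The `Mock` backend scripts attempt outcomes so the loop can be exercised.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status { Running, CancelRequested, Completed, Failed, Cancelled }

#[derive(Clone, Debug)]
struct Record { status: Status, committed: u32, requested: u32, attempts: u32, max_attempts: u32 }

/// Hypothetical synchronous stand-in for the store and runtime calls the coordinator uses.
trait TurnRunBackend {
    fn get_turn_run_status(&mut self) -> Result<Record, String>;
    fn start_attempt_for_turn_run(&mut self) -> Result<u32, String>;
    /// Stands in for Runtime::run_claimed; Err means the attempt failed normally.
    fn run_claimed(&mut self, attempt_id: u32) -> Result<(), String>;
    fn finish_turn_run_attempt(&mut self, attempt_id: u32) -> Result<Record, String>;
    fn finalize_cancelled_turn_run_if_idle(&mut self) -> Result<Record, String>;
    fn fail_turn_run(&mut self, reason: String);
}

/// Background-task entry point: a store error that stops progress fails the run.
fn coordinate<B: TurnRunBackend>(backend: &mut B) {
    if let Err(err) = drive(backend) {
        backend.fail_turn_run(err);
    }
}

fn drive<B: TurnRunBackend>(backend: &mut B) -> Result<(), String> {
    loop {
        let rec = backend.get_turn_run_status()?;
        match rec.status {
            Status::Completed | Status::Failed | Status::Cancelled => return Ok(()),
            Status::CancelRequested => {
                backend.finalize_cancelled_turn_run_if_idle()?;
                return Ok(());
            }
            Status::Running => {}
        }
        if rec.committed >= rec.requested || rec.attempts >= rec.max_attempts {
            return Ok(());
        }
        let attempt_id = backend.start_attempt_for_turn_run()?;
        // A failed attempt is a normal outcome, already recorded by the attempt lifecycle.
        let _ = backend.run_claimed(attempt_id);
        backend.finish_turn_run_attempt(attempt_id)?;
    }
}

/// Scripted mock: outcomes[i] says whether attempt i + 1 commits.
struct Mock { rec: Record, outcomes: Vec<bool>, next_id: u32, failure: Option<String> }

impl TurnRunBackend for Mock {
    fn get_turn_run_status(&mut self) -> Result<Record, String> {
        Ok(self.rec.clone())
    }
    fn start_attempt_for_turn_run(&mut self) -> Result<u32, String> {
        self.rec.attempts += 1;
        self.next_id += 1;
        Ok(self.next_id)
    }
    fn run_claimed(&mut self, attempt_id: u32) -> Result<(), String> {
        let commits = self.outcomes.get(attempt_id as usize - 1).copied().unwrap_or(false);
        if commits {
            self.rec.committed += 1; // stands in for commit_turn
            Ok(())
        } else {
            Err("attempt failed".to_string())
        }
    }
    fn finish_turn_run_attempt(&mut self, _attempt_id: u32) -> Result<Record, String> {
        let r = &mut self.rec;
        if r.committed == r.requested {
            r.status = Status::Completed;
        } else if r.status == Status::CancelRequested {
            r.status = Status::Cancelled;
        } else if r.attempts >= r.max_attempts {
            r.status = Status::Failed;
        }
        Ok(r.clone())
    }
    fn finalize_cancelled_turn_run_if_idle(&mut self) -> Result<Record, String> {
        self.rec.status = Status::Cancelled;
        Ok(self.rec.clone())
    }
    fn fail_turn_run(&mut self, reason: String) {
        self.rec.status = Status::Failed;
        self.failure = Some(reason);
    }
}
```

With outcomes commit, fail, commit and requested = 2, the run completes after three attempts. The failure in the middle does not fail the run.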

MCP implementation changes

Tool registry

Add these tools to CONSUMER_TOOLS and ALL_TOOLS:

get_turn_run_status
cancel_turn_run

Keep the tool partition test passing.

Tool schemas

Update consumer_tool_contract:

run_turn schema includes turn_count and max_attempts.
get_turn_run_status schema added.
cancel_turn_run schema added.
list_attempts schema accepts optional turn_run_id.

Update validate_tool_args / unknown-key rejection accordingly.

`handle_run_turn`

Implement this exact branch structure:

Parse world_slug.
Parse turn_count, default 1.
Parse max_attempts, default turn_count.
Validate:
- turn_count >= 1
- turn_count <= 100000
- max_attempts >= 1
- max_attempts <= 1000000
- max_attempts >= turn_count
If turn_count == 1 and max_attempts == 1, run the existing single-attempt path.
Otherwise:
- call world_store.start_turn_run
- spawn the turn-run coordinator task
- return the multi-turn response shape described above
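The branch structure can be sketched as a pure planning step that runs before any store call. `plan_run_turn`, `RunTurnPlan`, and `RunMode` are hypothetical names. Raw values are taken as `i64` so that out-of-range integers can be reported rather than truncated.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Source { Default, Explicit }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RunMode { SingleAttempt, TurnRun }

#[derive(Debug, PartialEq, Eq)]
struct RunTurnPlan {
    mode: RunMode,
    turn_count: u32,
    turn_count_source: Source,
    max_attempts: u32,
    max_attempts_source: Source,
}

fn plan_run_turn(turn_count: Option<i64>, max_attempts: Option<i64>) -> Result<RunTurnPlan, String> {
    // Parse with defaults: turn_count = 1, max_attempts = turn_count.
    let (tc, tc_src) = match turn_count {
        Some(n) => (n, Source::Explicit),
        None => (1, Source::Default),
    };
    let (ma, ma_src) = match max_attempts {
        Some(n) => (n, Source::Explicit),
        None => (tc, Source::Default),
    };
    if !(1..=100_000).contains(&tc) {
        return Err(format!("turn_count must be between 1 and 100000, got {}", tc));
    }
    if !(1..=1_000_000).contains(&ma) {
        return Err(format!("max_attempts must be between 1 and 1000000, got {}", ma));
    }
    if ma < tc {
        return Err(format!("max_attempts ({}) must be >= turn_count ({})", ma, tc));
    }
    // Only turn_count == 1 && max_attempts == 1 takes the single-attempt path.
    let mode = if tc == 1 && ma == 1 { RunMode::SingleAttempt } else { RunMode::TurnRun };
    Ok(RunTurnPlan { mode, turn_count: tc as u32, turn_count_source: tc_src, max_attempts: ma as u32, max_attempts_source: ma_src })
}
```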

`handle_get_turn_run_status`

Return the status shape described above.

When active_attempt_id is present, include:

{
  "poll_active_attempt_with": {
    "tool": "get_turn_status",
    "args": {
      "world_slug": "...",
      "attempt_id": "..."
    }
  }
}

When active_attempt_id is null, poll_active_attempt_with must be null.

`handle_cancel_turn_run`

Call world_store.request_turn_run_cancel.

Return the same status shape as get_turn_run_status.

`store_attempt_to_json`

Include:

{
  "turn_run_id": null,
  "turn_run_seq": null
}

Populate when present.

Tests

Add and update tests in src/mcp/tests.rs, src/world_store/memory.rs, and src/world_store/postgres.rs.

Required MCP tests

run_turn_without_turn_count_preserves_single_attempt_contract
- Call run_turn with only world_slug.
- Assert response has run_mode: "single_attempt".
- Assert response has attempt_id.
- Assert response has no turn_run_id.
- Assert turn_count_source: "default".
run_turn_with_turn_count_starts_turn_run
- Call run_turn with turn_count: 3.
- Assert response has run_mode: "turn_run".
- Assert response has turn_run_id.
- Assert response has no attempt_id.
- Poll get_turn_run_status until terminal.
- Assert final status completed.
- Assert world advanced by 3 committed turns.
run_turn_with_explicit_one_uses_single_attempt_path
- Call run_turn with turn_count: 1.
- Assert response has run_mode: "single_attempt".
- Assert turn_count_source: "explicit".
run_turn_rejects_invalid_turn_count
- turn_count: 0 rejected.
- turn_count: 100001 rejected.
- max_attempts < turn_count rejected.
- unknown keys rejected.
get_turn_run_status_reports_active_and_recent_attempts
- Start a multi-turn run.
- Poll status.
- Assert counters, active/last attempt fields, and list_attempts_with.
cancel_turn_run_stops_before_next_attempt
- Start a long run.
- Call cancel_turn_run.
- Assert status becomes cancel_requested or cancelled.
- Assert no additional attempts are started after cancellation is observed.
- Assert final status cancelled.
list_attempts_filters_by_turn_run_id
- Start a turn run.
- Assert attempts returned all have that turn_run_id.

Required store tests

start_turn_run_claims_world_run_lease
- World gets active_turn_run_id.
- Single start_attempt is rejected while turn run is active.
start_attempt_for_turn_run_creates_attempt_with_sequence
- Attempts have turn_run_id.
- Attempts have sequential turn_run_seq.
finish_turn_run_attempt_completes_on_requested_commits
- After requested committed attempts, status is completed.
- worlds.active_turn_run_id cleared.
finish_turn_run_attempt_fails_on_max_attempts
- Failed attempts increment failed counter.
- Status becomes failed when max attempts exhausted.
- Failure reason matches required text.
request_turn_run_cancel_is_idempotent
- Repeated cancel calls do not corrupt counters.
reconcile_running_attempts_interrupts_turn_runs
- A running turn run is marked interrupted on reconcile.
- worlds.active_turn_run_id is cleared.

Required manifest tests

Update consumer_tool_manifest_schemas_match_handler_contracts.

Update consumer_manifest_examples_execute so the examples cover:

default single-turn run_turn
explicit multi-turn run_turn
get_turn_run_status
cancel_turn_run

Acceptance criteria

This ticket is complete when all of the following are true:

run_turn({"world_slug": "x"}) still starts exactly one attempt and returns an attempt_id.
run_turn({"world_slug": "x", "turn_count": 40}) starts one durable turn run and returns a turn_run_id.
Multi-turn execution creates one normal attempt per attempted turn.
Individual attempts remain visible through get_turn_status and list_attempts.
get_turn_run_status reports committed count, attempt count, failed count, interrupted count, active attempt, last attempt, and terminal status.
cancel_turn_run prevents the turn run from starting another attempt after the active attempt finishes.
Startup reconciliation cannot leave a world stuck with an active turn run.
Existing one-turn callers remain compatible.
The tool response includes deterministic hint fields stating whether turn_count and max_attempts were supplied or defaulted.
All Postgres, memory-store, MCP, and manifest tests pass.

Revised cleanup is implemented, committed, pushed, deployed, and live-smoked.

Compliance matrix by review item:

cancel_turn_run validates world_slug before mutation: fixed in src/mcp.rs; the handler now rejects a mismatched turn_run_id/world_slug pair immediately after the initial status read, before parsing or storing the cancellation reason.
get_turn_status validates world_slug: fixed in src/mcp.rs; wrong-world attempt lookups now return UNKNOWN_ATTEMPT before payload serialization.
World-scoped browse routes validate ownership: fixed in src/server.rs and src/read_models.rs; /w/:slug/turn-run/:id and /w/:slug/attempt/:id now pass expected world_slug through DetailRequest and reject mismatches.
finish_turn_run_attempt is idempotent: fixed in src/world_store/memory.rs and src/world_store/postgres.rs; counters only mutate when the finishing attempt is still the active attempt, and duplicate finish calls for the last attempt return the current record without double-counting.
fail_turn_run no longer orphans a running active attempt: fixed in memory and Postgres stores; the active running attempt is terminalized as interrupted before world leases are cleared, and interrupted_attempt_count is incremented.
Persisted source metadata is surfaced: TurnRunRecord in src/world_store/mod.rs now carries turn_count_source and max_attempts_source; MCP status/cancel responses and browse JSON include those fields plus deterministic hint strings.

Files changed:

src/mcp.rs: ownership validation for cancel/status, public status metadata/hints.
src/world_store/mod.rs: TurnRunRecord source fields.
src/world_store/memory.rs: idempotent finish, failure cleanup, source-field serialization, memory regressions.
src/world_store/postgres.rs: idempotent finish, failure cleanup in transaction, source-field serialization, Postgres regressions.
src/read_models.rs: source fields/hints in browse JSON and world-scoped detail mismatch checks.
src/server.rs: passes world slug into attempt/turn-run detail read models.
src/mcp/tests.rs: MCP wrong-world and source-observability regressions.
tests/phase_g_routes.rs: browse-route wrong-world regressions and turn-run source metadata assertions.

New regression tests added:

mcp::tests::get_turn_run_status_rejects_wrong_world
mcp::tests::cancel_turn_run_rejects_wrong_world_without_mutating
mcp::tests::get_turn_status_rejects_attempt_from_wrong_world
phase_g_routes::world_scoped_turn_run_detail_rejects_wrong_world
phase_g_routes::world_scoped_attempt_detail_rejects_wrong_world
world_store::memory::tests::finish_turn_run_attempt_does_not_double_count_same_attempt
world_store::postgres::tests::finish_turn_run_attempt_does_not_double_count_same_attempt
world_store::memory::tests::turn_run_failure_path_does_not_orphan_running_attempt
world_store::postgres::tests::turn_run_failure_path_does_not_orphan_running_attempt

Verification run:

git diff --check: passed with no output.
cargo test --lib --features test-fixtures: test result: ok. 640 passed; 0 failed.
cargo test --tests --features test-fixtures: passed, exit 0; integration targets passed including phase_g_routes with 18 passed; 0 failed.
cargo test --tests --features test-fixtures,postgres-tests -- --test-threads=1: passed against DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5433/postgres; lib section reported 808 passed; 0 failed, migrations 7 passed; 0 failed, phase_g_routes 18 passed; 0 failed, and final command exit 0.

Deploy and artifact receipts:

Pushed main from 99f1bec to 4e5fc4b on gitlab.
Ran bash k8s/deploy.sh from /srv/chukwa; rollout succeeded.
Current pod: chukwa-84c4d5c88-ks9tw, Ready 1/1, Running, 0 restarts.
https://chukwa.benac.dev/healthz returns ok.
https://chukwa.benac.dev/chukwa-repo.zip was fetched after deploy and contains the new world_slug mismatch rejection code in repo/src/mcp.rs and the new turn_count_source fields in repo/src/world_store/mod.rs.

Live smoke against https://chukwa.benac.dev/mcp:

Scenario used: ice_empty_fish_seed.
run_turn({world_slug}) on smoke_single_0501130102 returned single attempt_id 4722f58d-9229-4072-aaac-8f52b086c22b with run_mode: single_attempt; correct-world get_turn_status later reported status: committed.
Wrong-world get_turn_status with that attempt and world_slug: smoke_cancel_b_0501130102 returned UNKNOWN_ATTEMPT.
run_turn({world_slug: smoke_multi_0501130102, turn_count: 3}) returned turn run 08120251-f21c-4e6b-a34e-f5543a985b5e; final get_turn_run_status(include_attempts=true) reported status: completed, committed_turn_count: 3, attempt_count: 3, and recent attempts newest-first with turn_run_seq 3, 2, 1.
Wrong-world cancel_turn_run against turn run 3366fa04-d2c7-44e8-8b4b-7a5dd1359e42 using world_slug: smoke_cancel_b_0501130102 returned UNKNOWN_TURN_RUN.
Immediately after that wrong-world cancel, correct-world get_turn_run_status for smoke_cancel_a_0501130102 still reported status: running, cancel_requested_at: null, and cancel_reason: null, proving no mutation occurred. That run later completed normally with committed_turn_count: 3, attempt_count: 3.

I reviewed the uploaded repo for 0cac6740-2728-4e29-acec-5047e25f23f4 against the actual implementation at 99f1bec. The feature is close and the broad architecture is right: multi-turn execution uses one normal attempt per attempted turn, Runtime::run_claimed is used rather than Runtime::run_turn, attempts are linked with turn_run_id/turn_run_seq, and the consumer surface exposes run_turn, get_turn_run_status, cancel_turn_run, and list_attempts(turn_run_id).

I am not accepting the ticket yet. Please handle the cleanup in this same ticket, not as follow-ups. The remaining issues are lifecycle/ownership correctness problems, not cosmetic polish.

1. Blocking: `cancel_turn_run` mutates before validating `world_slug` ownership

In src/mcp.rs, turn_run_status_to_json has the ownership check:

if record.world_slug.as_str() != slug.as_str() {
    return Err(McpError::new("UNKNOWN_TURN_RUN", ...));
}

That check is at src/mcp.rs:3851-3859.

But handle_cancel_turn_run does this:

let before = env.world_store.get_turn_run_status(turn_run_id).await?;
let was_terminal = before.status.is_terminal();

...

let record = if was_terminal {
    before
} else {
    env.world_store
        .request_turn_run_cancel(turn_run_id, reason)
        .await?
};

let mut payload = turn_run_status_to_json(env, &slug, record, false, 20).await?;

That is at src/mcp.rs:3976-4007.

So a caller can provide:

{
  "world_slug": "wrong_world",
  "turn_run_id": "<valid turn run from another world>"
}

and the handler can cancel the turn run before discovering the world mismatch.

This is the most important bug to fix. The public input includes world_slug; mutation must not happen until the handler proves the turn_run_id belongs to that world.

Please fix by validating immediately after the first get_turn_run_status read and before reason parsing or request_turn_run_cancel:

```rust
let before = env.world_store.get_turn_run_status(turn_run_id).await?;

if before.world_slug.as_str() != slug.as_str() {
    return Err(McpError::new(
        "UNKNOWN_TURN_RUN",
        format!(
            "turn_run_id {} does not belong to world_slug {:?}",
            turn_run_id.as_uuid(),
            slug.as_str()
        ),
    ));
}
```
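The same comparison is needed in cancel_turn_run, get_turn_status, and the world-scoped detail routes, so it may be worth centralizing in one helper that every world-scoped handler calls right after its first read. A minimal sketch; `WorldOwned`, `ensure_world_owns`, and the simplified `McpError` and `TurnRunRecord` below are hypothetical stand-ins, not the repo's actual types:

```rust
/// Simplified, hypothetical stand-in for the real McpError.
#[derive(Debug, PartialEq)]
pub struct McpError {
    pub code: &'static str,
    pub message: String,
}

/// Anything that records which world it belongs to (turn runs, attempts).
pub trait WorldOwned {
    fn world_slug(&self) -> &str;
    fn id_string(&self) -> String;
}

/// Rejects a record owned by a different world than the caller named.
/// Intended to run after the first read and before any mutation.
pub fn ensure_world_owns<T: WorldOwned>(
    record: &T,
    requested_slug: &str,
    unknown_code: &'static str,
) -> Result<(), McpError> {
    if record.world_slug() == requested_slug {
        Ok(())
    } else {
        Err(McpError {
            code: unknown_code,
            message: format!(
                "{} does not belong to world_slug {:?}",
                record.id_string(),
                requested_slug
            ),
        })
    }
}

/// Hypothetical, trimmed-down record used only to exercise the helper.
pub struct TurnRunRecord {
    pub turn_run_id: String,
    pub world_slug: String,
}

impl WorldOwned for TurnRunRecord {
    fn world_slug(&self) -> &str {
        &self.world_slug
    }
    fn id_string(&self) -> String {
        format!("turn_run_id {}", self.turn_run_id)
    }
}
```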

Required regression test:

cancel_turn_run_rejects_wrong_world_without_mutating

Test shape:

- Seed/create two worlds, world_a and world_b.
- Start a turn run on world_a.
- Call cancel_turn_run with world_slug = world_b and turn_run_id from world_a.
- Assert the call errors with UNKNOWN_TURN_RUN or equivalent.
- Fetch the turn run on world_a.
- Assert it is not cancel_requested and not cancelled.
- Finish/cancel normally to prove the run was not damaged.

This should be covered at the MCP layer, not only store-level.
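The test shape above, sketched synchronously against a hypothetical in-memory table. `MemoryTurnRuns`, this synchronous `cancel_turn_run`, and the string error codes are simplifications; the real test would be async and drive the MCP handler against MemoryWorldStore:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub enum TurnRunStatus {
    Running,
    CancelRequested,
    Cancelled,
}

#[derive(Clone, Debug)]
pub struct TurnRun {
    pub world_slug: String,
    pub status: TurnRunStatus,
}

/// Hypothetical synchronous stand-in for the store's turn-run table.
#[derive(Default)]
pub struct MemoryTurnRuns {
    runs: HashMap<u64, TurnRun>,
}

impl MemoryTurnRuns {
    pub fn start(&mut self, id: u64, world_slug: &str) {
        let run = TurnRun { world_slug: world_slug.to_string(), status: TurnRunStatus::Running };
        self.runs.insert(id, run);
    }

    pub fn get(&self, id: u64) -> Option<TurnRun> {
        self.runs.get(&id).cloned()
    }

    fn request_cancel(&mut self, id: u64) {
        if let Some(run) = self.runs.get_mut(&id) {
            if run.status == TurnRunStatus::Running {
                run.status = TurnRunStatus::CancelRequested;
            }
        }
    }
}

/// Handler ordering: read, prove ownership, and only then mutate.
pub fn cancel_turn_run(
    store: &mut MemoryTurnRuns,
    world_slug: &str,
    id: u64,
) -> Result<TurnRun, &'static str> {
    let before = store.get(id).ok_or("UNKNOWN_TURN_RUN")?;
    if before.world_slug != world_slug {
        return Err("UNKNOWN_TURN_RUN"); // nothing has been written yet
    }
    store.request_cancel(id);
    store.get(id).ok_or("UNKNOWN_TURN_RUN")
}

#[cfg(test)]
mod turn_run_cancel_tests {
    use super::*;

    #[test]
    fn cancel_turn_run_rejects_wrong_world_without_mutating() {
        let mut store = MemoryTurnRuns::default();
        store.start(1, "world_a");
        store.start(2, "world_b");

        assert_eq!(cancel_turn_run(&mut store, "world_b", 1).unwrap_err(), "UNKNOWN_TURN_RUN");

        let run = store.get(1).unwrap();
        assert_ne!(run.status, TurnRunStatus::CancelRequested);
        assert_ne!(run.status, TurnRunStatus::Cancelled);

        // The run is undamaged: a correctly scoped cancel still works.
        let cancelled = cancel_turn_run(&mut store, "world_a", 1).unwrap();
        assert_eq!(cancelled.status, TurnRunStatus::CancelRequested);
    }
}
```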

2. Blocking/related: `get_turn_status` also does not validate `world_slug`

handle_get_turn_status currently parses world_slug, fetches the attempt by global UUID, and returns it without checking that the attempt belongs to the supplied world:

```rust
let slug = require_world_slug(args)?;
...
let record = env.world_store.get_attempt_status(attempt_id).await?;
let mut payload = store_attempt_to_json(&record);
```

This is at src/mcp.rs:4024-4083.

That means callers can query an attempt from world_a while passing world_slug = world_b and still receive the attempt. The response does include the actual record.world_slug, but the request contract says world_slug identifies the world the attempt must belong to. This is the same ownership class as the cancel_turn_run bug, just non-mutating.

Please add:

```rust
if record.world_slug.as_str() != slug.as_str() {
    return Err(McpError::unknown_attempt(id_str, &slug));
}
```

or an equivalent UNKNOWN_ATTEMPT / UNKNOWN_TURN_STATUS response before returning the payload.

Required regression test:

get_turn_status_rejects_attempt_from_wrong_world

Test shape:

- Seed/create two worlds.
- Start or inject an attempt for world_a.
- Call get_turn_status with world_slug = world_b, attempt_id = world_a_attempt.
- Assert error.
- Call with the correct world and assert success.

This matters more now because turn-run attempts are explicitly linked and exposed through status/polling helpers.
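Returning the same error for a missing attempt and for one owned by another world also keeps the response from confirming that an attempt ID exists elsewhere; the suggested `McpError::unknown_attempt` appears to do exactly that. A minimal sketch, assuming a plain HashMap in place of the store; `AttemptRecord`, `StatusError`, and this synchronous `get_turn_status` are hypothetical simplifications:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct AttemptRecord {
    pub attempt_id: u64,
    pub world_slug: String,
    pub attempted_turn: u32,
}

#[derive(Debug, PartialEq)]
pub enum StatusError {
    UnknownAttempt { attempt_id: u64, world_slug: String },
}

/// World-scoped attempt lookup (hypothetical, synchronous).
pub fn get_turn_status(
    attempts: &HashMap<u64, AttemptRecord>,
    world_slug: &str,
    attempt_id: u64,
) -> Result<AttemptRecord, StatusError> {
    match attempts.get(&attempt_id) {
        // Found and owned by the requested world.
        Some(record) if record.world_slug == world_slug => Ok(record.clone()),
        // Missing, or owned by another world: the caller sees the same error.
        _ => Err(StatusError::UnknownAttempt {
            attempt_id,
            world_slug: world_slug.to_string(),
        }),
    }
}
```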

3. Blocking for browseability: world-scoped turn-run detail route ignores the world slug

The resolution says the human comment was addressed by making turn_runs browseable and adding world-scoped routes:

/w/:slug/turn-runs
/w/:slug/turn-run/:turn_run_id

But the world-scoped detail route does not validate that the turn run belongs to :slug.

In src/server.rs:

```rust
async fn turn_run_detail_world(
    Path((slug, turn_run_id)): Path<(String, String)>,
    ...
) -> Response {
    ...
    turn_run_detail_render(state, q, turn_run_id, &instance, Some(&slug)).await
}
```

This is src/server.rs:2472-2482.

turn_run_detail_render builds:

```rust
let req = DetailRequest::new(ResourceKind::TurnRun).with("turn_run_id", &turn_run_id);
let result = read_models::load_detail(&env, req).await;
```

This is src/server.rs:2404-2413.

Then load_turn_run_detail loads only by turn_run_id:

```rust
let record = env.world_store.get_turn_run_status(turn_run_id).await?;
let attempts = env.world_store.list_attempts_for_turn_run(&record.world_slug, turn_run_id).await?;
```

This is src/read_models.rs:1699-1724.

So /w/wrong_world/turn-run/<valid-id> can render a turn run from another world while putting the wrong world slug in page context.

Please fix the world-scoped route to validate ownership. Options:

- Add expected world_slug to the DetailRequest and make load_turn_run_detail reject mismatches.
- Or validate in turn_run_detail_render after loading payload/record.
- Or add a world-scoped store method.

Required regression test:

world_scoped_turn_run_detail_rejects_wrong_world

Also check the same pattern for attempt detail:

/w/:slug/attempt/:attempt_id

attempt_detail_world similarly passes the slug only as render context (src/server.rs:2391-2401), while load_attempt_detail loads by attempt id only (src/read_models.rs:1800-1822). That preexisting pattern should be corrected while we are fixing ownership validation for turn-run browseability.
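The first option above, applied to both turn-run and attempt detail, might look roughly like this. `DetailRequest` here is a simplified, hypothetical stand-in (the real type's fields are not shown in this review), and `in_world`, `check_world`, `DetailError`, and `ResourceKind::Attempt` are invented names:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ResourceKind {
    TurnRun,
    Attempt,
}

/// Simplified, hypothetical DetailRequest that can carry the world named by the route.
#[derive(Debug)]
pub struct DetailRequest {
    pub kind: ResourceKind,
    pub params: HashMap<String, String>,
    pub expected_world: Option<String>,
}

impl DetailRequest {
    pub fn new(kind: ResourceKind) -> Self {
        DetailRequest { kind, params: HashMap::new(), expected_world: None }
    }

    pub fn with(mut self, key: &str, value: &str) -> Self {
        self.params.insert(key.to_string(), value.to_string());
        self
    }

    /// Set only by the /w/:slug/... routes; global routes leave it as None.
    pub fn in_world(mut self, slug: &str) -> Self {
        self.expected_world = Some(slug.to_string());
        self
    }
}

#[derive(Debug, PartialEq)]
pub enum DetailError {
    NotFound,
}

/// Post-load check shared by the turn-run and attempt loaders: a world-scoped
/// request for a record from another world is treated as not found.
pub fn check_world(req: &DetailRequest, loaded_world: &str) -> Result<(), DetailError> {
    match &req.expected_world {
        Some(expected) if expected != loaded_world => Err(DetailError::NotFound),
        _ => Ok(()),
    }
}
```

Keeping the check in the loader rather than the renderer means any future caller of load_detail inherits it.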

4. Lifecycle hardening: `finish_turn_run_attempt` is not idempotent and can double-count

Both Postgres and memory implementations increment counters every time finish_turn_run_attempt is called for a terminal attempt.

Postgres:

```rust
match attempt_status {
    AttemptStatus::Committed => run.committed_turn_count += 1,
    AttemptStatus::Failed => run.failed_attempt_count += 1,
    AttemptStatus::Interrupted => run.interrupted_attempt_count += 1,
    AttemptStatus::Running => {}
}
```

src/world_store/postgres.rs:1868-1873.

Memory store has the same shape at src/world_store/memory.rs:1559-1564.

There is no guard that the attempt is still the run’s active attempt, nor an “accounted” marker. A duplicate call can corrupt:

committed_turn_count
failed_attempt_count
interrupted_attempt_count
terminal status

In normal coordinator flow this may not happen, but this is a durable lifecycle subsystem. Store methods should be robust against duplicate coordinator calls, retries, or accidental re-entry.

Please make finish_turn_run_attempt idempotent or reject already-accounted attempts before incrementing counters. One reasonable approach:

- Require turn_runs.active_attempt_id == attempt_id before any counter mutation.
- If active_attempt_id no longer matches (the run is already terminal, or active_attempt_id is null) and last_attempt_id == attempt_id, return the current record or a clear invalid-transition error without changing counters.
- Do not increment counters unless this finish call is the first accounting of that attempt.
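A minimal sketch of that guard, written as a pure function over a hypothetical in-memory slice of a turn_runs row (the real stores hold more columns and also set terminal status). Clearing active_attempt_id is what marks an attempt as accounted, so a repeated call for the last attempt becomes a no-op. In Postgres the comparable guard would presumably be a conditional UPDATE ... WHERE active_attempt_id = $attempt_id in the same transaction, treating zero affected rows as a repeat; that is an assumption about the store's structure, not its current code:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AttemptStatus {
    Running,
    Committed,
    Failed,
    Interrupted,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct TurnRunCounters {
    pub committed_turn_count: u32,
    pub failed_attempt_count: u32,
    pub interrupted_attempt_count: u32,
}

/// Hypothetical in-memory slice of a turn_runs row.
#[derive(Clone, Debug)]
pub struct TurnRun {
    pub active_attempt_id: Option<u64>,
    pub last_attempt_id: Option<u64>,
    pub counters: TurnRunCounters,
}

#[derive(Debug, PartialEq)]
pub enum FinishError {
    AttemptStillRunning,
    NotActiveAttempt,
}

pub fn finish_turn_run_attempt(
    run: &mut TurnRun,
    attempt_id: u64,
    status: AttemptStatus,
) -> Result<(), FinishError> {
    if run.active_attempt_id != Some(attempt_id) {
        // Already accounted: it was the last attempt and the slot has been cleared.
        if run.active_attempt_id.is_none() && run.last_attempt_id == Some(attempt_id) {
            return Ok(());
        }
        return Err(FinishError::NotActiveAttempt);
    }
    match status {
        AttemptStatus::Running => return Err(FinishError::AttemptStillRunning),
        AttemptStatus::Committed => run.counters.committed_turn_count += 1,
        AttemptStatus::Failed => run.counters.failed_attempt_count += 1,
        AttemptStatus::Interrupted => run.counters.interrupted_attempt_count += 1,
    }
    // Clearing the active slot is what marks this attempt as accounted.
    run.active_attempt_id = None;
    run.last_attempt_id = Some(attempt_id);
    Ok(())
}
```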

Required regression tests in both stores:

finish_turn_run_attempt_does_not_double_count_same_attempt

Test shape:

- Start a run.
- Start one attempt for it.
- Commit/fail that attempt.
- Call finish_turn_run_attempt once.
- Record counters.
- Call finish_turn_run_attempt again with the same attempt id.
- Assert counters did not change.

5. Lifecycle hardening: `fail_turn_run` can clear `worlds.active_attempt_id` while leaving a running attempt row

fail_turn_run is called by the coordinator when a start or finish step fails:

```rust
let _ = world_store.fail_turn_run(turn_run_id, reason.clone()).await;
```

See src/turn_runs.rs:51-57 and src/turn_runs.rs:81-96.

But fail_turn_run clears worlds.active_attempt_id without necessarily terminalizing the active attempt.

Postgres:

```sql
UPDATE worlds SET active_turn_run_id = NULL, active_attempt_id = NULL
WHERE slug = $1 AND active_turn_run_id = $2
```

src/world_store/postgres.rs:2177-2184.

Memory:

```rust
w.active_turn_run_id = None;
w.active_attempt_id = None;
```

src/world_store/memory.rs:1751-1757.

If Runtime::run_claimed returns due to a store error before the attempt row becomes terminal, then finish_turn_run_attempt rejects the still-running attempt, the coordinator calls fail_turn_run, and the world can be left with:

attempts.status = running
worlds.active_attempt_id = NULL
worlds.active_turn_run_id = NULL
turn_runs.status = failed

That creates an orphan running attempt. Because there is a partial unique index on running attempts, this may also block future attempts until startup reconciliation. Startup recovery is a safety net, not the normal way to repair a live-process error.

Please make the coordinator/store failure path preserve lifecycle invariants. Options:

- fail_turn_run should terminalize the active attempt as interrupted or failed in the same transaction before clearing worlds.active_attempt_id.
- Or it should not clear active_attempt_id when an active attempt remains non-terminal.
- Or the coordinator should explicitly fail/interruption-mark the active attempt before calling fail_turn_run.
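A sketch of the first option against a hypothetical in-memory mirror of the columns involved (`WorldState` and its fields are invented for illustration). It marks a still-running active attempt as interrupted (failed would work equally) before releasing the world lease, and exposes the invariant the regression test could assert:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AttemptStatus {
    Running,
    Committed,
    Failed,
    Interrupted,
}

#[derive(Clone, Debug, PartialEq)]
pub enum TurnRunStatus {
    Running,
    Failed(String),
}

/// Hypothetical in-memory mirror of the attempts, turn_runs, and worlds columns involved.
#[derive(Default)]
pub struct WorldState {
    pub attempts: HashMap<u64, AttemptStatus>,
    pub turn_runs: HashMap<u64, TurnRunStatus>,
    pub active_attempt_id: Option<u64>,
    pub active_turn_run_id: Option<u64>,
}

impl WorldState {
    /// Fails a turn run without orphaning its active attempt. A Postgres version
    /// would perform these writes in a single transaction.
    pub fn fail_turn_run(&mut self, turn_run_id: u64, reason: &str) {
        if let Some(run) = self.turn_runs.get_mut(&turn_run_id) {
            *run = TurnRunStatus::Failed(reason.to_string());
        }
        if self.active_turn_run_id != Some(turn_run_id) {
            return; // the world lease belongs to something else; leave it alone
        }
        // Terminalize the still-running attempt before releasing the lease.
        if let Some(attempt_id) = self.active_attempt_id {
            if let Some(status) = self.attempts.get_mut(&attempt_id) {
                if *status == AttemptStatus::Running {
                    *status = AttemptStatus::Interrupted;
                }
            }
        }
        self.active_attempt_id = None;
        self.active_turn_run_id = None;
    }

    /// Invariant: a running attempt exists only while it holds the world lease.
    pub fn no_orphaned_running_attempt(&self) -> bool {
        self.attempts
            .iter()
            .filter(|(_, s)| **s == AttemptStatus::Running)
            .all(|(id, _)| self.active_attempt_id == Some(*id))
    }
}
```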

Required regression test:

turn_run_coordinator_failure_does_not_orphan_running_attempt

The test can be store-level if a full fake coordinator test is too expensive.

6. Status observability gap: persisted source fields are dropped from `TurnRunRecord`

The migration stores:

turn_count_source
max_attempts_source

But TurnRunRecord does not expose them:

```rust
pub struct TurnRunRecord {
    pub turn_run_id: TurnRunId,
    pub world_slug: Slug,
    pub status: TurnRunStatus,
    pub requested_turn_count: u32,
    pub max_attempts: u32,
    ...
}
```

src/world_store/mod.rs:388-409.

turn_run_record_from_row selects turn_count_source and max_attempts_source, but because the DTO has no fields for them, the values are discarded.

The initial run_turn response includes the deterministic source/hint fields, which is good. But after the initial response, get_turn_run_status cannot tell the operator whether turn_count and max_attempts were explicit or defaulted.

Please add to TurnRunRecord:

```rust
pub turn_count_source: String,
pub max_attempts_source: String,
```

Then surface them in:

get_turn_run_status
cancel_turn_run
turn-run browse detail/list JSON

Optional but preferable: include the same deterministic hint strings in get_turn_run_status.

This is not as severe as the ownership bugs, but the DB already persists the information, and the ticket emphasized deterministic default/explicit visibility.
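One way to keep get_turn_run_status, cancel_turn_run, and the browse JSON from drifting is a single field builder that reads the sources straight from the record. A sketch with a trimmed, hypothetical `TurnRunRecord` (plain `String` ids instead of `TurnRunId`/`Slug`) and a `BTreeMap` standing in for the JSON payload; the values "explicit" and "default" are assumptions about what the columns hold:

```rust
use std::collections::BTreeMap;

/// Trimmed, hypothetical TurnRunRecord with the two persisted source columns added.
#[derive(Clone, Debug)]
pub struct TurnRunRecord {
    pub turn_run_id: String,
    pub world_slug: String,
    pub requested_turn_count: u32,
    pub max_attempts: u32,
    pub turn_count_source: String,
    pub max_attempts_source: String,
}

/// Single field builder shared by every surface that renders a turn run.
pub fn turn_run_fields(record: &TurnRunRecord) -> BTreeMap<&'static str, String> {
    let mut fields = BTreeMap::new();
    fields.insert("turn_run_id", record.turn_run_id.clone());
    fields.insert("world_slug", record.world_slug.clone());
    fields.insert("requested_turn_count", record.requested_turn_count.to_string());
    fields.insert("turn_count_source", record.turn_count_source.clone());
    fields.insert("max_attempts", record.max_attempts.to_string());
    fields.insert("max_attempts_source", record.max_attempts_source.clone());
    fields
}
```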

7. Add cross-world/mismatch tests; current tests mostly cover happy paths

The current test additions cover happy paths and basic cancellation, but they do not exercise cross-world mismatches on either the mutating or the read-only paths.

Please add at least:

- cancel_turn_run_rejects_wrong_world_without_mutating
- get_turn_run_status_rejects_wrong_world
- get_turn_status_rejects_attempt_from_wrong_world
- world_scoped_turn_run_detail_rejects_wrong_world
- world_scoped_attempt_detail_rejects_wrong_world
- finish_turn_run_attempt_does_not_double_count_same_attempt
- turn_run_coordinator_failure_does_not_orphan_running_attempt

These should run against MemoryWorldStore, and the store-level lifecycle tests should also be mirrored in Postgres where practical.

Acceptance bar for the revised resolution

Please move this ticket back to in_progress, fix the above in the same ticket, then re-propose resolution with:

Code changes called out by file.
New regression tests listed by exact test name.
Test run output for:
- cargo test --lib --features test-fixtures
- cargo test --tests --features test-fixtures
- cargo test --tests --features test-fixtures,postgres-tests -- --test-threads=1
One live smoke test that proves:
- run_turn({world_slug}) still returns a single attempt_id.
- run_turn({world_slug, turn_count: 3}) completes with three attempts.
- cancel_turn_run with the wrong world_slug errors and does not mutate.
- get_turn_status with the wrong world_slug errors.
- get_turn_run_status(include_attempts=true) still returns newest-first recent attempts.

The implementation is close and the core architecture is good. The remaining work is about making ownership and lifecycle invariants as strict as the rest of the substrate. Once those are fixed in-ticket, this should be acceptable.
