Async MCP dispatcher; remove block_on_store bridge

resolved abb735db-83c2-4724-99ca-236f500332c2

created_at: 2026-04-26
updated_at: 2026-04-26
priority: P3
ticket_type: chore
labels: mcp
resolved_at: 2026-04-26
resolution: accepted

Body

Surfaced from scenario-store ticket 7d14ef0b Phase E.

The MCP dispatcher in src/mcp.rs is sync; the new ScenarioStore trait is async. Phase E added a block_on_store(fut) bridge using tokio::task::block_in_place + Handle::block_on, with a one-shot current-thread runtime fallback for tests. It works for the substrate, but running a different runtime topology in tests than in prod is a known source of subtle deadlocks and fairness surprises.
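A minimal sketch of the shape of the problem: a sync handler that has to reach an async store through a blocking bridge. This is not the project's block_on_store. To stay dependency-free it uses a hand-rolled std-only executor in place of Tokio's block_in_place + Handle::block_on, and the ScenarioStore struct, its count method, and handle_list_scenarios are hypothetical stand-ins.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Stand-in bridge: parks the calling thread until `fut` completes.
/// (Assumption: the real bridge delegated to a Tokio runtime instead.)
fn block_on_store<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async store, standing in for the ScenarioStore trait.
struct ScenarioStore {
    names: Vec<String>,
}

impl ScenarioStore {
    async fn count(&self) -> usize {
        self.names.len()
    }
}

/// Hypothetical pre-refactor handler: sync signature, async store call
/// forced through the blocking bridge.
fn handle_list_scenarios(store: &ScenarioStore) -> String {
    let n = block_on_store(store.count());
    format!("count={}", n)
}
```

Every handler written this way blocks its calling thread for the duration of the store call, which is why the behavior depends on which runtime (if any) happens to be underneath it.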

The right fix is to make the dispatcher async end-to-end so the bridge can go away: convert the tools/call dispatcher to async, propagate async through every handler signature, and remove block_on_store. The work is mechanical but touches every handler in src/mcp.rs (40+ tools).

Acceptance:

Proposed resolution

Async MCP dispatcher refactor shipped. block_on_store removed. Single runtime topology end-to-end.

Commit + deploy state

Code changes (4 files; +797 / -833)

src/mcp.rs: pub fn dispatch and fn tools_call now async fn. All 56 fn handle_* functions converted to async fn. Match arms in tools_call are name => handle_xxx(args, env).await. Match arms that previously called block_on_store(env.scenario_store.method(...)) lose the wrapper and become env.scenario_store.method(...).await directly. fn block_on_store deleted; rg block_on_store src/ returns zero matches.
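The post-refactor shape described above can be sketched roughly as follows. Only the names tools_call, handle_*, env, and scenario_store come from the ticket; the Env and ScenarioStore definitions, the count method, the ping tool, and the String-based argument and result types are assumptions made for illustration. The small std-only block_on exists only to drive the example; in the project that role belongs to the Tokio runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical async store and environment.
struct ScenarioStore {
    names: Vec<String>,
}

impl ScenarioStore {
    async fn count(&self) -> usize {
        self.names.len()
    }
}

struct Env {
    scenario_store: ScenarioStore,
}

/// Handler awaits the store directly; no blocking wrapper.
async fn handle_list_scenarios(_args: &str, env: &Env) -> Result<String, String> {
    let n = env.scenario_store.count().await;
    Ok(format!("count={}", n))
}

/// Hypothetical second tool, to show that every arm has the same shape.
async fn handle_ping(args: &str, _env: &Env) -> Result<String, String> {
    Ok(format!("pong:{}", args))
}

/// Async dispatcher: each match arm is `name => handle_xxx(args, env).await`.
async fn tools_call(name: &str, args: &str, env: &Env) -> Result<String, String> {
    match name {
        "list_scenarios" => handle_list_scenarios(args, env).await,
        "ping" => handle_ping(args, env).await,
        other => Err(format!("unknown tool: {}", other)),
    }
}

/// Minimal executor used only to run the sketch outside a real runtime.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

With this layout the only place a future is driven to completion is the outermost caller (an async endpoint in production, an async test attribute in tests), so handlers never need to know which runtime they are on.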

src/views.rs: call_tool, build_session_payload, build_turn_payload, build_entity_payload converted to async fn (they call into dispatch). build_scenario_payload left sync (it never touched the dispatcher). 10 #[test] swapped to #[tokio::test].

src/server.rs: mcp_endpoint was already async; one trivial change to .await the dispatcher call. 8 existing #[tokio::test] tests untouched.

src/mcp/tests.rs: 131 #[test] swapped to #[tokio::test]. ~340 .await injections at callsites (224 on tools_call, 1 on dispatch, 109 on internal test helpers). 9 internal test helpers converted to async fn so .await works inside (create_ticket_returning_id, drive_ticket_to_status, make_ticket, create_rich_parent, p_mk_ticket, p_propose, q_mk_ticket, q_propose, put_text).

Tightening notes from your acceptance comment

(1) Dispatcher callsites audit — grep across src/ + tests/ + bin/:

(2) Test runtime — every #[test] that called the dispatcher (directly or through helpers) is now #[tokio::test]. The subagent did this as a first pass before the handler cascade so the test attributes were ready when the async signatures landed. No wave of compile errors mid-stream.

Test counts

Live smoke

Post-deploy, against the new pod:

  1. list_scenarios count=3 (cat_in_library, vending-leak-fix, locked_vending_room) — the persistent Postgres state survived the rollout.
  2. create_world {slug:"async-smoke", scenario_ref:{name:"vending-leak-fix"}}: returned world_slug=async-smoke, scenario_hash=a0fb7a2a...e7cf62a (matches scenarios.hash for vending-leak-fix; the hash-join invariant from ticket 7d14ef0b still holds).
  3. run_turn {world_slug:"async-smoke"}: attempt 399b6089-e9c8-4019-885d-0661c6928ad9 queued, ran, committed cleanly in 25.4s, turn 0→1, no failure_reason. The live LLM round-trip through the now-async dispatcher works end-to-end.
  4. delete_world {slug:"async-smoke"}: cleanup confirmed at 12:35:02Z. Database scenarios remain.

Audit events for the test turn weren't inspected for leak-fix patterns this round; that was the 7d14ef0b smoke's job, and this refactor is about the runtime topology, not behavior. Turn 1 committing cleanly through the same cognition path with the new dispatcher is evidence of no regression at the runtime boundary.

Operational notes

Per standing guidance I am not confirming — only proposing.

History (7 events)
