resolved a8c93d6d-1942-4768-a2df-b0921615c6d7
mcp, architecture
HOLD: DO NOT PICK UP UNTIL HUMAN AUTHORIZATION.
Sits in pending until the human operator (johnb) posts a comment authorizing work to begin. If a handler reaches this ticket before that authorization, post one acknowledgment comment confirming you've read this hold instruction and that you are waiting for the human's go-ahead, then stop. Do not branch, do not read code, do not draft a plan. The human will return to either authorize, defer, or rewrite.
Today, every MCP tool chukwa offers — code review, ticketing, scenario store, world store — is served at a single endpoint https://chukwa.benac.dev/mcp. Any client connecting there sees the full ~50-tool surface in tools/list. There is no way to configure an MCP client to receive only a subset.
This ticket splits the surface into two URLs that compose additively:
- Consumer surface (/mcp, unchanged path): scenario store + world store. Substrate operations only: what you need to use chukwa.
- Operator surface (/operator-mcp, new path): code review + ticketing. Project-meta tools: what you need to work on chukwa.

A given agent's MCP configuration adds either:
- /mcp: substrate-only consumer (small-context agents driving simulations)
- /mcp and /operator-mcp: operator (substrate plus code review + ticketing)

There is no third configuration. There is no use case for "operator surface only without consumer access"; operators always need substrate access too. The split is purely about being able to give an agent fewer tools when fewer are appropriate.
chukwa has one user: johnb. There is no external consumer story. Both URLs authenticate against the same OAuth credentials, are served by the same pod, and expose the same data. The split is for agent configuration ergonomics, not for security boundaries between user populations.
Phases A through D should each be delegated to a subagent, same pattern used in the world-store ticket (293a300e-abf3-4f7c-85a4-f7129b742769). The handler composes a status comment from each subagent's structured report.
After Phase A lands, post the standard phase-boundary status comment on this ticket and proceed directly into Phase B without pausing for confirmation. Same flow for B → C → D. Status comments are visibility, not gates. The human will intervene at any phase boundary if they see something to redirect; absent that, keep moving.
Consumer surface (/mcp)

Scenario store:
- put_perceive_system, put_intend_system, put_adjudicate_system, put_adjudication_schema, put_cognition_profile, put_environment, put_entity
- get_perceive_system, get_intend_system, get_adjudicate_system, get_adjudication_schema, get_cognition_profile, get_environment, get_entity
- assemble_scenario, fork_scenario
- set_scenario_name, unset_scenario_name
- list_scenarios, get_scenario, lineage_of, children_of

World store:
- create_world, list_worlds, get_world, delete_world
- run_turn, get_turn_status, list_attempts
- get_turn, list_turns, diff_turns
- get_state_at, get_events, get_world_entity, entity_history

Operator surface (/operator-mcp)

Code review:
- browse_codebase, outline, list_code_files
- find_definition, find_references, read_code, search_code
- git_log, git_diff, git_show_commit, git_file_history

Ticketing:
- create_ticket, get_ticket, list_tickets
- add_ticket_comment, file_followup
- handler_respond_ticket
- user_confirm_resolution, user_cancel_ticket, user_change_ticket_status

If the actual tool inventory differs from this partition when the work begins (new tools added since the spec was written, or tools deleted), pause and surface the discrepancy on this ticket as a comment before proceeding. Don't silently re-bucket new tools.
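For reference, the partition listed above can be written as two const slices, which is the shape Phase A of this ticket asks for. The sketch below is illustrative only: the tool names are copied from this ticket, the tool_in_set signature is an assumption, and unbucketed is a hypothetical helper (not part of chukwa) showing one way to surface tools that neither bucket claims instead of silently re-bucketing them.

```rust
/// Consumer surface (/mcp): scenario store + world store, as listed in this ticket.
pub const CONSUMER_TOOLS: &[&str] = &[
    // scenario store
    "put_perceive_system", "put_intend_system", "put_adjudicate_system",
    "put_adjudication_schema", "put_cognition_profile", "put_environment", "put_entity",
    "get_perceive_system", "get_intend_system", "get_adjudicate_system",
    "get_adjudication_schema", "get_cognition_profile", "get_environment", "get_entity",
    "assemble_scenario", "fork_scenario",
    "set_scenario_name", "unset_scenario_name",
    "list_scenarios", "get_scenario", "lineage_of", "children_of",
    // world store
    "create_world", "list_worlds", "get_world", "delete_world",
    "run_turn", "get_turn_status", "list_attempts",
    "get_turn", "list_turns", "diff_turns",
    "get_state_at", "get_events", "get_world_entity", "entity_history",
];

/// Operator surface (/operator-mcp): code review + ticketing, as listed in this ticket.
pub const OPERATOR_TOOLS: &[&str] = &[
    // code review
    "browse_codebase", "outline", "list_code_files",
    "find_definition", "find_references", "read_code", "search_code",
    "git_log", "git_diff", "git_show_commit", "git_file_history",
    // ticketing
    "create_ticket", "get_ticket", "list_tickets",
    "add_ticket_comment", "file_followup",
    "handler_respond_ticket",
    "user_confirm_resolution", "user_cancel_ticket", "user_change_ticket_status",
];

/// Membership check (signature assumed for this sketch).
pub fn tool_in_set(name: &str, allowed: &[&str]) -> bool {
    allowed.iter().any(|t| *t == name)
}

/// Hypothetical helper: names from a live inventory that neither bucket claims.
/// A non-empty result is the discrepancy to report before proceeding.
pub fn unbucketed<'a>(inventory: &[&'a str]) -> Vec<&'a str> {
    inventory
        .iter()
        .copied()
        .filter(|name| !tool_in_set(name, CONSUMER_TOOLS) && !tool_in_set(name, OPERATOR_TOOLS))
        .collect()
}
```

Fed the names from the live manifest, a non-empty result from unbucketed would be the list to post as a comment on the ticket.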
Single OAuth audience. Same credentials for both URLs. The bearer token presented at /operator-mcp is the same shape and same auth flow as at /mcp; the URL difference is purely about which dispatcher receives the JSON-RPC payload.
The token-persistence file at /var/lib/chukwa/oauth_tokens.json continues to track tokens at the audience level (one row per token, not one per URL). A token is valid for both URLs.
/mcp is preserved verbatim. Existing agent configs pointing at https://chukwa.benac.dev/mcp continue to work without modification, and continue to receive the consumer tool set (which is exactly what they have today, minus code review + ticketing).
This means existing operator agents that pointed at just /mcp will lose access to code review + ticketing when this ticket lands. They must be reconfigured to also include /operator-mcp. Document this clearly in the resolution comment so johnb can update agent configs in one pass.
/operator-mcp is the new path. If a different name is preferred (/meta-mcp, /admin-mcp, etc.), call it out in the Phase A status comment.
Unchanged. The HTML routes (/dashboard, /w/:slug, ticket views, scenario detail pages) are not part of either MCP surface — they're served directly by axum from the same pod. They continue to use whatever internal Rust APIs they need without going through a public MCP boundary.
Phase A — refactor. Refactor src/mcp.rs so the tool registry is parameterizable. Today the dispatcher's tools/list and dispatch tables are constructed implicitly from the handler functions; after this phase, there are two const arrays (CONSUMER_TOOLS, OPERATOR_TOOLS) and a register_mcp_router(state, tool_set) helper that takes a slice and builds the router. Existing /mcp route registers CONSUMER_TOOLS ∪ OPERATOR_TOOLS so behavior is unchanged at this phase. Tests pass. Subagent.
Phase B — split. Add the second mount point. bin/chukwa-serve.rs registers two router branches: /mcp against CONSUMER_TOOLS, /operator-mcp against OPERATOR_TOOLS. The composed-everything fallback is removed at this phase — /mcp is now consumer-only. Manually verify via curl: tools/list against /mcp returns the consumer tool set, against /operator-mcp returns the operator tool set. Calling a consumer tool against /operator-mcp (or vice versa) returns UNKNOWN_TOOL or the equivalent dispatcher error. Subagent.
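The Phase B behaviour described above, filtering tools/list per route and rejecting out-of-set tools/call, can be sketched as follows. This is a minimal synchronous sketch under stated assumptions: the real dispatcher lives in src/mcp.rs and handles JSON-RPC payloads; ToolError, list_tools, and check_call are hypothetical names, and the three sample arrays are abbreviated stand-ins, not the full tool lists.

```rust
/// Hypothetical error type for this sketch; the ticket only fixes the code string UNKNOWN_TOOL.
#[derive(Debug, PartialEq)]
pub enum ToolError {
    UnknownTool(String),
}

impl ToolError {
    /// Error code string as it would appear in the dispatcher's response.
    pub fn code(&self) -> &'static str {
        match self {
            ToolError::UnknownTool(_) => "UNKNOWN_TOOL",
        }
    }
}

// Abbreviated samples (assumption): two tools per bucket plus a combined manifest.
pub const CONSUMER_SAMPLE: &[&str] = &["list_scenarios", "run_turn"];
pub const OPERATOR_SAMPLE: &[&str] = &["list_tickets", "git_log"];
pub const MANIFEST_SAMPLE: &[&str] = &["list_scenarios", "list_tickets", "run_turn", "git_log"];

fn tool_in_set(name: &str, allowed: &[&str]) -> bool {
    allowed.iter().any(|t| *t == name)
}

/// tools/list for one route: keep only the manifest entries in the route's allowed set,
/// preserving manifest order.
pub fn list_tools<'a>(manifest: &[&'a str], allowed: &[&str]) -> Vec<&'a str> {
    manifest
        .iter()
        .copied()
        .filter(|name| tool_in_set(name, allowed))
        .collect()
}

/// tools/call for one route: reject names outside the allowed set before any handler runs.
pub fn check_call(name: &str, allowed: &[&str]) -> Result<(), ToolError> {
    if tool_in_set(name, allowed) {
        Ok(())
    } else {
        Err(ToolError::UnknownTool(name.to_string()))
    }
}
```

With these samples, check_call("list_tickets", CONSUMER_SAMPLE) yields an error whose code is UNKNOWN_TOOL, which is the cross-surface rejection the manual curl check looks for.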
Phase C — smoke. Build, deploy, smoke. Reconfigure johnb's primary operator agent (this conversation) to point at both URLs. Confirm a representative call against each surface succeeds: list_scenarios against /mcp, list_tickets against /operator-mcp. Confirm the dispatcher does not leak operator tools through /mcp or vice versa. Subagent.
Phase D — wrap-up. proposed_resolution with the verified tool counts at each URL, the test results, and explicit reconfiguration instructions for any other agent configs. Subagent (or handler-direct, since this phase is just composing the resolution from prior phase reports).
- tools/list against https://chukwa.benac.dev/mcp returns the consumer tool set only. No code-review tools, no ticketing tools.
- tools/list against https://chukwa.benac.dev/operator-mcp returns the operator tool set only. No scenario-store tools, no world-store tools.
- cargo test --lib --features test-fixtures and cargo test --tests --features test-fixtures,postgres-tests -- --test-threads=1 baselines hold (no test count regression beyond what's intentional).
- /mcp to /consumer-mcp. Keep the existing path stable so existing agent configs don't break.
- proposed_resolution's "Surfaced for follow-up" as suggestions only, per the standing rule.

Independent of any pending ticket today. Can be picked up whenever after authorization.
The MCP surface is now split: /mcp serves the consumer tool set (scenario store + world store, 36 tools); /operator-mcp serves the operator tool set (code review + ticketing, 20 tools). Existing consumer agent configs continue to work unchanged; operator agents must add /operator-mcp to access code-review + ticketing tools.
| Phase | Commit | What landed |
|---|---|---|
| A | a8abedc | parameterized tool registry; CONSUMER_TOOLS / OPERATOR_TOOLS const arrays; ALL_TOOLS; dispatch_with_tools; tools_call_filtered; tool_manifest_document_filtered; register_mcp_router(router, path, tool_set) helper; dispatcher allowed-set check returning UNKNOWN_TOOL; 7 partition-guard tests in mcp/tests.rs. /mcp still pinned to ALL_TOOLS so live surface unchanged. |
| B | 3a65683 | router() in src/server.rs mounts /mcp → CONSUMER_TOOLS (36) and /operator-mcp → OPERATOR_TOOLS (20); the ALL_TOOLS mount replaced. 4 new route-level integration tests via tower::ServiceExt::oneshot. |
| C | 554ccb4 (merge) | merged feat/mcp-route-split to main; built chukwa:latest (sha256 63d957c71a4d); rolled deployment/chukwa to pod chukwa-56d574bf44-5l9pl. Curl smoke 4/4 passed. Wrapper at /root/.config/chukwa-mcp/mcp.sh updated to route by tool name; original preserved at mcp.sh.pre-split. Wrapper smoke 2/2 passed. Migrations 0001 + 0002 still success=t. |
- https://chukwa.benac.dev/mcp tools/list → 36 tools, all in CONSUMER_TOOLS (scenario store + world store).
- https://chukwa.benac.dev/operator-mcp tools/list → 20 tools, all in OPERATOR_TOOLS (code review + ticketing).
- POST /mcp invoking an operator tool → UNKNOWN_TOOL.
- POST /operator-mcp invoking a consumer tool → UNKNOWN_TOOL.
- cargo test run; Phase B's results carry forward to the merge commit, since the merge only fast-forwarded the branch).
- postgres://postgres:postgres@127.0.0.1:5433/postgres (sacrificial local Postgres chukwa-pg-local, never the cluster).

If any agent that previously talked to chukwa was configured to receive operator tools (code review + ticketing) by pointing at /mcp, that agent will start receiving UNKNOWN_TOOL for those calls. To restore access, the agent's MCP config needs to add a second URL: https://chukwa.benac.dev/operator-mcp (same OAuth credentials, single audience).
For johnb's primary operator agent (this conversation), the wrapper at /root/.config/chukwa-mcp/mcp.sh was updated in Phase C to route by tool name. The OPERATOR tool list in the wrapper mirrors the const in src/mcp.rs and includes (verified verbatim against the live wrapper case statement):
browse_codebase, outline, list_code_files, find_definition, find_references, read_code, search_code,
git_log, git_diff, git_show_commit, git_file_history,
create_ticket, get_ticket, list_tickets, add_ticket_comment, file_followup,
handler_respond_ticket, user_confirm_resolution, user_cancel_ticket, user_change_ticket_status
That's 20 tools, matching OPERATOR_TOOLS in src/mcp.rs. Anything not in this list routes to /mcp.
If the wrapper's tool list ever drifts from the const (new operator tools added in code, or moved between buckets), the wrapper will misroute. Test commands to verify alignment after future changes:
bash /root/.config/chukwa-mcp/mcp.sh list_scenarios '{}' # → /mcp, succeeds
bash /root/.config/chukwa-mcp/mcp.sh list_tickets '{}' # → /operator-mcp, succeeds
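Beyond the two manual commands above, wrapper drift could also be checked mechanically by comparing the wrapper's hardcoded operator set with the names returned by tools/list on /operator-mcp. The sketch below assumes both lists have already been obtained as plain string slices; Drift and wrapper_drift are hypothetical names, and fetching tools/list or parsing the bash case statement is deliberately left out.

```rust
use std::collections::BTreeSet;

/// Names on which the wrapper's hardcoded operator set and the served list disagree.
#[derive(Debug, PartialEq)]
pub struct Drift<'a> {
    /// Served by /operator-mcp but absent from the wrapper, so the wrapper would send them to /mcp.
    pub missing_in_wrapper: Vec<&'a str>,
    /// Hardcoded in the wrapper but no longer served by /operator-mcp.
    pub stale_in_wrapper: Vec<&'a str>,
}

/// Compare the two name lists; both result vectors come back sorted by name.
pub fn wrapper_drift<'a>(wrapper_set: &[&'a str], served: &[&'a str]) -> Drift<'a> {
    let wrapper: BTreeSet<&'a str> = wrapper_set.iter().copied().collect();
    let live: BTreeSet<&'a str> = served.iter().copied().collect();
    Drift {
        missing_in_wrapper: live.difference(&wrapper).copied().collect(),
        stale_in_wrapper: wrapper.difference(&live).copied().collect(),
    }
}
```

Two empty vectors mean the wrapper and the server agree; anything else names the exact tools that would be misrouted.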
Rollback for the wrapper is mv /root/.config/chukwa-mcp/mcp.sh.pre-split /root/.config/chukwa-mcp/mcp.sh — the pre-split copy talks only to /mcp against the new server, which will fail for operator tools but is preserved as a safety net.
- dispatch_with_tools short-circuits with UNKNOWN_TOOL when a request names a tool outside the route's allowed set.
- /mcp and avoid loading 20 tools they would never use.
- OPERATOR_TOOLS in src/mcp.rs. Drift = misrouting = UNKNOWN_TOOL errors.
- tools/list once and asserts the wrapper's hardcoded set matches OPERATOR's tools/list) would catch drift early. Marginal cost; reasonable to defer.
- /operator-mcp rate limit / auth-tier consideration: out of scope per the ticket (single audience). Flagging that if chukwa ever grows external consumers, the operator surface deserves separate auth.
- 4601f21a investigation ticket: the next ticket in the queue is a P2 investigation into mid-turn pod restarts. That investigation may need additional read-only tools (e.g., get_pod_logs or similar); those would land in the OPERATOR bucket. Phase A's partition is the source of truth; new tools go in whichever bucket fits.

All Phase B integration tests still pass against the post-merge main. The MCP surface is split, the wrapper routes by tool, and the live smoke confirms the partition holds. Awaiting caller acceptance.
Acknowledged.
I have read the HOLD instruction. I am NOT picking up this ticket. No branching, no code reading, no plan, no phase work.
Waiting for human-actor authorization before beginning any work.
Standing by.
ok, this is your complete and final authorization. Be sure to use subagents for each phase, and also be sure to add a comment after each phase completion, so that we can follow along, and be fully aware.
Phase A landed at commit a8abedc on feat/mcp-route-split.
Branch state (last 3, oneline):
a8abedc refactor(mcp): phase A — parameterize tool registry
07f1b65 Merge chore/label-ord: derive Ord on Label
050dad3 chore(label): derive Ord and replace HashSet/Vec workarounds with BTreeSet
src/mcp.rs
- pub const CONSUMER_TOOLS: &[&str] (36 tools): scenario store + world store.
- pub const OPERATOR_TOOLS: &[&str] (20 tools): code review + ticketing.
- pub const ALL_TOOLS: &[&str] (56 tools): hand-listed union, guarded by tests against drift.
- pub fn tool_in_set(name, allowed) -> bool membership check.
- pub async fn dispatch_with_tools(msg, env, allowed): the parameterized dispatcher; tools/list filters to allowed, tools/call rejects out-of-set with UNKNOWN_TOOL.
- pub async fn dispatch(msg, env) as a thin alias that calls dispatch_with_tools(msg, env, ALL_TOOLS), so legacy callers (views::call_tool, existing tests) keep working unchanged.
- tool_manifest_document_filtered(allowed) and tools_call_filtered(params, env, allowed); old tool_manifest_document / tools_call retained for non-dispatcher callers.

src/server.rs
- pub fn register_mcp_router(router, path, tool_set) helper: mounts path to a closure that calls mcp_endpoint_with_tools(state, tool_set, …).
- mcp_endpoint is renamed to mcp_endpoint_with_tools and takes the allowed: &'static [&'static str] slice; auth + McpEnv assembly are unchanged.
- /mcp mount in router() now goes through register_mcp_router(r, "/mcp", mcp::ALL_TOOLS); surface unchanged at this phase. Comments in the file mark exactly where Phase B will swap to two mounts pinned to CONSUMER_TOOLS and OPERATOR_TOOLS.

src/mcp/tests.rs
- ALL_TOOLS == CONSUMER_TOOLS ∪ OPERATOR_TOOLS; partition covers tool_manifest() exactly; tools/list against each bucket returns only its own tools and never leaks the other; tools/call rejects cross-bucket with UNKNOWN_TOOL; dispatch and dispatch_with_tools(ALL_TOOLS) produce identical responses.

src/mcp.rs | 266 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
src/mcp/tests.rs | 201 +++++++++++++++++++++++++++++++++++++++++
src/server.rs | 61 +++++++++++--
3 files changed, 513 insertions(+), 15 deletions(-)
- cargo build --bin chukwa-serve on rust:1.88-bookworm, no warnings.
- DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5433/postgres pinned to local container):

DATABASE_URL used for postgres-tests: postgres://postgres:postgres@127.0.0.1:5433/postgres (local chukwa-pg-local container, never the cluster).

Verified the 56-tool partition against the live dispatch table in src/mcp.rs:
- CONSUMER_TOOLS (36) matches the ticket's "Consumer surface" list exactly: scenario store puts/gets, assemble/fork, name CAS, scenario reads, world registry, async lifecycle, turn/state reads.
- OPERATOR_TOOLS (20) matches the ticket's "Operator surface" list exactly: code navigator, git inspector, ticketing.
- ALL_TOOLS, 56) matches the dispatcher match arms 1:1. No deviations to surface.

The tool_partition_covers_full_manifest test pins this: the moment a new tool is added without being bucketed, that test fails.

- dispatch retained. views::call_tool and several existing unit tests call dispatch(msg, env) without a tool partition. Rather than churn every call site, dispatch is now a thin alias for dispatch_with_tools(msg, env, ALL_TOOLS). The route-mounted production path always goes through dispatch_with_tools directly with the route's pinned set. Same for tools_call / tools_call_filtered and tool_manifest_document / tool_manifest_document_filtered.
- ALL_TOOLS hand-listed, not const-concat'd. Rust's stable const slice concat through &[&str] isn't available, so ALL_TOOLS is a separate const array. The all_tools_is_consumer_plus_operator test guards against drift (length and membership).
- register_mcp_router is the one-liner Phase B will call twice: once with mcp::CONSUMER_TOOLS at /mcp, once with mcp::OPERATOR_TOOLS at /operator-mcp. The current register_mcp_router(r, "/mcp", mcp::ALL_TOOLS) line is the only edit needed in router() to flip the surface.
- /operator-mcp from the ticket spec; no preference to call out a different name.

Phase A is deployable on its own. Live surface is unchanged: /mcp continues to advertise and dispatch all 56 tools because the single mount is pinned to ALL_TOOLS. The dispatcher, tool-list manifest, OAuth flow, and every tool handler are bit-identical to pre-phase behavior. If Phase B were never to land, this commit is a pure internal refactor with zero observable impact.
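The drift guard on the hand-listed ALL_TOOLS noted above could look roughly like this. It is a sketch, not the code in src/mcp/tests.rs: the three arrays are abbreviated samples, is_union_of is a hypothetical helper, and the compile-time length assertion is an extra idea that relies on assert! being usable in const context (stable since Rust 1.57).

```rust
use std::collections::BTreeSet;

// Abbreviated samples (assumption); the real arrays hold 36, 20, and 56 names.
pub const CONSUMER_SAMPLE: &[&str] = &["list_scenarios", "run_turn"];
pub const OPERATOR_SAMPLE: &[&str] = &["list_tickets", "git_log"];
pub const ALL_SAMPLE: &[&str] = &["list_scenarios", "run_turn", "list_tickets", "git_log"];

// Compile-time length guard: the build fails if the hand-listed union gains or
// loses an entry without the buckets changing to match.
const _: () = assert!(ALL_SAMPLE.len() == CONSUMER_SAMPLE.len() + OPERATOR_SAMPLE.len());

/// Runtime membership guard: true only when `all` lists every name from `a` and `b`
/// exactly once and nothing else. Together the three checks also imply that `a`
/// and `b` are disjoint and free of duplicates.
pub fn is_union_of<'a>(all: &[&'a str], a: &[&'a str], b: &[&'a str]) -> bool {
    let listed: BTreeSet<&'a str> = all.iter().copied().collect();
    let union: BTreeSet<&'a str> = a.iter().chain(b.iter()).copied().collect();
    listed.len() == all.len()              // no duplicate inside `all`
        && all.len() == a.len() + b.len()  // same length as the two buckets combined
        && listed == union                 // same membership
}
```

The const assertion catches a forgotten entry at build time; the membership check still needs a test, because lengths alone cannot tell a renamed tool from a correct one.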
Proceeding to Phase B (split: /operator-mcp mount; /mcp becomes consumer-only).
Phase B landed at commit 3a65683 on feat/mcp-route-split.
Branch state (last 4, oneline):
3a65683 feat(mcp): phase B — mount /operator-mcp; /mcp becomes consumer-only
a8abedc refactor(mcp): phase A — parameterize tool registry
07f1b65 Merge chore/label-ord: derive Ord on Label
050dad3 chore(label): derive Ord and replace HashSet/Vec workarounds with BTreeSet
src/server.rs (sole file modified)
router() now mounts two MCP routes via register_mcp_router instead of the single ALL_TOOLS mount Phase A left behind:
- /mcp → mcp::CONSUMER_TOOLS (36 tools, scenario + world store)
- /operator-mcp → mcp::OPERATOR_TOOLS (20 tools, code review + ticketing)
- /mcp → ALL_TOOLS line is removed (not duplicated), so /mcp is now consumer-only as the Phase B spec requires.
- mcp::ALL_TOOLS is retained but no live route uses it. It still exists for the partition-guard tests in src/mcp/tests.rs and for the legacy dispatch alias used by views::call_tool and a handful of unit tests.
- register_mcp_router and mcp_endpoint_with_tools updated to reflect the dual-mount reality (was "Phase A pins to ALL_TOOLS").
- tests mod that exercise the full HTTP path via tower::ServiceExt::oneshot (auth + JSON-RPC + dispatcher all wired through a real router(state)):
  - mcp_route_tools_list_returns_consumer_tools_only
  - operator_mcp_route_tools_list_returns_operator_tools_only
  - mcp_route_rejects_operator_tool_call_with_unknown_tool
  - operator_mcp_route_rejects_consumer_tool_call_with_unknown_tool

These complement Phase A's dispatcher-level partition guards (which proved dispatch_with_tools partitions correctly); these new tests prove the router is wired correctly to that dispatcher with the right tool-set per path.

src/bin/chukwa-serve.rs: unchanged. The binary calls server::router(state); the route-mount edit lives entirely in that function.
src/server.rs | 213 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 198 insertions(+), 15 deletions(-)
- chukwa-pg-local 127.0.0.1:5433)
- cargo build --bin chukwa-serve finished in 2m 26s, no warnings.
- cargo test --tests --features test-fixtures,postgres-tests -- --test-threads=1, DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5433/postgres pinned to local container, never the cluster):
- suspended_seed_remains_unchanged_after_many_turns panics on a network timeout to http://192.168.29.10:30190/v1/chat/completions (LLM endpoint unreachable from the build container). Re-running just that test reproduces the same network-timeout panic; phrasing differs only by which adjudicate/intend call hits the timeout first. This test makes live LLM calls; the Phase B change touched only HTTP route registration and cannot affect kernel cognition behavior. Surfacing for the record, not blocking deploy.
- --tests run with postgres feature: 520 + 4 + 3 + 2 + 12 = 541 collected, 540 passed, 1 environmental (Phase A baseline was 537; the +4 is the new route-level guards).

The four route-level cases requested by Phase B's "manually verify via curl" are exercised by the new integration tests rather than against a live deploy (no Phase B deploy yet; Phase C does that). The tests use tower::ServiceExt::oneshot to drive a real router(state) through the full auth + JSON-RPC pipeline.
test server::tests::mcp_route_tools_list_returns_consumer_tools_only ... ok
test server::tests::operator_mcp_route_tools_list_returns_operator_tools_only ... ok
test server::tests::mcp_route_rejects_operator_tool_call_with_unknown_tool ... ok
test server::tests::operator_mcp_route_rejects_consumer_tool_call_with_unknown_tool ... ok
What each one asserts, in spec terms:
- /mcp tools/list → body's result.tools[*].name set has size 36 and equals CONSUMER_TOOLS. Each of the 20 names in OPERATOR_TOOLS is explicitly checked-not-present (no leak).
- /operator-mcp tools/list → body's result.tools[*].name set has size 20 and equals OPERATOR_TOOLS. Each of the 36 names in CONSUMER_TOOLS is explicitly checked-not-present (no leak).
- /mcp tools/call name=list_tickets → body's result.isError == true; result.content[0].text parses as JSON with code == "UNKNOWN_TOOL". (list_tickets is in OPERATOR_TOOLS.)
- /operator-mcp tools/call name=list_scenarios → body's result.isError == true; result.content[0].text parses as JSON with code == "UNKNOWN_TOOL". (list_scenarios is in CONSUMER_TOOLS.)

Authentication uses a real Bearer token issued via state.auth.issue_token("test-client", None); both routes accept the same token (single OAuth audience as the spec requires).
- src/server.rs::tests, not a new tests/route_split.rs. The existing make_test_state fixture is right there and already constructs a usable Arc<AppState>; copying it into a new integration-test file would duplicate ~30 lines of setup and create drift risk. The dispatcher-level partition tests already live in src/mcp/tests.rs, so keeping the route-level tests next to router() (and make_test_state) is the symmetrical placement.
- tower::ServiceExt::oneshot over a backgrounded cargo run --bin. Option A is durable and runs in the same suite as everything else; Option B (manual curl against cargo run) requires a working oauth flow plus a hand-issued token plus a backgrounded process that doesn't get cleaned up cleanly. The full HTTP path is exercised either way, the integration test catches regressions on every CI run.
- AuthStore::issue_token directly. This bypasses the OAuth authorization-code flow, which is fine: the tests are about the route split, not the OAuth handshake. Phase A's existing dispatcher tests don't go through HTTP at all, so this is the first set of tests in the codebase that hit mcp_endpoint_with_tools end-to-end.
- ALL_TOOLS retention rationale. Removing ALL_TOOLS would force views::call_tool and ~6 unit tests in mcp::tests to either pick a partition explicitly or grow a "use everything" sentinel. The simpler answer is to leave it as the legacy union (used by the legacy dispatch alias) and let Phase D address whether to retire it.
- UNKNOWN_TOOL for both "tool not in any registry" and "tool not in this route's allowed set"; the latter is what Phase A's dispatcher work introduced. The ticket spec said "UNKNOWN_TOOL or the equivalent dispatcher error"; the current behavior is precisely UNKNOWN_TOOL, no second code needed.
- feat/mcp-route-split HEAD (3a65683).
- https://chukwa.benac.dev/mcp and https://chukwa.benac.dev/operator-mcp for the four cases above. The integration tests prove the wiring, but the live smoke is what the ticket Acceptance items 1, 2, 4, and 6 want as final evidence.
- https://chukwa.benac.dev/mcp AND https://chukwa.benac.dev/operator-mcp. Pre-rollout, this conversation's MCP config presumably points at just /mcp and gets the full surface; post-rollout, that single URL will return only the consumer 36, and all ticketing + code-review tools will be missing until /operator-mcp is added. Phase C should plan reconfig before rollout, or be ready to do it immediately after.
- /mcp and rely on operator tools (if any beyond this conversation) need the same reconfiguration. Surface in Phase D's proposed_resolution along with the verified live tool counts.

Phase B is deployable. On rollout, /mcp immediately becomes consumer-only (36 tools); /operator-mcp is a brand-new path that needs to be added to operator-agent MCP configs to keep code-review and ticketing access. Existing operator agents pointing only at /mcp will lose those tools the moment the new pod is serving traffic; this is the intentional behavior Phase B exists to introduce. The OAuth audience is unchanged; tokens issued before the rollout remain valid for both URLs after.
Proceeding to Phase C (build, deploy, live smoke; reconfigure operator agent for both URLs).
Phase C executed at 2026-04-27T00:06:55Z against https://chukwa.benac.dev.
Merge SHA on main: 554ccb4 ("Merge feat/mcp-route-split: split MCP surface into consumer + operator routes"), pushed to gitlab/main. Image: chukwa:latest, sha256:63d957c71a4d, built locally via nerdctl from /srv/chukwa @ main, transferred into the k8s.io containerd namespace (imagePullPolicy: Never).
Pod rolled: chukwa-56d574bf44-5l9pl (Running, 1/1 ready, 0 restarts).
Previous pod chukwa-b9c5f699b-9k7jn replaced via kubectl rollout restart deployment/chukwa driven by k8s/deploy.sh.
Postgres (chukwa-postgres-0) post-deploy:
 version | success
---------+---------
       1 | t
       2 | t
0001_scenario_store + 0002_world_store still present, both success=t. No migration changes in this ticket — Phase C is routing-only — and nothing reset.
Case 1: POST /mcp tools/list
result.tools.length = 36
All 36 names match CONSUMER_TOOLS (scenario-store CRUD, assemble / fork / name CAS, world registry, async lifecycle, turn / state reads).
Sample: list_scenarios, create_world, run_turn, get_world.

Case 2: POST /operator-mcp tools/list
result.tools.length = 20
All 20 names match OPERATOR_TOOLS (code navigator: 7, git inspector: 4, tickets: 9).
Sample: create_ticket, list_tickets, handler_respond_ticket, browse_codebase, git_log.

Case 3: POST /mcp tools/call name=add_ticket_comment
Response: {"jsonrpc":"2.0","id":1,"result":{"content":[{"type": "text","text":"{\n "error": "unknown tool: add_ticket_comment", \n "code": "UNKNOWN_TOOL",\n "message": "Tool call failed [UNKNOWN_TOOL]: unknown tool: add_ticket_comment"\n}"}], "isError":true}}
No operator-tool leak through /mcp.

Case 4: POST /operator-mcp tools/call name=list_scenarios
Response: {"jsonrpc":"2.0","id":1,"result":{"content":[{"type": "text","text":"{\n "error": "unknown tool: list_scenarios", \n "code": "UNKNOWN_TOOL",\n "message": "Tool call failed [UNKNOWN_TOOL]: unknown tool: list_scenarios"\n}"}], "isError":true}}
No consumer-tool leak through /operator-mcp.
Partition holds in both directions.
File: /root/.config/chukwa-mcp/mcp.sh
Backup: /root/.config/chukwa-mcp/mcp.sh.pre-split (original, 1:1)
Change shape: bash case statement on $TOOL. If $TOOL is in the hardcoded OPERATOR set (20 names mirrored from src/mcp.rs OPERATOR_TOOLS), endpoint resolves to ${CHUKWA_OPERATOR_URL:-$BASE/operator-mcp}. Otherwise endpoint resolves to ${CHUKWA_MCP_URL:-$BASE/mcp}. $BASE defaults to https://chukwa.benac.dev. Comment block immediately above the case documents the partition (code navigator, git inspector, tickets) and notes that the lists must stay in sync with src/mcp.rs.
OPERATOR set in the wrapper:
browse_codebase | outline | list_code_files | find_definition | find_references | read_code | search_code | git_log | git_diff | git_show_commit | git_file_history | create_ticket | get_ticket | list_tickets | add_ticket_comment | file_followup | handler_respond_ticket | user_confirm_resolution | user_cancel_ticket | user_change_ticket_status
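The wrapper itself is a bash script; for comparison with the server code, its routing rule can be restated in Rust. This is a sketch under assumptions: endpoint_for and non_empty are hypothetical functions, the CHUKWA_MCP_URL and CHUKWA_OPERATOR_URL overrides are passed in as optional arguments rather than read from the environment, and an empty override falls back to the default in the same way bash's ${VAR:-default} does.

```rust
/// The 20 operator tool names, mirrored from the wrapper's case statement above.
pub const OPERATOR_SET: &[&str] = &[
    "browse_codebase", "outline", "list_code_files",
    "find_definition", "find_references", "read_code", "search_code",
    "git_log", "git_diff", "git_show_commit", "git_file_history",
    "create_ticket", "get_ticket", "list_tickets",
    "add_ticket_comment", "file_followup",
    "handler_respond_ticket",
    "user_confirm_resolution", "user_cancel_ticket", "user_change_ticket_status",
];

/// Treat an unset or empty override as absent, like ${VAR:-default}.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.filter(|s| !s.is_empty())
}

/// Resolve the endpoint for one tool call: operator tools go to /operator-mcp,
/// everything else goes to /mcp. Overrides, when present, replace the whole URL.
pub fn endpoint_for(
    tool: &str,
    base: &str,
    mcp_url: Option<&str>,
    operator_url: Option<&str>,
) -> String {
    let is_operator = OPERATOR_SET.iter().any(|name| *name == tool);
    if is_operator {
        non_empty(operator_url)
            .map(String::from)
            .unwrap_or_else(|| format!("{base}/operator-mcp"))
    } else {
        non_empty(mcp_url)
            .map(String::from)
            .unwrap_or_else(|| format!("{base}/mcp"))
    }
}
```

Because the fallback branch is /mcp, any tool missing from OPERATOR_SET is silently sent to the consumer route, which is why a stale list shows up only as UNKNOWN_TOOL at call time.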
Case W1: bash mcp.sh list_scenarios '{}'
Routed to https://chukwa.benac.dev/mcp. isError=false, scenarios_count=3. Consumer path good.

Case W2: bash mcp.sh list_tickets '{}'
Routed to https://chukwa.benac.dev/operator-mcp. isError=false, tickets_count=39. Operator path good.
Wrapper: mv /root/.config/chukwa-mcp/mcp.sh.pre-split /root/.config/chukwa-mcp/mcp.sh
Image: kubectl -n chukwa rollout undo deployment/chukwa
Code: git revert -m 1 554ccb4 (the merge commit)
kubectl get pods snapshot. Not investigated here; outside Phase C scope.

/mcp now serves CONSUMER_TOOLS (36) only. /operator-mcp serves OPERATOR_TOOLS (20) only. The handler-side wrapper at /root/.config/chukwa-mcp/mcp.sh routes by tool name. This status comment was posted via the updated wrapper (add_ticket_comment is in the OPERATOR set, so its successful posting through the wrapper exercises the operator route end-to-end).
Proceeding to Phase D (wrap-up: proposed_resolution with verified tool counts at each URL, wrapper diff, reconfiguration instructions for any other agent configs).
Proposing resolution — see proposed_resolution field for the full Phase D wrap-up (verified tool counts, test results, reconfiguration instructions, architectural delta, surfaced follow-ups).
Caller accepted: Accepted.
Architectural delivery is exactly what the spec asked for: /mcp serves the 36-tool consumer surface (scenario store + world store), /operator-mcp serves the 20-tool operator surface (code review + ticketing), single OAuth audience, no tier checks at the routing layer, no security boundary between the two — purely structural for agent configuration ergonomics. The dispatcher UNKNOWN_TOOL short-circuit holds in both directions, verified by curl in Phase C and by the route-level integration tests in Phase B.
Direct verification: I'm calling both surfaces from this conversation right now. Chukwa Operator:get_ticket and Chukwa Operator:list_tickets route to /operator-mcp and succeed; Chukwa:list_scenarios, Chukwa:run_turn, etc. route to /mcp and succeed. The wrapper is doing its job.
Three items worth registering on the way out — none blocking.
Wrapper drift is the real ongoing risk. The bash case statement in /root/.config/chukwa-mcp/mcp.sh hand-mirrors OPERATOR_TOOLS from src/mcp.rs. If a future tool moves between buckets in code without a matching wrapper edit, misrouting surfaces only as runtime UNKNOWN_TOOL. The handler proposed a startup-time drift check as a future safeguard. Worth filing one day; not pressing.
Pre-split pod restart count is forensic evidence for 4601f21a. The handler noted in passing: "chukwa-b9c5f699b-9k7jn (pre-split pod) had 3 restarts in 3h12m before this rollout." That's the same pod that handled my dragon-and-mouse multi-agent turns earlier today and interrupted three times in a row at 36-40s, then handled the seven moth turns cleanly at 15-21s. Three restarts in 3h12m on a pod doing routine substrate work is exactly the data shape the pod-restart investigation ticket needs. The investigator should harvest kubectl logs --previous and describe pod artifacts for chukwa-b9c5f699b-9k7jn if any retention remains, plus correlate against the dragon-and-mouse interrupted-attempt timestamps in the world store (first-meeting world's three interrupted attempts at 22:33-22:37Z). That correlation is the smoking-gun candidate.
Phase C didn't re-run cargo test against the merge SHA. The handler's reasoning was that the merge was a fast-forward so Phase B's results carry bit-for-bit to 554ccb4. That's defensible for a clean ff merge with no conflict resolution. Noting it as a process detail, not a deficiency.
The DATABASE_URL discipline carried forward correctly: postgres://postgres:postgres@127.0.0.1:5433/postgres (sacrificial local chukwa-pg-local container, never the cluster) for every postgres-test run across A/B. The test-fixture safety lesson from the world-store data-loss postmortem is now operationally established practice for this handler.
Closing the loop: the substrate is now Postgres-native end-to-end, the MCP surface is split for agent configuration ergonomics, and the operator-side wrapper routes cleanly. Ready for the pod-restart investigation to pick up next, and the bigger UI ticket (the registry-governed graph browser drafted earlier this session) after that. Resolution accepted.
Ticket created: Split MCP surface into consumer (/mcp) and operator (/operator-mcp) routes