resolved d3e78844-2f43-4362-b135-be755771b2e7
Add cognition workflow reference validation at the scenario store layer; today it happens only in the runtime.
Fix the cognition workflow reference-validation gap at the scenario store layer, while preserving the existing runtime/kernel validation as defense-in-depth.
Do not remove, weaken, bypass, or relax any runtime/kernel validation or reference-resolution checks. The runtime checks should remain in place as a final safety net. This change is about ensuring invalid schema/source hashes are caught earlier, when workflows are bound to cognition profiles or assembled into scenarios, rather than being discovered for the first time during execution.
A cognition profile must never become an executable scenario contract unless its workflow’s referenced hashes resolve.
Specifically:
- put_cognition_profile with inline workflow content must reject workflows whose source_ref, tool source refs, tool schema refs, node final_schema_hash, or apply.final_schema_hash point at missing rows.
- put_cognition_profile with { workflow_hash: "..." } must reject an existing workflow if that workflow’s internal schema/source hashes do not resolve.
- assemble_scenario must validate profiles supplied by profile hash (CognitionProfileRef::Hash) too, not only inline profile content.
- Missing refs must return StoreError::NotFound, with a message naming the exact missing hash and kind, e.g. json_schema hash <hash> does not exist.
Primary files:
src/scenario_store/postgres.rs
src/scenario_store/memory.rs
Also update comments/docs where they currently imply reference resolution only happens in assembly:
src/workflow_validation.rs
Add tests in:
src/scenario_store/postgres.rs
src/scenario_store/memory.rs
src/mcp/tests.rs
Move workflow reference validation into profile resolution.
Right now resolve_profile_ref / resolve_profile allows this path:
CognitionProfileRef::Hash
→ select workflow_hash
→ return ResolvedProfile
That is the bug. The hash branch must validate the workflow it finds.
The inline branch also currently uses resolve_workflow_for_profile, which only structurally validates inline workflows. That is the second half of the bug. Replace that structural-only behavior with full reference validation.
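Full reference validation means walking every external hash a workflow can carry, which covers the five reference kinds listed in the requirements, and failing on the first one that has no row. The following is a minimal sketch under stated assumptions. Workflow, Tool, Node, Apply, RefKind, referenced_hashes, and check_refs are hypothetical, trimmed-down stand-ins, because the real types are not shown in this ticket. The error text follows the NotFound format this ticket specifies.

```rust
use std::collections::{BTreeSet, HashSet};

#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound(String),
}

// Hypothetical, trimmed-down workflow shape; field names follow the ticket's
// wording, the real structs may differ.
struct Tool { source_ref: String, schema_ref: String }
struct Node { final_schema_hash: Option<String> }
struct Apply { final_schema_hash: Option<String> }
struct Workflow { source_ref: String, tools: Vec<Tool>, nodes: Vec<Node>, apply: Apply }

// Declaration order makes sources sort before schemas.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum RefKind { ResponseSource, JsonSchema }

impl RefKind {
    fn label(self) -> &'static str {
        match self {
            RefKind::ResponseSource => "response_source",
            RefKind::JsonSchema => "json_schema",
        }
    }
}

/// Collect every external hash the workflow depends on, deduplicated and in a
/// stable order (sources first, then schemas).
fn referenced_hashes(wf: &Workflow) -> BTreeSet<(RefKind, String)> {
    let mut refs = BTreeSet::new();
    refs.insert((RefKind::ResponseSource, wf.source_ref.clone()));
    for tool in &wf.tools {
        refs.insert((RefKind::ResponseSource, tool.source_ref.clone()));
        refs.insert((RefKind::JsonSchema, tool.schema_ref.clone()));
    }
    for node in &wf.nodes {
        if let Some(h) = &node.final_schema_hash {
            refs.insert((RefKind::JsonSchema, h.clone()));
        }
    }
    if let Some(h) = &wf.apply.final_schema_hash {
        refs.insert((RefKind::JsonSchema, h.clone()));
    }
    refs
}

/// Fail with NotFound on the first referenced hash that has no row.
fn check_refs(
    wf: &Workflow,
    sources: &HashSet<String>,
    schemas: &HashSet<String>,
    workflow_hash: &str,
    profile_context: &str,
) -> Result<(), StoreError> {
    for (kind, hash) in referenced_hashes(wf) {
        let exists = match kind {
            RefKind::ResponseSource => sources.contains(&hash),
            RefKind::JsonSchema => schemas.contains(&hash),
        };
        if !exists {
            let label = kind.label();
            return Err(StoreError::NotFound(format!(
                "{label} hash {hash} does not exist \
                 (referenced by cognition_workflow {workflow_hash}, profile {profile_context})"
            )));
        }
    }
    Ok(())
}
```

Deduplicating in a BTreeSet means a schema shared by a node and apply is checked once. The fixed order also keeps the reported missing hash deterministic.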
In src/scenario_store/postgres.rs, change resolve_and_validate_workflow_ref so it returns the resolved workflow hash and whether it inserted a new workflow:
async fn resolve_and_validate_workflow_ref(
tx: &mut Transaction<'_, Postgres>,
profile_context: &str,
r: &super::CognitionWorkflowRef,
) -> Result<(String, bool), StoreError>
Its behavior should be:
Hash form:
- SELECT content FROM cognition_workflows WHERE hash = $1
- If missing:
StoreError::NotFound("cognition_workflow hash <hash> does not exist (profile <profile_context>)")
- Structurally validate the fetched content.
- Resolve every referenced source/schema hash.
- Return (hash.clone(), false)
Inline form:
- Structurally validate the inline content.
- Compute canonical workflow hash.
- If it is the canonical placeholder workflow hash, skip external ref resolution.
- Otherwise resolve every referenced source/schema hash BEFORE inserting.
- If any referenced hash is missing, return StoreError::NotFound.
- Only after all refs resolve, insert into cognition_workflows.
- Return (hash, was_new)
The important ordering is: validate refs before inserting inline workflow content. This prevents put_cognition_profile from leaving behind invalid workflow rows when validation fails.
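The ordering rule can be modeled against an in-memory store. This is a sketch under stated assumptions rather than the real implementation. Workflow, WorkflowRef, Inner, validate_structure, and placeholder_workflow are hypothetical simplifications. workflow_hash uses std's DefaultHasher only as a stand-in for the canonical hash. The placeholder exception is applied to the inline form only, matching the behavior list above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound(String),
    Invalid(String),
}

// Hypothetical, simplified workflow: one source ref plus schema refs.
#[derive(Clone, Hash)]
struct Workflow {
    source_ref: String,
    schema_refs: Vec<String>,
}

enum WorkflowRef {
    Hash { hash: String },
    Inline { content: Workflow },
}

#[derive(Default)]
struct Inner {
    response_sources: HashSet<String>,
    json_schemas: HashSet<String>,
    cognition_workflows: HashMap<String, Workflow>,
}

// Stand-in for the canonical workflow hash (NOT the real hashing scheme).
fn workflow_hash(wf: &Workflow) -> String {
    let mut h = DefaultHasher::new();
    wf.hash(&mut h);
    format!("{:016x}", h.finish())
}

// Stand-in for the canonical placeholder, which intentionally uses fake refs.
fn placeholder_workflow() -> Workflow {
    Workflow { source_ref: "placeholder-source".into(), schema_refs: vec!["placeholder-schema".into()] }
}

// Stand-in for the real structural validation.
fn validate_structure(wf: &Workflow) -> Result<(), StoreError> {
    if wf.source_ref.is_empty() {
        return Err(StoreError::Invalid("workflow has an empty source_ref".into()));
    }
    Ok(())
}

fn check_refs(inner: &Inner, wf: &Workflow, wf_hash: &str, ctx: &str) -> Result<(), StoreError> {
    let src = &wf.source_ref;
    if !inner.response_sources.contains(src) {
        return Err(StoreError::NotFound(format!(
            "response_source hash {src} does not exist \
             (referenced by cognition_workflow {wf_hash}, profile {ctx})"
        )));
    }
    for schema in &wf.schema_refs {
        if !inner.json_schemas.contains(schema) {
            return Err(StoreError::NotFound(format!(
                "json_schema hash {schema} does not exist \
                 (referenced by cognition_workflow {wf_hash}, profile {ctx})"
            )));
        }
    }
    Ok(())
}

fn resolve_and_validate_workflow_ref_memory(
    inner: &mut Inner,
    ctx: &str,
    r: &WorkflowRef,
) -> Result<(String, bool), StoreError> {
    match r {
        WorkflowRef::Hash { hash } => {
            // An existing row is not trusted: re-check structure and every ref.
            let wf = inner.cognition_workflows.get(hash).cloned().ok_or_else(|| {
                StoreError::NotFound(format!("cognition_workflow hash {hash} does not exist (profile {ctx})"))
            })?;
            validate_structure(&wf)?;
            check_refs(inner, &wf, hash, ctx)?;
            Ok((hash.clone(), false))
        }
        WorkflowRef::Inline { content } => {
            validate_structure(content)?;
            let hash = workflow_hash(content);
            if hash != workflow_hash(&placeholder_workflow()) {
                check_refs(inner, content, &hash, ctx)?;
            }
            // No rollback in memory: insert only after every ref resolved,
            // so a failed call leaves no orphaned workflow row behind.
            let was_new = !inner.cognition_workflows.contains_key(&hash);
            if was_new {
                inner.cognition_workflows.insert(hash.clone(), content.clone());
            }
            Ok((hash, was_new))
        }
    }
}
```

Returning was_new from the same call that did the insert lets callers count newly created workflows without a second lookup.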
The missing-hash errors should be explicit:
StoreError::NotFound(format!(
"response_source hash {source_hash} does not exist \
(referenced by cognition_workflow {workflow_hash}, profile {profile_context})"
))
and:
StoreError::NotFound(format!(
"json_schema hash {schema_hash} does not exist \
(referenced by cognition_workflow {workflow_hash}, profile {profile_context})"
))
Keep the existing canonical placeholder workflow exception, because that placeholder intentionally uses fake source/schema hashes.
Then change resolve_workflow_for_profile to stop doing structural-only validation. Either remove it or make it a thin wrapper around the full helper:
async fn resolve_workflow_for_profile(
tx: &mut Transaction<'_, Postgres>,
agent_label: &str,
r: &super::CognitionWorkflowRef,
) -> Result<(String, bool), StoreError> {
let context = if agent_label.is_empty() {
"put_cognition_profile"
} else {
agent_label
};
resolve_and_validate_workflow_ref(tx, context, r).await
}
Now update resolve_profile_ref.
For CognitionProfileRef::Hash, after selecting workflow_hash, validate that workflow before returning:
let workflow_hash = select_cognition_profile_row(tx, hash)
.await?
.ok_or_else(|| {
StoreError::NotFound(format!(
"cognition_profile hash {hash} does not exist"
))
})?;
resolve_workflow_for_profile(
tx,
agent_label,
&ContentRef::Hash {
hash: workflow_hash.clone(),
},
)
.await?;
For CognitionProfileRef::Inline, keep resolving through resolve_workflow_for_profile, but now that function must do full ref validation, not just structural validation.
Then remove the current assembly-only pre-pass:
for (label, r) in &input.cognition_profiles {
if let CognitionProfileRef::Inline { content } = r {
resolve_and_validate_workflow_ref(&mut tx, label, &content.workflow_hash)
.await?;
}
}
That pre-pass is now obsolete and incomplete. Validation must happen inside profile resolution for both inline and hash forms. Removing it also avoids double-upserting inline workflows and fixes new_components.cognition_workflows accounting.
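Once the pre-pass is removed, assemble_scenario can make a single pass in which every profile goes through resolve_profile. The was_new flag from that pass then drives the new-workflow count. The following is a minimal sketch that makes several assumptions. Manifest and NewComponents are hypothetical. resolve_profile is injected as a closure in place of the real transaction-bound function. Building the manifest stands in for persisting it.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound(String),
}

// Hypothetical: what assemble_scenario reports back to the caller.
#[derive(Debug, Default, PartialEq)]
struct NewComponents {
    cognition_workflows: usize,
}

// Hypothetical manifest: profile label -> resolved workflow hash.
#[derive(Debug, PartialEq)]
struct Manifest {
    profiles: BTreeMap<String, String>,
}

fn assemble_scenario<R, F>(
    profiles: &BTreeMap<String, R>,
    mut resolve_profile: F,
) -> Result<(Manifest, NewComponents), StoreError>
where
    F: FnMut(&str, &R) -> Result<(String, bool), StoreError>,
{
    let mut resolved = BTreeMap::new();
    let mut new_components = NewComponents::default();
    // One pass: inline and hash-form profiles both go through full validation,
    // so each inline workflow is upserted once and counted once.
    for (label, r) in profiles {
        let (workflow_hash, was_new) = resolve_profile(label.as_str(), r)?;
        if was_new {
            new_components.cognition_workflows += 1;
        }
        resolved.insert(label.clone(), workflow_hash);
    }
    // Reached only if every profile resolved, i.e. before anything is persisted.
    Ok((Manifest { profiles: resolved }, new_components))
}
```

The `?` inside the loop is what gives "immediate failure before manifest persistence". The first profile that fails validation aborts the whole assembly.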
Mirror the same logic in src/scenario_store/memory.rs.
Change:
fn resolve_and_validate_workflow_ref_memory(...)
to return:
Result<(String, bool), StoreError>
For inline workflows, do not insert into inner.cognition_workflows until after external refs have been checked. This matters more in memory than Postgres because there is no transaction rollback.
Then update:
resolve_workflow_for_profile_memory
resolve_profile
assemble_scenario
fork_scenario
the same way as Postgres.
The profile hash branch in memory must validate:
cognition_profile hash
→ workflow_hash
→ cognition_workflows[workflow_hash]
→ structural validation
→ response_sources/json_schemas reference validation
No stored profile hash should bypass this.
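The following sketch models that chain in the memory store. It assumes hypothetical simplified types; in particular, ProfileRef::Inline here carries only a workflow hash, whereas the real inline form can also carry inline workflow content. Both profile forms converge on the same workflow validation, so a profile row seeded directly into cognition_profiles cannot skip it.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound(String),
}

// Hypothetical, simplified workflow: one source ref plus schema refs.
struct Workflow {
    source_ref: String,
    schema_refs: Vec<String>,
}

// Hypothetical profile reference (inline form reduced to a workflow hash).
enum ProfileRef {
    Hash { hash: String },
    Inline { workflow_hash: String },
}

#[derive(Debug, PartialEq)]
struct ResolvedProfile {
    workflow_hash: String,
}

#[derive(Default)]
struct Inner {
    response_sources: HashSet<String>,
    json_schemas: HashSet<String>,
    cognition_workflows: HashMap<String, Workflow>,
    // profile hash -> workflow hash
    cognition_profiles: HashMap<String, String>,
}

// cognition_workflows[workflow_hash] -> (structural checks elided) -> reference checks.
fn validate_workflow(inner: &Inner, workflow_hash: &str, ctx: &str) -> Result<(), StoreError> {
    let wf = inner.cognition_workflows.get(workflow_hash).ok_or_else(|| {
        StoreError::NotFound(format!("cognition_workflow hash {workflow_hash} does not exist (profile {ctx})"))
    })?;
    let src = &wf.source_ref;
    if !inner.response_sources.contains(src) {
        return Err(StoreError::NotFound(format!(
            "response_source hash {src} does not exist \
             (referenced by cognition_workflow {workflow_hash}, profile {ctx})"
        )));
    }
    for schema in &wf.schema_refs {
        if !inner.json_schemas.contains(schema) {
            return Err(StoreError::NotFound(format!(
                "json_schema hash {schema} does not exist \
                 (referenced by cognition_workflow {workflow_hash}, profile {ctx})"
            )));
        }
    }
    Ok(())
}

fn resolve_profile(inner: &Inner, label: &str, r: &ProfileRef) -> Result<ResolvedProfile, StoreError> {
    let workflow_hash = match r {
        ProfileRef::Hash { hash } => inner
            .cognition_profiles
            .get(hash)
            .cloned()
            .ok_or_else(|| StoreError::NotFound(format!("cognition_profile hash {hash} does not exist")))?,
        ProfileRef::Inline { workflow_hash } => workflow_hash.clone(),
    };
    // Both forms converge here: a stored profile row is not trusted just because it exists.
    validate_workflow(inner, &workflow_hash, label)?;
    Ok(ResolvedProfile { workflow_hash })
}
```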
Do not make put_cognition_workflow resolve external refs yet unless the product decision changes.
It is okay for put_cognition_workflow to remain structural-only, because an isolated workflow may be authored before its sources/schemas exist.
The validation boundary being fixed here is: when a workflow is bound to a cognition profile or used in scenario assembly/forking, it must be executable.
Do not remove or weaken runtime/kernel validation.
The runtime should continue to validate/resolve workflow schema and source references exactly as it does today. Store-level validation is an additional earlier guardrail, not a replacement. Runtime validation remains defense-in-depth for corrupted rows, manual database edits, future store bugs, migration mistakes, or any scenario artifact produced before this fix.
handle_put_cognition_profile already maps store NotFound to:
McpError::from_store_error(e, "UNKNOWN_HASH")
Keep that. The desired tool response for a missing schema/source during put_cognition_profile is:
{
"isError": true,
"code": "UNKNOWN_HASH",
"message": "json_schema hash <hash> does not exist ..."
}
For assemble_scenario, it is acceptable if the code remains BAD_SCENARIO, but the message must explicitly identify the missing hash and kind. Do not return a vague runtime failure.
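Both tool responses depend on the store's NotFound message passing through unchanged. Only the code differs between put_cognition_profile and assemble_scenario. The following is a minimal sketch under two assumptions. ToolError is a hypothetical stand-in for the real McpError. Its from_store_error is assumed to behave like the call quoted above. The INTERNAL code for other errors is purely illustrative.

```rust
#[derive(Debug)]
enum StoreError {
    NotFound(String),
    Internal(String),
}

// Hypothetical, simplified stand-in for the MCP tool error; the real McpError may differ.
#[derive(Debug, PartialEq)]
struct ToolError {
    is_error: bool,
    code: String,
    message: String,
}

impl ToolError {
    // The caller chooses the code for NotFound; the store message is kept verbatim,
    // so the missing hash and its kind reach the tool response.
    fn from_store_error(e: StoreError, not_found_code: &str) -> Self {
        let (code, message) = match e {
            StoreError::NotFound(m) => (not_found_code.to_string(), m),
            StoreError::Internal(m) => ("INTERNAL".to_string(), m),
        };
        ToolError { is_error: true, code, message }
    }
}
```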
Add tests for both Postgres and memory.
put_cognition_profile rejects inline workflow with missing schema hash
Setup:
- Insert a valid response_source.
- Do not insert the schema.
- Build an inline workflow that references:
source_ref = existing source hash
final_schema_hash = fake 64-char hash
apply.final_schema_hash = same fake hash
- Call put_cognition_profile with that inline workflow.
Expected:
Err(StoreError::NotFound(msg))
msg contains "json_schema"
msg contains the fake schema hash
msg contains "does not exist"
Also assert the invalid inline workflow was not inserted into cognition_workflows.
put_cognition_profile rejects workflow_hash whose internal refs are missing
Setup:
- Insert a structurally valid cognition_workflow with put_cognition_workflow.
- That workflow should reference a fake source or fake schema hash.
- Call put_cognition_profile with { workflow_hash: workflow_r.hash }.
Expected:
Err(StoreError::NotFound(msg))
msg names the missing response_source/json_schema hash
This confirms the hash form is not trusted just because the workflow row exists.
assemble_scenario rejects stored profile hash pointing at invalid workflow
Because put_cognition_profile will no longer allow creation of a bad profile, seed the bad profile directly in the test.
Postgres setup:
- Insert invalid-ref workflow via put_cognition_workflow.
- Directly INSERT INTO cognition_profiles(hash, workflow_hash)
using any valid 64-char profile hash and the workflow hash.
- Assemble a scenario whose cognition_profiles map uses:
CognitionProfileRef::Hash { hash: seeded_profile_hash }
Expected:
Err(StoreError::NotFound(msg))
msg contains "json_schema" or "response_source"
msg contains the missing hash
Memory setup:
- Insert invalid workflow via put_cognition_workflow.
- Directly insert a CognitionProfileRow into inner.cognition_profiles.
- Assemble with CognitionProfileRef::Hash.
Expected is the same.
This is the exact regression for the observed bug.
put_cognition_profile reports UNKNOWN_HASH
In src/mcp/tests.rs, add a test that calls the tool surface:
{
"name": "put_cognition_profile",
"arguments": {
"content": {
"workflow": {
"...": "workflow with existing source hash and fake schema hash"
}
}
}
}
Expected:
out["isError"] == true
body["code"] == "UNKNOWN_HASH"
body["message"] contains "json_schema hash"
body["message"] contains fake hash
body["message"] contains "does not exist"
This bug is fixed only when all of these are true:
- put_cognition_profile with an inline invalid workflow → immediate UNKNOWN_HASH / NotFound
- put_cognition_profile with a workflow_hash pointing to an invalid workflow row → immediate UNKNOWN_HASH / NotFound
- assemble_scenario with an inline profile whose workflow is invalid → immediate failure before manifest persistence
- assemble_scenario with a profile hash pointing to an invalid workflow → immediate failure before manifest persistence
- fork_scenario with an inherited/upserted profile hash pointing to an invalid workflow → immediate failure
- runtime/kernel validation remains in place, but ordinary tool/store paths no longer let runtime/kernel be the first place this class of missing schema/source hash is discovered
The validation must be store-level and transaction-safe. Runtime should never be the first place this missing hash is discovered.
Implemented the scenario-store workflow reference-validation fix.
Commit: 5504cb4 fix(scenario-store): validate workflow refs
Push: gitlab/main now points at 5504cb4.
Deploy: k8s/deploy.sh completed successfully; deployment/chukwa rolled out and pod chukwa-c799b9b96-t5jnl is Ready 1/1.
What changed:
Live ZIP verification from https://chukwa.benac.dev/chukwa-repo.zip:
Tests run:
The working tree is clean except for the pre-existing untracked .codex directory.
I’ve picked this up. I’m going to move workflow reference validation into scenario-store profile resolution for both memory and Postgres, preserve runtime validation, add regression coverage for store and MCP behavior, then run the focused Rust tests before proposing resolution.
Implemented, committed, pushed, deployed, and externally verified the scenario-store workflow reference-validation fix.
Caller accepted: Accepted. Store-layer workflow reference validation is enforced through profile resolution for inline and hash-form profiles. Runtime validation remains as defense-in-depth.
Ticket created: Add cognition workflow reference-validation at the scenario store layer