Workstream 02: Execution Plane

Status On This Branch

M2 execution completion is implemented in this batch and awaiting merge back to develop.

Paired Research

primary design doc: 05. Execution Plane

Shipped On `develop`

Working On Now

Still To Do On `develop`

richer live-provider browser/computer-use depth beyond the deterministic extract/html/screenshot proof and current local provider model
deeper host/container-grade isolation beyond disposable worker roots, session-scoped process handles, sandbox timeout/nonzero receipts, and Batch BW secure-host hardening receipts
broader live crash/restart replay across arbitrary tools, external trigger executors, exactly-once scheduling semantics, and long-running multi-agent field evidence beyond the deterministic v2 durable-orchestration receipts
broader external system leverage without weakening trust boundaries or the new credential-egress contract

M2 Batch Acceptance Notes

execution capability metadata now has one M2 readiness gate: m2_execution_supremacy.
file mutation now includes patch preview/apply receipts with unified diffs, before/after hashes, occurrence checks, rollback hints, and stable artifact records.
browser and HTTP-style execution keeps internal/private network protections on initial browser requests, browser subrequests, final/redirected targets, and DNS-resolved private addresses.
prior PR #445 shard failures are fixed in this batch: a transient tiktoken load failure no longer permanently disables retry, and memory-provider fixtures now declare the network and secret-management boundaries required by extension quarantine policy.
M2 completion lands as a single PR: #427 and #435 close only when this integrated benchmark, artifact registry, operator proof, docs, and serial shard validation all land together.
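
The patch preview/apply receipt described in these notes can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the shipped implementation: `preview_patch`, the receipt field names, and the rollback hint wording are all hypothetical.

```python
import difflib
import hashlib


def sha256_text(text: str) -> str:
    """Hash file content so the receipt can pin before/after state."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def preview_patch(path: str, before: str, old: str, new: str,
                  expected_occurrences: int = 1) -> tuple[str, dict]:
    """Return (patched_text, receipt) without touching the filesystem.

    Hypothetical helper: the field names below are assumptions.
    """
    if not old:
        raise ValueError("target text must be non-empty")
    # Occurrence check: refuse to patch when the target is ambiguous or missing.
    found = before.count(old)
    if found != expected_occurrences:
        raise ValueError(
            f"expected {expected_occurrences} occurrence(s), found {found}"
        )
    after = before.replace(old, new)
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    receipt = {
        "path": path,
        "before_sha256": sha256_text(before),
        "after_sha256": sha256_text(after),
        "occurrences": found,
        "unified_diff": diff,
        "rollback_hint": f"restore content matching sha256 {sha256_text(before)}",
    }
    return after, receipt
```

Keeping the preview pure (no file writes) means the same receipt can be shown to the operator before apply and stored as an artifact record afterwards.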

Current Slice Record

`workflow-autonomy-supervision-and-artifact-control-v1`

status: complete on develop via PR #251
root cause addressed:
- workflow recovery metadata already advertised retry-from-step and branch control, but later-step failures without reusable checkpoint state could still make the runs API behave as if that branch path was available
- cockpit artifact chaining still assumed any workflow with a file_path input could consume any produced artifact, which made artifact-to-workflow handoff broad but not truthful
- workflow run lineage already carried parent/root branch metadata, but the cockpit still presented runs as isolated rows, which meant operators could not inspect parent/peer/child branches or continue the latest branch directly from the workflow surface
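
As a rough illustration of the lineage gap above, parent/root metadata is enough to derive a branch family for any run. The sketch below assumes hypothetical field names (`run_id`, `parent_run_id`, `root_run_id`, `branch_depth`, `started_at`); the real run payload may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRow:
    # Hypothetical shape of a workflow run row; names are assumptions.
    run_id: str
    parent_run_id: Optional[str]
    root_run_id: str
    branch_depth: int
    started_at: float


def branch_family(runs: list[RunRow], run_id: str) -> dict:
    """Group a run with its parent, peers, children, and latest branch."""
    by_id = {r.run_id: r for r in runs}
    current = by_id[run_id]
    # Every run that shares the same root belongs to the same family.
    family = [r for r in runs if r.root_run_id == current.root_run_id]
    peers = []
    if current.parent_run_id is not None:
        # Peers are other runs branched from the same parent.
        peers = [
            r.run_id
            for r in family
            if r.parent_run_id == current.parent_run_id and r.run_id != run_id
        ]
    children = [r.run_id for r in family if r.parent_run_id == run_id]
    latest = max(family, key=lambda r: r.started_at)
    return {
        "parent": current.parent_run_id,
        "peers": peers,
        "children": children,
        "latest": latest.run_id,
    }
```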
scope:
- workflow loader metadata now carries explicit artifact_input and artifact_types fields so workflow inputs can declare real artifact handoff expectations instead of relying on name-only conventions
- workflow runtime and audit payloads now persist reusable checkpoint context for safe branch-from-step reuse, preserve structured step/failure context on hard workflow failures, and carry control lineage such as parent/root run identity and branch depth
- the workflows API now exposes truthful checkpoint availability without letting unsupported later-step checkpoints fail the entire runs list with 409 Conflict; unsupported checkpoints stay visible but non-resumable, and are rejected only when a caller explicitly requests them
- cockpit workflow and artifact inspectors now bind artifacts only into compatible workflow inputs, surface checkpoint-driven branch/retry actions from concrete checkpoint candidates, expose attached approval decisions directly from workflow rows and inspectors, and derive workflow branch families from lineage so operators can open parent runs, inspect peer/child branches, and continue the latest branch without leaving the workflow surface
- regression coverage now pins checkpoint reuse, hard-failure audit payload shape, unsupported-checkpoint handling, typed artifact draft binding, and cockpit artifact/branch-family supervision controls
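
The typed artifact binding in this scope can be pictured as a compatibility filter over declared workflow inputs. This is a sketch under assumptions: only the `artifact_input` and `artifact_types` field names come from the loader metadata above; the `WorkflowInput` class, `compatible_inputs`, and the rule that an empty `artifact_types` matches nothing are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowInput:
    name: str
    # Declared explicitly instead of being inferred from a name like file_path.
    artifact_input: bool = False
    artifact_types: tuple[str, ...] = ()


def compatible_inputs(inputs: list[WorkflowInput], artifact_type: str) -> list[str]:
    """Return the input names an artifact of this type may be bound into.

    Assumption: an input with no declared artifact_types accepts nothing.
    """
    return [
        i.name
        for i in inputs
        if i.artifact_input and artifact_type in i.artifact_types
    ]
```

With this shape, an input named `file_path` that never declares `artifact_input` is no longer offered as a handoff target, which is the name-only convention the slice moves away from.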
local regression fixed before the slice stayed complete:
- the first API pass still auto-selected the failed step as the default resume target even when that checkpoint had no reusable state, which made some degraded GET /api/workflows/runs responses fail with 409 Conflict
- fixed by skipping unsupported checkpoints for implicit/default resume selection while keeping explicit /resume-plan requests fail-closed for those same checkpoints
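
A minimal sketch of that selection rule, with hypothetical names throughout (`Checkpoint`, `has_reusable_state`, `CheckpointNotResumable`, `select_resume_target`). Falling back to the most recent resumable checkpoint for the implicit case is an assumption; the fix only states that unsupported checkpoints are skipped.

```python
from dataclasses import dataclass
from typing import Optional


class CheckpointNotResumable(Exception):
    """Stand-in for the error an API layer would map to 409 Conflict."""


@dataclass(frozen=True)
class Checkpoint:
    step_id: str
    has_reusable_state: bool


def select_resume_target(
    checkpoints: list[Checkpoint],
    failed_step_id: str,
    requested_step_id: Optional[str] = None,
) -> Optional[Checkpoint]:
    by_step = {c.step_id: c for c in checkpoints}
    if requested_step_id is not None:
        # Explicit request: fail closed when the checkpoint cannot be reused.
        cp = by_step.get(requested_step_id)
        if cp is None or not cp.has_reusable_state:
            raise CheckpointNotResumable(requested_step_id)
        return cp
    # Implicit default: prefer the failed step, but only if it is resumable.
    failed = by_step.get(failed_step_id)
    if failed is not None and failed.has_reusable_state:
        return failed
    # Otherwise skip unsupported checkpoints instead of raising.
    for cp in reversed(checkpoints):
        if cp.has_reusable_state:
            return cp
    return None
```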
PR review follow-up fixed before merge:
- workflow parent-run checkpoint lookup still treated session_id + tool_name + fingerprint as unique, which meant two runs with identical arguments could restore checkpoint state from the wrong sibling run
- workflow audit-call fingerprinting still hashed raw control inputs while execution hashed normalized controls, which could split tool_call and tool_result events for semantically identical runs
- fixed by carrying a unique call_event_id discriminator from the audit call into workflow result/failure payloads, using that discriminator in workflow run identity and checkpoint restore lookup, and fingerprinting audit-call payloads from the same normalized inputs that execution uses
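
The two review fixes can be illustrated together: fingerprint from normalized inputs on both the audit and execution sides, and add `call_event_id` to the lookup key so sibling runs with identical arguments stay distinct. Only `session_id`, `tool_name`, the fingerprint, and `call_event_id` come from the notes above; `normalize_controls` and its rules are hypothetical stand-ins for whatever normalization execution actually performs.

```python
import hashlib
import json


def normalize_controls(controls: dict) -> dict:
    # Hypothetical normalization: drop unset values and trim string whitespace.
    return {
        key: (value.strip() if isinstance(value, str) else value)
        for key, value in controls.items()
        if value is not None
    }


def fingerprint(tool_name: str, controls: dict) -> str:
    """Hash the normalized inputs so audit and execution agree."""
    payload = json.dumps(
        {"tool": tool_name, "controls": normalize_controls(controls)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def checkpoint_key(session_id: str, tool_name: str, controls: dict,
                   call_event_id: str) -> tuple[str, str, str, str]:
    # call_event_id discriminates runs whose arguments are otherwise identical.
    return (session_id, tool_name, fingerprint(tool_name, controls), call_event_id)
```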
validation:
- python3 -m py_compile backend/src/tools/audit.py backend/src/workflows/loader.py backend/src/workflows/manager.py backend/src/workflows/run_identity.py backend/src/api/workflows.py backend/tests/test_workflows.py
- cd backend && .venv/bin/python -m pytest tests/test_workflows.py -q
  - result: 64 passed
- cd frontend && NODE_OPTIONS=--experimental-require-module npm test -- --run src/components/settings/workflowDraft.test.ts src/components/cockpit/CockpitView.test.tsx
  - result: 52 passed
- cd frontend && npm run build
  - result: passed
subagent review:
- two focused review passes were started for bugs, regressions, and hallucinated assumptions after the runtime plus cockpit slice landed
- both reviewer runs stalled before returning usable findings, so no clean review is claimed; the follow-up branch-family supervision slice instead relied on the same direct diff verification plus targeted backend/frontend validation

Non-Goals

adding tools just to increase the count
unbounded process execution with weak policy control

Interface Checklist

native tools are auto-discoverable through the registry
MCP tools can be added and removed without code changes
tool execution is visible to the user

Acceptance Checklist

Seraph can browse, search, read/write local files, inspect goals, and use the shell
Seraph can use connected MCP servers in the current runtime
Seraph can execute richer cross-tool workflows than it could before the reusable workflow runtime
Seraph can expose workflow replay and safety context back to the operator instead of treating runs as opaque
