Seraph Development Status
Seraph is an AI guardian that remembers, watches, and acts. This page is the fastest answer to what is real on develop right now.
Legend
- [x] shipped on develop
- [ ] not fully shipped on develop
- in-flight branch work should be tracked in open PRs, not in this file
When this file is updated on an open feature branch, it reflects the intended post-merge develop state for that branch. Until merge, the open PR and its validation are the live integration truth.
Current Snapshot
- Seraph is usable today as a real guardian workspace with a browser cockpit, memory, screen awareness, proactive behavior, and a real action layer.
- The live planning surface is now docs/research/ plus docs/implementation/.
- Trust Boundaries, Execution Plane, and Runtime Reliability have strong foundations on develop.
- The target product shape is now a power-user guardian workspace, not a village-first shell.
- The guardian workspace is the only supported browser shell; the village/editor line is removed from the active repo path and should not be revived.
- The workspace now exposes capability discovery, starter packs, workflow history, step records, retry-from-step recovery, parameterized replay, reload continuity, a searchable capability palette, capability preflight/autorepair, a separate Activity Ledger window, a denser operator terminal, live operator feed, saved runbook macros, and explicit continue/open-thread controls instead of leaving those as implicit operator knowledge.
- The workspace window system now uses flatter terminal-style chrome with close controls, a Windows visibility menu, and per-pane hide/show state instead of only static rounded dashboard cards.
- No workstream is complete yet.
- Seraph is not yet the finished guardian product described in the research docs.
Docs Contract
- docs/research/00-synthesis.md defines what Seraph is trying to become.
- docs/research/10-competitive-benchmark.md owns the comparative judgment.
- docs/research/11-superiority-program.md owns the design-level superiority program.
- this file owns the fastest shipped snapshot on develop.
- docs/implementation/00-master-roadmap.md owns the live 10-PR queue.
- docs/implementation/08-docs-contract.md, docs/implementation/09-benchmark-status.md, and docs/implementation/10-superiority-delivery.md are the implementation-side mirrors of the research evidence/benchmark/program docs.
- docs/implementation/01 through 07 remain the workstream docs; 08 through 10 are meta mirrors, not extra workstreams.
Current Focus On develop
- The latest delivery batch is now complete for the current roadmap horizon: capability bootstrap v3, extension studio v1, workflow branching/resume v1, cockpit density v4, provider explainability/budgets v3, execution hardening v9, native-channel expansion v5, world-model fusion v9, guardian-learning policy v9, and guardian behavioral evals v9 all landed together.
- The roadmap has now refreshed to a new next-10 batch rather than leaving the just-shipped batch as future work.
- Guardian Intelligence remains central inside the current batch, but it is no longer the only active workstream.
- Runtime Reliability now has a strong baseline on develop, but it is not fully complete.
- The repo-wide 10-PR horizon is tracked in docs/implementation/00-master-roadmap.md.
- The next strategic focus is now the extension-platform transition beginning with extension-model-terminology-v1, extension-manifest-schema-v1, and extension-registry-and-loader-v1, because Seraph now needs one coherent extension architecture for skills, workflows, starter packs, runbooks, and MCP connectors before deeper marketplace or managed-connector work can land cleanly.
- capability-pack-autoinstall-and-bootstrap-v3, extension-authoring-and-validation-studio-v1, workflow-step-branching-and-resume-v1, cockpit-density-and-live-operator-views-v4, provider-policy-explainability-and-budgets-v3, execution-safety-hardening-v9, native-channel-expansion-v5, world-model-memory-fusion-v9, guardian-learning-policy-v9, and guardian-behavioral-evals-v9 are now represented in the shipped state this branch is preparing to merge.
- The published 10-PR horizon should be refreshed whenever landed PR count from that queue is divisible by 5.
Current Target Shape
- dense guardian workspace as the primary operator surface
- first clear capability discovery, activation, preflight, and repair for tools, skills, workflows, MCP surfaces, starter packs, installable catalog items, and runbooks from inside that cockpit
- bounded capability bootstrap that can apply safe install or repair actions for workflows, runbooks, and starter packs directly from the operator surface
- cockpit-native extension authoring and validation for workflows, skills, and MCP configs with diagnostics, save, and repair handoff
- first browser reload and reconnect continuity for the active thread, with explicit fresh-thread semantics and background-activity badges
- explicit cross-surface thread model that links approvals, workflow runs, notifications, queued interventions, and recent interventions back to browser threads
- activity-ledger routing summaries, native thread metadata, and LLM spend attribution that make both live continuation state and day-scale budget use easier to inspect
- typed longitudinal memory and explicit guardian state
- policy-driven interventions with clear defer / bundle / act / request-approval decisions
- non-browser presence through a first coherent desktop surface, notifications, native reach, and action-card continuation payloads
- reusable workflow composition plus explicit feedback capture and future improvement loops
- workflow diagnostics with stored load errors, step timestamps and durations, error summaries, and recovery hints
- first branch/resume workflow control with checkpoint candidates, lineage metadata, resume drafts based on existing inputs, and safer approval-gated resume plans
Shipped On develop
Core guardian platform
- browser-based guardian workspace as the only supported browser shell
- FastAPI backend with chat, WebSocket, goals, tools, observer, settings, audit, approvals, vault, skills, and MCP APIs
- native macOS observer daemon for screen/window ingest
- persistent guardian record, vector memory, sessions, and goal storage
Trust and control
- tool policy modes for safe, balanced, and full
- MCP policy modes for disabled, approval, and full
- approval-gated high-risk actions in chat and WebSocket flows
- explicit execution-boundary metadata and approval behavior surfaced for tools and reusable workflows
- structured audit logging for approval, tool, and runtime events
- secret redaction and scoped secret-reference handling
- secret-reference resolution now stays limited to explicit injection-safe surfaces instead of resolving into arbitrary tool calls
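
As a rough illustration of how the policy modes listed above could gate a call, here is a minimal sketch. The mode names come from this file; everything else (the `Decision` enum, the function names, and the exact meaning assigned to each mode) is an assumption made for the example, not Seraph's actual implementation.

```python
from enum import Enum


class Decision(str, Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


TOOL_MODES = {"safe", "balanced", "full"}
MCP_MODES = {
    "disabled": Decision.BLOCK,
    "approval": Decision.REQUIRE_APPROVAL,
    "full": Decision.ALLOW,
}


def gate_tool_call(policy_mode: str, high_risk: bool) -> Decision:
    """Gate a built-in tool call (hypothetical semantics for each mode)."""
    if policy_mode not in TOOL_MODES:
        raise ValueError(f"unknown tool policy mode: {policy_mode}")
    if not high_risk or policy_mode == "full":
        return Decision.ALLOW
    # Assumption: "safe" refuses high-risk tools, "balanced" asks for approval.
    if policy_mode == "safe":
        return Decision.BLOCK
    return Decision.REQUIRE_APPROVAL


def gate_mcp_call(policy_mode: str) -> Decision:
    """Gate an MCP call; the mode alone decides in this sketch."""
    try:
        return MCP_MODES[policy_mode]
    except KeyError:
        raise ValueError(f"unknown MCP policy mode: {policy_mode}") from None
```

Under these assumptions, `gate_tool_call("balanced", high_risk=True)` asks for approval, while the same call under `safe` is blocked outright.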
Execution and integrations
- 17 built-in tool capabilities in the registry
- first capability-overview API that aggregates tools, skills, workflows, MCP servers, blocked-state reasons, and starter packs for one cockpit-readable surface
- capability-overview now also exposes installable catalog items, repair/install actions, recommendations, reusable runbooks, policy-aware starter-pack repair guidance, and machine-readable preflight/autorepair metadata for cockpit/operator use
- shell execution via sandboxed tool path
- browser automation foundation
- filesystem, guardian-record, goals, vault, and web-search tool foundations
- MCP server management and runtime-managed server configuration
- visible tool execution streaming in chat and agent flows
- first-class reusable workflows loaded from defaults and workspace files, exposed through a workflows API and workflow_runner specialist
- starter packs that bundle default skills and workflows into directly activatable operator-facing packages
- forced approval wrapping for high-risk and approval-mode MCP workflow paths
- first operator workflow-control layer with workflow list/toggle/reload plus draft-to-cockpit support
- workflow loader/runtime metadata now derive from actual step tools and reject underdeclared workflow definitions
- workflow audit now surfaces structured workflow-run details for cockpit/operator views, including artifact-path lineage and degraded-step visibility
- workflow history endpoint now exposes run arguments, risk level, execution boundaries, approval counts, secret-ref acceptance, and artifact lineage for replay and operator inspection
- workflow history endpoint now also exposes timeline events, replay guardrails, parameterized replay drafts, approval-recovery messaging, pending-approval details, and explicit thread metadata for replay/open-thread control
- operator timeline API now unifies workflow runs, approvals, notifications, queued insights, recent interventions, and surfaced failures into one threaded live operator feed
- activity ledger API now projects workflow runs, approvals, guardian events, audit activity, tool steps, and attributed LLM call spend into one separate accountability feed for the browser workspace
- catalog/install surfaces for skills and MCP servers
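
The "reject underdeclared workflow definitions" item above can be sketched as a simple set comparison: derive the tools actually used by the steps and refuse to load a definition that declares fewer. The `WorkflowDefinition` shape, the error type, and the tool names below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowDefinition:
    name: str
    declared_tools: set[str]
    # Hypothetical step shape: each step is a dict with an optional "tool" key.
    steps: list[dict] = field(default_factory=list)


class UnderdeclaredWorkflowError(ValueError):
    """Raised when steps use tools the definition did not declare."""


def derive_step_tools(workflow: WorkflowDefinition) -> set[str]:
    """Collect the tools that the steps actually reference."""
    return {step["tool"] for step in workflow.steps if "tool" in step}


def validate_workflow(workflow: WorkflowDefinition) -> set[str]:
    """Return the derived tool set, or reject an underdeclared definition."""
    used = derive_step_tools(workflow)
    missing = used - workflow.declared_tools
    if missing:
        raise UnderdeclaredWorkflowError(
            f"{workflow.name}: steps use undeclared tools: {sorted(missing)}"
        )
    return used
```

Deriving metadata from the steps rather than trusting the declaration means risk level and approval behavior can be computed from what a workflow really does.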
Runtime and observability
- shared provider-agnostic LLM runtime settings
- ordered fallback chains across completion and agent-model paths
- health-aware rerouting away from recently failed targets
- runtime-path-specific profile preference chains across completion and agent-model paths
- wildcard runtime-path routing rules, with exact-path overrides taking precedence
- runtime-path-specific primary model overrides
- runtime-path-specific fallback-chain overrides
- first-class local runtime routing for helper, all current scheduled completion jobs, core agent, delegation, and connected MCP-specialist paths
- strict runtime-path provider safeguards for required capability intents plus cost, latency, task-class, and budget guardrails, with explicit degrade-open audit semantics when no compliant target exists
- runtime audit visibility across chat, WebSocket, session-bound helper LLM traces, scheduler including daily-briefing, activity-digest, and evening-review degraded-input fallback paths, strategist, proactive delivery transport, MCP lifecycle and manual test API flows, skills toggle/reload flows, observer plus screen observation summary/cleanup boundaries, embedding, vector store, guardian-record file, vault repository, filesystem, browser, sandbox, and web search flows
- deterministic runtime eval harness for fallback, routing, core chat behavior, observer refresh and delivery behavior, session consolidation behavior, tool/MCP policy guardrails, proactive flow behavior, delegated workflow behavior, workflow composition behavior, storage, observer, and integration seam contracts, including vault repository, the MCP test API, skills API, screen repository boundaries, and daily-briefing, activity-digest, plus evening-review degraded-input audit behavior
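
To make the routing items above concrete, the sketch below shows one way exact-path rules could take precedence over wildcard rules, followed by an ordered fallback chain that skips recently failed targets. The rule format, the target names, the cooldown value, and the "fall back to the primary when everything is cooling down" behavior are all assumptions for illustration.

```python
import time
from fnmatch import fnmatchcase


def resolve_chain(runtime_path: str, rules: dict[str, list[str]]) -> list[str]:
    """Return the fallback chain for a runtime path; exact rules beat wildcards."""
    if runtime_path in rules:
        return rules[runtime_path]
    # Assumption: among wildcard rules, the first match in insertion order wins.
    for pattern, chain in rules.items():
        if fnmatchcase(runtime_path, pattern):
            return chain
    raise LookupError(f"no routing rule matches {runtime_path!r}")


def pick_target(chain, last_failure, cooldown_s=60.0, now=None):
    """Pick the first target in the chain that is not inside its failure cooldown."""
    now = time.monotonic() if now is None else now
    for target in chain:
        failed_at = last_failure.get(target)
        if failed_at is None or now - failed_at >= cooldown_s:
            return target
    # Assumption: if every target is cooling down, retry the primary anyway.
    return chain[0]


rules = {
    "scheduler.*": ["local-small", "cloud-large"],
    "scheduler.daily_briefing": ["cloud-large", "local-small"],
}
chain = resolve_chain("scheduler.daily_briefing", rules)
target = pick_target(chain, {"cloud-large": 100.0}, cooldown_s=60.0, now=120.0)
```

Here the exact rule for `scheduler.daily_briefing` wins over `scheduler.*`, and because `cloud-large` failed 20 seconds ago the call is rerouted to `local-small`.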
Guardian intelligence and proactive behavior
- guardian-record-backed persistent identity
- vector memory retrieval and consolidation
- hierarchical goals and progress APIs
- explicit guardian-state synthesis for chat, WebSocket, and strategist paths
- guardian world model now includes active projects, active constraints, recurring patterns, active routines, collaborators, recurring obligations, project timelines, memory signals, continuity threads, and recent execution pressure from degraded workflow/tool outcomes, not only focus, commitments, open loops or pressure, alignment, and receptivity
- observer salience, confidence, and interruption-cost scoring for observer refresh, guardian state, and proactive policy
- explicit intervention-policy decisions for proactive delivery, including act / bundle / defer / request-approval / stay-silent classifications
- persisted guardian intervention outcome tracking plus explicit feedback capture, including notification acknowledgement and feedback API flows
- first multi-signal guardian learning loop that can reduce interruption eagerness after negative outcomes, prefer direct delivery, native reroute, and async-native escalation after repeated positive/acknowledged outcomes, and now also emit phrasing, cadence, timing, suppression, blocked-state, and thread guidance back into guardian state and intervention policy
- second-layer salience calibration that promotes aligned active-work signals and allows grounded high-salience nudges to cut through generic high-interruption bundling outside focus mode
- deterministic guardian behavioral proof that grounded high-salience observer state can still deliver through high interruption cost while degraded observer confidence defers before transport
- deterministic guardian behavioral proof that strategist tick can use learned direct/native-delivery bias and still surface the resulting intervention through continuity state
- strategist agent and strategist scheduler tick
- daily briefing, evening review, activity digest, and weekly review surfaces
- observer refresh across time, calendar, git, goals, and screen context
- proactive delivery gating and queued-bundle behavior
- first coherent desktop presence surface with daemon status, capture-mode visibility, pending native-notification state, a safe test-notification path, native-notification fallback delivery when browser sockets are unavailable but the daemon is connected, browser-side inspect/dismiss controls for queued desktop notifications, a unified continuity snapshot for daemon state, queued bundle items, and recent interventions, and an actionable cockpit desktop-shell card for follow-up, dismiss, and continue flows
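
The intervention-policy classifications above (act / bundle / defer / request-approval / stay-silent) can be sketched as a small decision function over salience, confidence, and interruption cost. The `Signal` fields, the thresholds, and the ordering of checks are assumptions chosen to mirror the behaviors described in this section, not the shipped policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    salience: float            # 0..1, how much the observation matters
    confidence: float          # 0..1, how trustworthy the observer state is
    interruption_cost: float   # 0..1, how costly an interruption is right now
    grounded: bool = True      # tied to aligned active work
    focus_mode: bool = False
    needs_approval: bool = False


def decide(signal: Signal, *, low: float = 0.3, high: float = 0.7) -> str:
    """Classify a proactive intervention (hypothetical thresholds)."""
    if signal.salience < low:
        return "stay-silent"
    if signal.confidence < low:
        # Degraded observer confidence defers before any transport happens.
        return "defer"
    if signal.needs_approval:
        return "request-approval"
    if signal.interruption_cost >= high:
        cuts_through = (
            signal.grounded and signal.salience >= high and not signal.focus_mode
        )
        # Grounded high-salience nudges may cut through generic bundling.
        return "act" if cuts_through else "bundle"
    return "act"
```

With these thresholds, a grounded signal with salience 0.9 still delivers at interruption cost 0.9 outside focus mode, while the same signal with confidence 0.1 is deferred.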
Current interface surface
- workspace-first browser guardian shell with session rail, guardian-state panel, workflow-run views, interventions feed, audit surface, trace view, pending approvals, recent outputs, operations inspector, artifact round-trip into the command bar, a fixed composer, and live send fallback
- cockpit workflow and artifact inspectors can now draft compatible follow-on workflows directly from existing artifact paths instead of only inserting generic file-context commands
- grid-snapped draggable panes plus packed persisted default/focus/review layouts with keyboard switching, per-layout save, and per-layout reset now define the main cockpit workspace
- the pane workspace now also supports per-pane close/hide controls, a dedicated Windows menu for visibility and focus, and flatter Godel-style window chrome
- the cockpit now includes a first desktop-shell rail for pending native notifications, queued bundle items, and recent interventions with direct follow-up, continue, and dismiss controls
- the cockpit now includes a first operator surface for tool/MCP policy state, workflow availability, tools, skills, starter packs, and MCP server visibility with direct reload and activation controls
- the cockpit now also includes a searchable capability palette plus a denser operator terminal for recommendations, repair actions, installable items, reusable runbooks, capability preflight, live operator-feed status, and saved runbook macros
- the workspace now includes a separate Activity Ledger window that links workflow runs, approvals, queued continuity, recent interventions, surfaced failures, tool steps, and attributed LLM calls back to one browser thread model
- activity ledger rows now group request-scoped work into compact parent rows with emoji/icon scanability, child tool or routing rows, and completion summaries so operators can skim what Seraph did without opening raw trace panes
- the cockpit now restores the last active session on reload, preserves explicit fresh-thread semantics, and marks background thread activity instead of silently resetting to an empty conversation
- larger, more readable settings and priorities overlays now support the guardian workspace directly
- capability state, workflow history, the activity ledger, and live status are now visible in the current cockpit surface
- settings and management surfaces for tools, MCP, and system state
- macOS daemon-backed desktop presence card plus browser-side inspect/dismiss controls for native notifications and notification fallback for non-browser proactive reach
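
The activity-ledger grouping described above (request-scoped work collapsed into parent rows with child rows and a completion summary) could look roughly like the following. The row keys `request_id` and `status` and the summary wording are hypothetical; the real ledger schema is not specified in this file.

```python
from collections import defaultdict


def group_ledger_rows(rows: list[dict]) -> list[dict]:
    """Group flat ledger rows into one parent row per request."""
    by_request: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_request[row["request_id"]].append(row)

    parents = []
    # Dicts preserve insertion order, so parents follow first appearance.
    for request_id, children in by_request.items():
        failed = sum(1 for child in children if child["status"] == "failed")
        parents.append(
            {
                "request_id": request_id,
                "summary": f"{len(children)} steps, {failed} failed",
                "children": children,
            }
        )
    return parents


rows = [
    {"request_id": "r1", "kind": "tool", "status": "ok"},
    {"request_id": "r2", "kind": "routing", "status": "ok"},
    {"request_id": "r1", "kind": "tool", "status": "failed"},
]
grouped = group_ledger_rows(rows)
```

An operator can then skim the parent summaries and expand only the requests that report failures.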
Ecosystem foundations
- SKILL.md support and runtime skill loading
- MCP-powered extension surface
- recursive delegation foundations behind a flag
- reusable workflow runtime with tool, skill, specialist, and MCP-aware gating
Still To Do On develop
Runtime and execution
- richer provider selection policy beyond the shipped weighted scoring, required capability safeguards, tier guardrails, path patterns, explicit overrides, ordered fallbacks, and cooldown rerouting
- broader eval coverage beyond the shipped REST, WebSocket, observer refresh, delivery policy, salience/confidence delivery, strategist-learning continuity, consolidation, proactive, tool/MCP guardrail, delegated workflow, and workflow-composition behavioral contracts
- stronger execution isolation and privileged-path hardening beyond the first workflow/tool boundary pass
- richer capability installation, recommendation, and recovery beyond the new starter-pack repair guidance, catalog-install, runbook preflight, bounded bootstrap flow, and first cockpit-native extension studio
Guardian intelligence
- stronger learning and feedback loops beyond the first multi-signal delivery/channel/timing/suppression/thread layer
- deeper guardian world modeling, learning loops, and stronger intervention quality beyond the new project/routine/collaborator/obligation-aware world-model layer
- stronger salience calibration and confidence quality beyond the first aligned-work/high-salience pass
Interface and presence
- richer cockpit density and broader keyboard/operator control beyond the first dedicated workflow-run shell
- richer cross-surface continuity and broader non-browser presence beyond the new continuity snapshot, action-card continuation model, and first actionable desktop-shell/browser-native control layer
- stronger explicit threading between ambient observation, workflow runs, native notifications, approvals, and deliberate interaction beyond the new shared thread metadata and continue/open-thread layer
Workflow and leverage
- deeper operator-facing workflow control and workflow history beyond the new workflow-runs API, replay guardrails, timeline events, and cockpit workflow timeline
- stronger extension ergonomics around reusable capabilities and workflows beyond the new cockpit operator surface, starter packs, repair flows, and runbooks
Practical Summary
- Seraph already has a serious guardian core: memory, observer loop, strategy, tools, approvals, runtime audit, and deterministic evals.
- The strongest current moat is guardian-oriented state plus proactive scaffolding, not the UI.
- The biggest gaps against the reference systems are versioned capability distribution, deeper extension-studio ergonomics, visual workflow branch debugging, deeper execution hardening, stronger intervention learning beyond the new world-model plus timing/suppression/thread layer, and broader native reach.
- The next major step is to deepen the new cockpit shell into a denser, more legible, more stateful guardian workspace without losing the existing trust and memory foundations.
Workstream View
- Workstream 01: Trust Boundaries is only partially complete
- Workstream 02: Execution Plane is only partially complete
- Workstream 03: Runtime Reliability is only partially complete
- Workstream 04: Presence And Reach is only partially complete
- Workstream 05: Guardian Intelligence is only partially complete
- Workstream 06: Embodied Interface is only partially complete
- Workstream 07: Ecosystem And Delegation is only partially complete