S1-B3: Runtime Reliability

Intent

Make Seraph more resilient, observable, and predictable under real usage.

Capabilities in scope

  • model/provider routing and fallback
  • at least one local-model-capable path
  • richer observability for agent behavior and tool execution
  • evaluation harness for core agent workflows
  • clearer degraded-mode behavior when dependencies fail
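The routing-and-fallback capability above can be sketched as an ordered provider chain that tries each provider in turn and only raises when all of them fail. This is a minimal illustration, not the project's actual wiring; the function name and provider shape are assumptions.

```python
from typing import Any, Callable, Sequence

# Hypothetical sketch of ordered provider routing with fallback.
# "providers" is a list of (name, call) pairs; each call takes a prompt
# and returns a completion string, raising on provider failure.
def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    errors: list[tuple[str, Exception]] = []
    for name, call in providers:
        try:
            # Return which provider answered so callers can audit fallback use.
            return name, call(prompt)
        except Exception as exc:  # provider-level failure: try the next one
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")
```

A local-model-capable path then becomes just another entry in the chain, which keeps routing policy in one place.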

Current progress

This batch is active. Use the checklist below as the live status view.

Done

  • degraded-mode fallback in the token-aware context window when tiktoken cannot load offline
  • centralized provider-agnostic LLM runtime settings with an optional fallback completion path for direct LiteLLM calls
  • timeout-safe audit events for primary-vs-fallback direct LLM completion behavior so degraded mode is visible after the fact
  • fallback-capable smolagents model wrappers for the main agent, onboarding agent, strategist, and specialists, so a single provider failure is less likely to collapse the interactive runtime
  • repeatable runtime eval harness for core guardian/tool reliability contracts so fallback wiring and degraded behavior can be checked without live providers
  • lifecycle audit events for REST/WebSocket chat runs and scheduled proactive jobs so runtime failures and skips are more visible after the fact
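The first Done item, the degraded-mode context window, comes down to not letting an offline tiktoken load take the token counter with it. A minimal sketch of that idea, with illustrative names and a common rough chars-per-token heuristic as the fallback:

```python
# Sketch of a degraded-mode token estimate: use tiktoken when it can load,
# otherwise fall back to a rough heuristic so the token-aware context
# window still functions offline. Function and default names are assumptions.
def estimate_tokens(text: str, model: str = "gpt-4o") -> int:
    try:
        import tiktoken  # may fail offline if encoding data cannot be fetched
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except Exception:
        # Degraded mode: ~4 characters per token is a common rough estimate.
        return max(1, len(text) // 4)
```

The important property is that both paths return a usable integer, so callers never need to know which mode they got.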

Working Now

  • broaden observability beyond the initial direct-LLM events and the first round of chat/proactive-job lifecycle coverage

Still Open

  • broader model/provider routing beyond the first shared fallback path
  • deeper local-model-capable execution paths beyond a configurable API base/model swap
  • broader observability coverage across more tool/runtime paths
  • richer evaluation coverage beyond the first core guardian and tool scenarios

Non-goals

  • exhaustive benchmark program across every model
  • production-grade hosted observability platform
  • fully automated eval-driven deployment gating

Required architectural changes

  • centralize model selection and fallback strategy
  • standardize runtime event logging across critical paths
  • define test/eval scenarios for guardian, tool, and proactive flows
  • add explicit error-handling behavior for provider/tool outages
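Standardizing runtime event logging usually starts with agreeing on one event shape that every critical path emits. The dataclass below is a hypothetical schema for illustration; the field names are assumptions, not the project's actual audit format.

```python
import json
import time
from dataclasses import asdict, dataclass, field

# Illustrative shape for a runtime audit event; field names are assumptions,
# not the project's actual schema.
@dataclass
class AuditEvent:
    kind: str                 # e.g. "llm.completion", "job.skipped"
    outcome: str              # e.g. "primary", "fallback", "error"
    detail: dict = field(default_factory=dict)
    ts: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

def emit(event: AuditEvent, sink=print) -> None:
    # Injecting the sink keeps the emitter testable and lets scoping rules
    # (which events to keep) live in one place, limiting log noise.
    sink(event.to_json())
```

With one shared shape, "primary vs fallback" visibility is a filter over `outcome` rather than a per-path logging convention.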

Likely files/systems touched

  • model configuration and agent factory paths
  • scheduler and proactive jobs
  • logging and evaluation utilities
  • tool failure and timeout handling

Acceptance criteria

  • provider failure does not collapse the entire chat path
  • a local or non-OpenRouter model path demonstrably works
  • key flows are observable and easier to debug
  • the project has a repeatable evaluation harness for core behavior
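A repeatable evaluation harness in this spirit runs each scenario against stubbed models, so reliability contracts can be checked with no live providers. The scenario shape and names below are hypothetical, meant only to show the structure.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical scenario shape for a provider-free eval harness: each
# scenario runs a callable (typically against a stubbed model) and then
# checks a contract over the output.
@dataclass
class Scenario:
    name: str
    run: Callable[[], Any]
    check: Callable[[Any], bool]

def run_suite(scenarios: list[Scenario]) -> dict[str, bool]:
    results: dict[str, bool] = {}
    for s in scenarios:
        try:
            results[s.name] = bool(s.check(s.run()))
        except Exception:
            # A crashing scenario is a failed contract, not a crashed suite.
            results[s.name] = False
    return results
```

Because every scenario is just data plus two callables, the same suite can later be pointed at live providers without changing the harness.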

Dependencies on earlier batches

Open risks

  • fallback logic can become inconsistent if added ad hoc
  • observability can create noise if events are not scoped well
  • local model support may underperform unless task routing is explicit