Testing Guide

Seraph has 182 automated tests (127 backend, 55 frontend) with CI running on every push and PR.

Running Tests

Backend

cd backend
uv sync --group dev            # Install dev dependencies (first time)
uv run pytest -v               # Run all tests with coverage
uv run pytest --no-cov         # Run without coverage (faster)
uv run pytest tests/test_session.py -v  # Run a single file
uv run pytest -k "test_create" # Run tests matching a pattern

Coverage is enabled by default through `addopts` in `pyproject.toml`, which is why `--no-cov` is needed to skip it:

[tool.pytest.ini_options]
asyncio_mode = "auto"
addopts = "--cov=src --cov-report=term-missing"
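
With `asyncio_mode = "auto"`, pytest-asyncio collects plain `async def` tests without a per-test marker, which is why the async examples later in this guide carry no decorator. The sketch below illustrates only the underlying idea (detect a coroutine function and drive it on an event loop). It is not pytest-asyncio's implementation, and `run_test` is a hypothetical helper.

```python
import asyncio
import inspect


def run_test(test_func):
    """Run a test function, driving it on an event loop if it is async.

    Hypothetical helper: a conceptual stand-in for what an "auto" asyncio
    mode does, not the plugin's real code.
    """
    if inspect.iscoroutinefunction(test_func):
        return asyncio.run(test_func())
    return test_func()


async def test_async_example():
    # An async test body can await freely.
    await asyncio.sleep(0)
    return "async ok"


def test_sync_example():
    return "sync ok"


results = [run_test(test_async_example), run_test(test_sync_example)]
```

Both kinds of test go through the same entry point, so test authors do not need to think about event-loop setup.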

Frontend

cd frontend
npm install                    # Install dependencies (first time)
npm test                       # Run all tests (single run)
npm run test:watch             # Watch mode (re-runs on changes)
npm run test:coverage          # Run with coverage report

Frontend tests use Vitest with jsdom, configured in vite.config.ts.

Test Structure

Backend (`backend/tests/`)

| File | Tests | Covers |
| --- | --- | --- |
| `test_session.py` | 18 | SessionManager: async DB-backed CRUD, history, pagination, title generation |
| `test_goals_repository.py` | 18 | GoalRepository: CRUD, tree building, dashboard stats, cascading deletes |
| `test_goals_api.py` | 10 | Goals HTTP endpoints: create, list, filter, tree, dashboard, update, delete |
| `test_sessions_api.py` | 8 | Session HTTP endpoints: list, messages, update title, delete |
| `test_profile.py` | 7 | User profile + onboarding: get/create, mark/reset complete, HTTP endpoints |
| `test_soul.py` | 7 | Soul file persistence: read/write, section update, ensure exists |
| `test_shell_tool.py` | 7 | Shell execution: success, errors, size limits, timeout, connection errors |
| `test_consolidator.py` | 5 | Memory consolidation: extract facts, soul updates, markdown fences, LLM failure |
| `test_plugin_loader.py` | 5 | Tool auto-discovery: scan, expected tools, no duplicates, caching, reload |
| `test_mcp_manager.py` | 5 | MCP server integration: connect, disconnect, failure handling |
| `test_chat_api.py` | 5 | REST chat endpoint: success, session continuity, errors |
| `test_agent.py` | 4 | Agent factory: tool count, model creation, context injection |
| `test_tool_registry.py` | 4 | Tool metadata registry: lookup, required fields, copy safety |
| `test_tools.py` | 9 | Filesystem tools, template tool, web search |
| `test_websocket.py` | 3 | WebSocket: ping/pong, invalid JSON, skip onboarding |

Frontend (`frontend/src/`)

| File | Tests | Covers |
| --- | --- | --- |
| `stores/chatStore.test.ts` | 16 | Zustand chat store: sync actions (messages, panels, visual state) + async actions (profile, sessions, onboarding) |
| `lib/toolParser.test.ts` | 15 | Tool detection: all 5 regex patterns, fallback substring match, Phase 1/2/Things3 tools |
| `lib/animationStateMachine.test.ts` | 12 | Animation targets: tool→position mapping, facing direction, idle/thinking states |
| `stores/questStore.test.ts` | 8 | Zustand quest store: goal CRUD, tree, dashboard, filters, refresh |
| `config/constants.test.ts` | 4 | Constant integrity: tool count, position ranges, scene keys, waypoint count |

Writing New Tests

Backend: Using the `async_db` Fixture

All database-dependent tests use the shared `async_db` fixture from `conftest.py`. It creates an in-memory SQLite database and patches `get_session` across all modules, so code under test reads and writes the in-memory database instead of the real one.

from src.agent.session import SessionManager

async def test_example(async_db):
    sm = SessionManager()
    session = await sm.get_or_create("test-id")
    assert session.title == "New Conversation"

For HTTP endpoint tests, use the client fixture (which depends on async_db):

async def test_list_goals(client):
    res = await client.get("/api/goals")
    assert res.status_code == 200

Frontend: Mocking Fetch

Store tests mock globalThis.fetch and reset store state between tests:

import { vi, beforeEach } from "vitest";
import { useChatStore } from "./chatStore";

const mockFetch = vi.fn();
globalThis.fetch = mockFetch;

beforeEach(() => {
  useChatStore.setState({ messages: [], sessionId: null });
  vi.clearAllMocks();
});

What Is NOT Tested

These areas are intentionally excluded from the test suite:

Phaser game objects (StudyScene, AgentSprite, UserSprite, SpeechBubble) — require WebGL context, fragile mocking
EventBus.ts — single-line Phaser EventEmitter wrapper
Browser/Calendar/Email tools — thin wrappers around OAuth-dependent libraries
LanceDB vector_store.py — requires real embeddings model loaded
Full WS message streaming — complex sync/async interaction with agent streaming; basic WS tests cover ping, error handling, and skip_onboarding

CI/CD

Tests run automatically on every push and PR to main and develop via GitHub Actions (.github/workflows/test.yml).

Two parallel jobs:

backend-tests: Ubuntu, Python 3.12, uv sync --group dev, uv run pytest -v
frontend-tests: Ubuntu, Node 20, npm ci, npm test

Redundant runs are cancelled automatically via concurrency groups.
