Testing Guide
Seraph has 182 automated tests (127 backend, 55 frontend) with CI running on every push and PR.
Running Tests
Backend
cd backend
uv sync --group dev # Install dev dependencies (first time)
uv run pytest -v # Run all tests with coverage
uv run pytest --no-cov # Run without coverage (faster)
uv run pytest tests/test_session.py -v # Run a single file
uv run pytest -k "test_create" # Run tests matching a pattern
Coverage is configured by default in pyproject.toml:
[tool.pytest.ini_options]
asyncio_mode = "auto"
addopts = "--cov=src --cov-report=term-missing"
Frontend
cd frontend
npm install # Install dependencies (first time)
npm test # Run all tests (single run)
npm run test:watch # Watch mode (re-runs on changes)
npm run test:coverage # Run with coverage report
Frontend tests use Vitest with jsdom, configured in vite.config.ts.
Test Structure
Backend (backend/tests/)
| File | Tests | Coverage |
|---|---|---|
test_session.py | 18 | SessionManager — async DB-backed CRUD, history, pagination, title generation |
test_goals_repository.py | 18 | GoalRepository — CRUD, tree building, dashboard stats, cascading deletes |
test_goals_api.py | 10 | Goals HTTP endpoints — create, list, filter, tree, dashboard, update, delete |
test_sessions_api.py | 8 | Session HTTP endpoints — list, messages, update title, delete |
test_profile.py | 7 | User profile + onboarding — get/create, mark/reset complete, HTTP endpoints |
test_soul.py | 7 | Soul file persistence — read/write, section update, ensure exists |
test_shell_tool.py | 7 | Shell execution — success, errors, size limits, timeout, connection errors |
test_consolidator.py | 5 | Memory consolidation — extract facts, soul updates, markdown fences, LLM failure |
test_plugin_loader.py | 5 | Tool auto-discovery — scan, expected tools, no duplicates, caching, reload |
test_mcp_manager.py | 5 | MCP server integration — connect, disconnect, failure handling |
test_chat_api.py | 5 | REST chat endpoint — success, session continuity, errors |
test_agent.py | 4 | Agent factory — tool count, model creation, context injection |
test_tool_registry.py | 4 | Tool metadata registry — lookup, required fields, copy safety |
test_tools.py | 9 | Filesystem tools, template tool, web search |
test_websocket.py | 3 | WebSocket — ping/pong, invalid JSON, skip onboarding |
Frontend (frontend/src/)
| File | Tests | Coverage |
|---|---|---|
stores/chatStore.test.ts | 16 | Zustand chat store — sync actions (messages, panels, visual state) + async actions (profile, sessions, onboarding) |
lib/toolParser.test.ts | 15 | Tool detection — all 5 regex patterns, fallback substring match, Phase 1/2/Things3 tools |
lib/animationStateMachine.test.ts | 12 | Animation targets — tool→position mapping, facing direction, idle/thinking states |
stores/questStore.test.ts | 8 | Zustand quest store — goal CRUD, tree, dashboard, filters, refresh |
config/constants.test.ts | 4 | Constant integrity — tool count, position ranges, scene keys, waypoint count |
Writing New Tests
Backend: Using the async_db Fixture
All database-dependent tests use the shared async_db fixture from conftest.py. It creates an in-memory SQLite database and patches get_session across all modules.
from src.agent.session import SessionManager
async def test_example(async_db):
sm = SessionManager()
session = await sm.get_or_create("test-id")
assert session.title == "New Conversation"
For HTTP endpoint tests, use the client fixture (which depends on async_db):
async def test_list_goals(client):
res = await client.get("/api/goals")
assert res.status_code == 200
Frontend: Mocking Fetch
Store tests mock globalThis.fetch and reset store state between tests:
import { vi, beforeEach } from "vitest";
import { useChatStore } from "./chatStore";
const mockFetch = vi.fn();
globalThis.fetch = mockFetch;
beforeEach(() => {
useChatStore.setState({ messages: [], sessionId: null });
vi.clearAllMocks();
});
What Is NOT Tested
These areas are intentionally excluded from the test suite:
- Phaser game objects (StudyScene, AgentSprite, UserSprite, SpeechBubble) — require WebGL context, fragile mocking
- EventBus.ts — single-line Phaser EventEmitter wrapper
- Browser/Calendar/Email tools — thin wrappers around OAuth-dependent libraries
- LanceDB vector_store.py — requires real embeddings model loaded
- Full WS message streaming — complex sync/async interaction with agent streaming; basic WS tests cover ping, error handling, and skip_onboarding
CI/CD
Tests run automatically on every push and PR to main and develop via GitHub Actions (.github/workflows/test.yml).
Two parallel jobs:
- backend-tests: Ubuntu, Python 3.12,
uv sync --group dev,uv run pytest -v - frontend-tests: Ubuntu, Node 20,
npm ci,npm test
Redundant runs are cancelled automatically via concurrency groups.