# Testing
Agent Express provides a complete testing toolkit via the agent-express/test entry point. Every test utility is designed to work without real API calls — zero cost, zero latency, fully deterministic.
```ts
import {
  TestModel,
  FunctionModel,
  testAgent,
  testSession,
  capture,
  RecordModel,
  ReplayModel,
  serializeForSnapshot,
  toMatchAgentSnapshot,
} from "agent-express/test"
```

## TestModel

A deterministic mock model that implements `LanguageModelV3`. Use it as a drop-in replacement for real LLM providers in tests. Zero cost, zero latency, no network calls.
```ts
class TestModel implements LanguageModelV3 {
  constructor(opts?: TestModelOptions)
  reset(): void
}
```

| Option | Type | Default | Description |
|---|---|---|---|
| `responses` | `ModelResponse[]` | `undefined` | Ordered list of responses. Each model call gets the next response. |
| `defaultText` | `string` | `"test response"` | Default text when no responses are configured, or after responses are exhausted (with auto-tool mode). |
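The ordered-response behavior can be sketched in a few lines. This is an illustrative simplification, not the library's implementation; `SketchResponse` and `ResponseQueue` are hypothetical names invented for the example.

```ts
// Hypothetical, simplified stand-in for ModelResponse.
interface SketchResponse {
  text?: string
  finishReason: string
}

class ResponseQueue {
  private callIndex = 0
  constructor(
    private responses: SketchResponse[] = [],
    private defaultText = "test response",
  ) {}

  // With responses configured: consume them in order, throw when exhausted.
  // With no responses: always return the default text.
  next(): SketchResponse {
    const i = this.callIndex++
    if (this.responses.length > 0) {
      const response = this.responses[i]
      if (!response) throw new Error(`No response configured for call ${i}`)
      return response
    }
    return { text: this.defaultText, finishReason: "stop" }
  }

  // Rewind so the same instance can serve another test.
  reset(): void {
    this.callIndex = 0
  }
}
```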
### Three Modes
**No config (auto-tool mode):** On the first call, automatically calls all available tools with minimal valid arguments. On subsequent calls, returns `"test response"`.
```ts
const agent = new Agent({
  name: "test",
  model: new TestModel(),
  instructions: "test",
  defaults: false,
})
```

**Pre-configured responses:** Returns responses in order. Throws when exhausted.
```ts
const model = new TestModel({
  responses: [
    {
      toolCalls: [{ toolCallId: "tc-1", toolName: "search", args: { query: "cats" } }],
      usage: { inputTokens: 100, outputTokens: 50 },
      finishReason: "tool-calls",
    },
    {
      text: "Here are results about cats.",
      usage: { inputTokens: 200, outputTokens: 80 },
      finishReason: "stop",
    },
  ],
})
```

**Default text:** Always returns the specified text with no tool calls.
```ts
const model = new TestModel({ defaultText: "Hello from test!" })
```

### Reset for Reuse

Call `model.reset()` between tests to reset the call index:
```ts
const model = new TestModel({ defaultText: "Hi" })

afterEach(() => model.reset())
```

## FunctionModel

A callback-based mock model for complex test scenarios. Implements `LanguageModelV3`. Every model call is delegated to a user-supplied function.
```ts
class FunctionModel implements LanguageModelV3 {
  constructor(handler: FunctionModelHandler)
  reset(): void
}
```

The handler type:

```ts
type FunctionModelHandler = (
  messages: Message[],
  info: { tools: FunctionModelToolDef[]; callIndex: number },
) => ModelResponse | Promise<ModelResponse>
```

Where `FunctionModelToolDef` is `{ name: string; description?: string; parameters: unknown }`.
```ts
const model = new FunctionModel((messages, { tools, callIndex }) => {
  if (callIndex === 0) {
    return {
      toolCalls: [{ toolCallId: "tc-1", toolName: "search", args: { query: "cats" } }],
      usage: { inputTokens: 100, outputTokens: 50 },
      finishReason: "tool-calls",
    }
  }
  return {
    text: "Done!",
    usage: { inputTokens: 200, outputTokens: 80 },
    finishReason: "stop",
  }
})
```

The handler receives:

- `messages` — conversation history as `Message[]`
- `info.tools` — available tool definitions (`{ name, description, parameters }`)
- `info.callIndex` — which call this is (0-based)
Use `model.reset()` to reset the call index between tests.
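Because the handler sees the full message history, it can branch on conversation state rather than just `callIndex`. A self-contained sketch of that pattern, using simplified local `Msg`/`Resp` types as stand-ins for the library's `Message`/`ModelResponse`:

```ts
// Simplified local stand-ins for the library's message/response types.
type Msg = { role: "user" | "assistant" | "tool"; content: string }
type Resp = {
  text?: string
  toolCalls?: { toolCallId: string; toolName: string; args: unknown }[]
  finishReason: "stop" | "tool-calls"
}

// State-driven handler: request a tool call until a tool result appears
// in the history, then produce a final answer from that result.
const handler = (messages: Msg[], info: { callIndex: number }): Resp => {
  const last = messages[messages.length - 1]
  if (last?.role !== "tool") {
    return {
      toolCalls: [{
        toolCallId: `tc-${info.callIndex}`,
        toolName: "search",
        args: { query: last?.content ?? "" },
      }],
      finishReason: "tool-calls",
    }
  }
  return { text: `Based on the tool result: ${last.content}`, finishReason: "stop" }
}
```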
## testAgent()
A declarative test helper that runs an agent and checks assertions against the result. Supports single-turn and multi-turn testing.
```ts
async function testAgent(agent: Agent, opts: TestOptions): Promise<TestResult>
```

### Single-Turn Test
```ts
import { testAgent } from "agent-express/test"

// Agent must have observe.tools() for toolsCalled
// and guard.budget() for costUnder assertions
const result = await testAgent(agent, {
  input: "What is 2 + 2?",
  expect: {
    outputContains: "4",
    toolsCalled: ["calculator"], // requires observe.tools()
    costUnder: 0.01, // requires guard.budget()
  },
})

expect(result.passed).toBe(true)
```

### Multi-Turn Test
Pass an array of strings. Each string becomes one turn in a session:
```ts
const result = await testAgent(agent, {
  input: ["Hello, my name is Alice", "What is my name?"],
  expect: {
    outputContains: "Alice",
  },
})
```

### Assertion Options
| Assertion | Type | Description |
|---|---|---|
| `toolsCalled` | `string[]` | Tool names that should have been called (requires `observe.tools()`) |
| `outputContains` | `string` | Substring that should appear in the output text |
| `outputMatches` | `RegExp` | Regex the output text should match |
| `costUnder` | `number` | Maximum acceptable cost in USD (requires `guard.budget()`) |
### TestResult

```ts
interface TestResult {
  passed: boolean // Whether all assertions passed
  failures: string[] // Failure descriptions (empty if passed)
  run: RunResult // Full RunResult from the last turn
}
```
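Conceptually, the assertion checking reduces to collecting failure strings, with `passed` true only when none were collected. The sketch below illustrates that idea with simplified local types; it is not agent-express's actual implementation.

```ts
// Simplified, hypothetical shapes for illustration only.
interface SketchExpect {
  outputContains?: string
  outputMatches?: RegExp
  toolsCalled?: string[]
  costUnder?: number
}

function evaluate(
  run: { text: string; toolsCalled: string[]; cost: number },
  expected: SketchExpect,
): { passed: boolean; failures: string[] } {
  const failures: string[] = []
  if (expected.outputContains !== undefined && !run.text.includes(expected.outputContains)) {
    failures.push(`output does not contain "${expected.outputContains}"`)
  }
  if (expected.outputMatches && !expected.outputMatches.test(run.text)) {
    failures.push(`output does not match ${expected.outputMatches}`)
  }
  for (const tool of expected.toolsCalled ?? []) {
    if (!run.toolsCalled.includes(tool)) failures.push(`tool "${tool}" was not called`)
  }
  if (expected.costUnder !== undefined && run.cost >= expected.costUnder) {
    failures.push(`cost ${run.cost} is not under ${expected.costUnder}`)
  }
  return { passed: failures.length === 0, failures }
}
```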
## testSession()

A multi-turn session test helper that returns per-turn results and final session state. No built-in assertions — use with your test framework’s assertions.
```ts
async function testSession(agent: Agent, inputs: string[]): Promise<TestSessionResult>
```

```ts
import { testSession } from "agent-express/test"

const result = await testSession(agent, ["Hello", "Follow up", "Goodbye"])

expect(result.turns).toHaveLength(3)
expect(result.session.history).toHaveLength(6) // 3 user + 3 assistant
expect(result.session.state["observe:usage"]).toBeDefined()
```

### TestSessionResult
```ts
interface TestSessionResult {
  turns: RunResult[] // Result from each turn
  session: { history: Message[]; state: Record<string, unknown>; id: string }
  passed: boolean
  failures: string[]
}
```

## capture()
Creates a middleware that records model call inputs and outputs for inspection. Useful when you need to examine exactly what was sent to and received from the model.
```ts
function capture(): { middleware: Middleware; result: CaptureResult }
```

```ts
const { middleware, result } = capture()

const agent = new Agent({
  name: "test",
  model: new TestModel(),
  instructions: "test",
  defaults: false,
}).use(middleware)

await agent.run("Hello").result

console.log(result.turns[0].input) // messages sent to model
console.log(result.turns[0].response) // model response
```

The returned `CaptureResult` has:
| Property | Type | Description |
|---|---|---|
| `turns` | `TurnCapture[]` | All captured model calls, in order |
| `clear()` | `() => void` | Reset captured data to empty |
Each `TurnCapture` contains:
| Property | Type | Description |
|---|---|---|
| `callIndex` | `number` | Which model call in this turn (0-based) |
| `input` | `Message[]` | Messages sent to the model (snapshot taken before the call) |
| `response` | `ModelResponse` | Model response returned after the call |
## Record/Replay Cassettes
Record real LLM interactions once, then replay them in tests forever. Zero cost after initial recording, and API keys are automatically scrubbed.
```ts
class RecordModel implements LanguageModelV3 {
  constructor(inner: LanguageModelV3)
  saveCassette(path: string): Promise<void>
}

class ReplayModel implements LanguageModelV3 {
  static fromFile(path: string): Promise<ReplayModel>
  static fromJSON(data: any): ReplayModel
}
```

### Recording
Wrap a real model with `RecordModel`, run your test, then save the cassette:
```ts
import { RecordModel } from "agent-express/test"
import { resolveModel } from "agent-express"

const real = await resolveModel("anthropic/claude-sonnet-4-6")
const recorder = new RecordModel(real)

const agent = new Agent({
  name: "test",
  model: recorder,
  instructions: "You are a helpful assistant.",
  defaults: false,
})

const { text } = await agent.run("Hello").result
await recorder.saveCassette("./fixtures/hello.cassette.json")
```

The cassette JSON file contains all request/response pairs with API keys automatically redacted.
### Replaying

Load a cassette and use `ReplayModel` as the model:
```ts
import { ReplayModel } from "agent-express/test"

const replay = await ReplayModel.fromFile("./fixtures/hello.cassette.json")

const agent = new Agent({
  name: "test",
  model: replay,
  instructions: "You are a helpful assistant.",
  defaults: false,
})

const { text } = await agent.run("Hello").result
// Returns the exact same response that was recorded
```

You can also create a `ReplayModel` from parsed JSON data:
```ts
const replay = ReplayModel.fromJSON(parsedCassetteData)
```

### Cassette Format
```ts
interface Cassette {
  version: number // Format version (currently 1)
  model: string // Model identifier
  recordedAt: string // ISO timestamp
  interactions: CassetteInteraction[] // Ordered request/response pairs
}
```
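The automatic key scrubbing mentioned above can be pictured as redacting credential-bearing header values before the cassette is written to disk. The sketch below is illustrative only; the library's actual redaction rules are not documented here, and the header names are assumptions.

```ts
// Header names commonly used to carry API credentials (assumed list).
const SENSITIVE = new Set(["authorization", "x-api-key", "api-key"])

function scrubHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {}
  for (const [name, value] of Object.entries(headers)) {
    // Case-insensitive match, so "Authorization" and "authorization" both redact.
    out[name] = SENSITIVE.has(name.toLowerCase()) ? "<redacted>" : value
  }
  return out
}
```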
## Snapshot Testing

Compare agent output against stored snapshots using Vitest’s built-in snapshot infrastructure.
### serializeForSnapshot()
Creates a deterministic, serializable form of a `RunResult`. Sorts state keys alphabetically and excludes specified keys.
```ts
function serializeForSnapshot(
  result: Pick<RunResult, "text" | "state"> & { data?: unknown },
  options?: SnapshotOptions,
): Record<string, unknown>
```

| Option | Type | Description |
|---|---|---|
| `exclude` | `string[]?` | State keys to exclude from the snapshot (e.g., `["observe:duration"]`) |
```ts
import { serializeForSnapshot } from "agent-express/test"

const result = await agent.run("Hello").result
const serialized = serializeForSnapshot(result, {
  exclude: ["observe:duration"], // Exclude non-deterministic keys
})

expect(serialized).toMatchSnapshot()
```

The result is a plain object suitable for snapshot comparison.
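The sort-and-exclude step is what makes snapshots stable across runs: non-deterministic keys are dropped and the remainder is emitted in a fixed order. A simplified illustration of that idea (the `stableState` helper is hypothetical, not the library's implementation):

```ts
// Drop excluded keys and insert the rest in alphabetical order, so the
// resulting object serializes identically on every run.
function stableState(
  state: Record<string, unknown>,
  exclude: string[] = [],
): Record<string, unknown> {
  const out: Record<string, unknown> = {}
  for (const key of Object.keys(state).sort()) {
    if (!exclude.includes(key)) out[key] = state[key]
  }
  return out
}
```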
### toMatchAgentSnapshot()

A custom Vitest matcher that compares a `RunResult` against a stored snapshot. Uses deterministic serialization and delegates to Vitest’s built-in snapshot infrastructure.
```ts
function toMatchAgentSnapshot(
  received: Pick<RunResult, "text" | "state"> & { data?: unknown },
  options?: SnapshotOptions,
): { pass: boolean; message: () => string }
```

Register with `expect.extend()`:
```ts
import { toMatchAgentSnapshot } from "agent-express/test"

expect.extend({ toMatchAgentSnapshot })

const result = await agent.run("Hello").result
expect(result).toMatchAgentSnapshot({
  exclude: ["observe:duration"],
})
```

## Blocking Real API Calls
The `agent-express test` CLI command (see below) automatically sets `ALLOW_REAL_REQUESTS=false` before running tests. When combined with the Vitest setup file (`vitest-agent-setup.ts`), this blocks real API calls so tests never accidentally hit live endpoints.
To allow real requests in specific tests (e.g., integration tests):
```ts
import { setAllowRealRequests } from "agent-express/test"

beforeAll(() => setAllowRealRequests(true))
afterAll(() => setAllowRealRequests(false))
```

## agent-express test CLI
The built-in test runner wraps Vitest with agent-specific configuration. See CLI for the full command reference.
```sh
# Run all agent tests (discovers *.agent.test.ts files)
npx agent-express test

# JUnit XML output for CI pipelines
npx agent-express test --ci

# Custom file pattern
npx agent-express test --pattern "**/*.test.ts"
```

The `--ci` flag outputs JUnit XML to `./test-results/junit.xml`, suitable for CI systems like GitHub Actions, CircleCI, and Jenkins.
### What It Does

- Sets `ALLOW_REAL_REQUESTS=false` to block real API calls
- Discovers test files matching the pattern
- Runs tests via Vitest
- Outputs results (and JUnit XML with `--ci`)
## Complete Test Example

```ts
import { describe, it, expect, afterEach } from "vitest"
import { Agent, tools, guard, observe } from "agent-express"
import { TestModel, testAgent } from "agent-express/test"
import { z } from "zod"

const model = new TestModel({
  responses: [
    {
      toolCalls: [{ toolCallId: "tc-1", toolName: "add", args: { a: 2, b: 3 } }],
      usage: { inputTokens: 50, outputTokens: 20 },
      finishReason: "tool-calls",
    },
    {
      text: "The sum of 2 and 3 is 5.",
      usage: { inputTokens: 100, outputTokens: 30 },
      finishReason: "stop",
    },
  ],
})

afterEach(() => model.reset())

const agent = new Agent({
  name: "calculator",
  model,
  instructions: "You are a calculator.",
  defaults: false,
})
  .use(observe.tools())
  .use(tools.function({
    name: "add",
    description: "Add two numbers",
    schema: z.object({ a: z.number(), b: z.number() }),
    execute: async ({ a, b }) => a + b,
  }))

describe("calculator agent", () => {
  it("should call the add tool", async () => {
    const result = await testAgent(agent, {
      input: "What is 2 + 3?",
      expect: {
        toolsCalled: ["add"],
        outputContains: "5",
      },
    })
    expect(result.passed).toBe(true)
  })
})
```