Agent Foundry Labs
INSIGHTS
Notes on building production AI agents.
Answer-first writing on evaluation, observability, and what it takes to get an agent past the demo and into daily production use.
AI agent observability: what to monitor in a production agent
AI agent observability means tracing every step an agent takes in production, along with its token cost, latency, tool-call fidelity, and failure modes, so you can see why it behaved as it did.
An AI agent evaluation framework: what to measure and how a harness works
A practical AI agent evaluation framework measures task success, tool-call fidelity, cost, and latency against real data, run by a repeatable harness — so an agent ships measured, not asserted.
AI agent evaluation: how to evaluate an AI agent
You evaluate an AI agent by measuring it against an outcome agreed up front — task success, tool-call fidelity, cost, and latency — on real data, repeatably. An agent should ship measured, not asserted.