Agent Foundry Labs

INSIGHTS

Notes on building production AI agents.

Answer-first writing on evaluation, observability, and what it takes to get an agent past the demo and into daily production use.

AI agent observability: what to monitor in a production agent
AI agent observability means tracing every step an agent takes in production and recording its token cost, latency, tool-call fidelity, and failure modes, so you can see why it behaved as it did.
Haroon Latif · 4 June 2026
An AI agent evaluation framework: what to measure and how a harness works
A practical AI agent evaluation framework measures task success, tool-call fidelity, cost, and latency against real data, and runs those measurements through a repeatable harness so an agent ships measured, not asserted.
Haroon Latif · 23 April 2026
AI agent evaluation: how to evaluate an AI agent
You evaluate an AI agent by measuring it, repeatably and on real data, against outcomes agreed up front: task success, tool-call fidelity, cost, and latency. An agent should ship measured, not asserted.
Haroon Latif · 12 March 2026