⚖️ Agentic Evaluator
Design structured eval cases for AI agents. Define scenarios, expected steps, and assertions — or let the Copilot walk you through it.
What is an eval?
An eval tests whether an AI agent does the right thing. You give it a scenario (the setup), a user_message (the trigger), expected_steps (what a good agent should do), and assertions (pass/fail checks on the output). Think of it as a unit test, but for agent behavior instead of code.
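The four parts can be sketched as a plain data structure. This is a minimal illustration, not the tool's actual schema — the field names and the dict-based harness are assumptions for the example:

```python
# Hypothetical eval case: the four parts described above.
# Field names and assertion types are illustrative, not a real API.
eval_case = {
    "scenario": "Customer support bot with order-lookup access",   # the setup
    "user_message": "Where is my package? Order #1234.",           # the trigger
    "expected_steps": ["lookup_order"],       # tools a good agent should call
    "assertions": [                           # pass/fail checks on the output
        {"type": "contains", "value": "#1234"},          # must cite the order
        {"type": "not_contains", "value": "guaranteed"}, # no invented promises
    ],
}

def run_assertions(output: str, assertions: list[dict]) -> bool:
    """Run each pass/fail check against the agent's final output."""
    for a in assertions:
        if a["type"] == "contains" and a["value"] not in output:
            return False
        if a["type"] == "not_contains" and a["value"] in output:
            return False
    return True
```

Like a unit test, the case either passes or fails, so it can run on every change to the agent.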
See an eval in action
Watch how an eval catches an agent hallucinating a delivery timeline during a customer support sequence, then design your own below.
✈️ Airline Support Agent — a customer support bot with access to lookup_order and refund_order tools
User says:
“I need to cancel my flight and get a refund. Order #4821.”
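Written out as code, the airline case might look like the sketch below. The structure, the regex-based hallucination check, and the `check` helper are all assumptions for illustration — the point is that the eval fails both when the agent skips a required tool call and when it fabricates a timeline it never looked up:

```python
import re

# Hypothetical eval for the airline scenario above (illustrative structure).
airline_eval = {
    "scenario": "Airline support bot with lookup_order and refund_order tools",
    "user_message": "I need to cancel my flight and get a refund. Order #4821.",
    "expected_steps": ["lookup_order", "refund_order"],
    # The agent has no tool that returns a refund timeline, so any concrete
    # "within N days" claim in the reply is a hallucination.
    "hallucination_pattern": r"within \d+ (business )?days",
}

def check(tool_calls: list[str], reply: str) -> bool:
    """Pass only if the agent took the expected steps and invented no timeline."""
    steps_ok = tool_calls == airline_eval["expected_steps"]
    no_hallucination = (
        re.search(airline_eval["hallucination_pattern"], reply) is None
    )
    return steps_ok and no_hallucination
```

A compliant run (`["lookup_order", "refund_order"]`, "Your refund for order #4821 has been issued.") passes; a run that skips the lookup or promises "within 5 business days" fails.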
