Your chatbot passes QA today. Will it tomorrow? Agent Canary runs continuous test conversations against your AI agents and alerts you the moment quality degrades — before your users notice.
$ python canary.py check --target "Support Bot"
✓ "What are your hours?"
Score: 0.94 | Latency: 342ms
✓ "How do I reset my password?"
Score: 0.91 | Latency: 287ms
⚠ "What's your refund policy?"
Score: 0.68 | Latency: 1204ms
↓ Regression detected — score dropped 0.23 from baseline
───────────────────────────────────
Target: Support Bot | Avg Score: 0.84 | Alerts: 1 warning
Next check in 5 minutes...
A model update, a prompt change, a knowledge base edit — and suddenly your chatbot gives wrong answers. You find out when customers complain. That's too late.
Your provider ships a new model version. Your carefully tuned prompts now produce different outputs.
Someone on the team tweaks the system prompt. Quality drops in ways nobody catches.
Your RAG pipeline indexes stale docs. The bot starts confidently quoting outdated info.
Response times creep up. Users abandon conversations. You don't notice for days.
Set up canary queries. Get alerts. Fix problems before users see them.
AI-powered similarity scoring compares responses against expected answers. Not string matching — actual meaning.
Moving-average baselines catch both gradual drift and sudden drops. Know that quality is trending down before it craters.
Slack, email, webhooks — get notified wherever you work. Configure thresholds per query or per target.
Monitor response times alongside quality. Catch performance regressions that hurt user experience.
Historical dashboards show quality trends over time. Share with stakeholders. Prove your bot is improving.
Works with any chatbot that has an HTTP API. OpenAI, Anthropic, custom models, Rasa, Dialogflow — all supported.
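To make the moving-average baseline idea concrete, here is a minimal sketch of how a rolling baseline can flag a regression. It is an illustration only, not Agent Canary's actual implementation: the `Baseline` class, the window size, and the `max_drop` threshold are all assumptions chosen for the example.

```python
from collections import deque
from typing import Optional


class Baseline:
    """Rolling mean of recent scores for one canary query (hypothetical helper)."""

    def __init__(self, window: int = 20, max_drop: float = 0.15):
        self.scores = deque(maxlen=window)  # oldest scores fall off automatically
        self.max_drop = max_drop            # assumed threshold, not a product default

    def mean(self) -> Optional[float]:
        """Average of the scores currently in the window, or None if empty."""
        return sum(self.scores) / len(self.scores) if self.scores else None

    def check(self, score: float) -> Optional[float]:
        """Return the drop from baseline if it exceeds max_drop, else None.

        The new score joins the window only after the comparison, so a bad
        reading cannot dilute the baseline it is judged against.
        """
        baseline = self.mean()
        self.scores.append(score)
        if baseline is None:
            return None
        drop = baseline - score
        return drop if drop > self.max_drop else None


# Four healthy readings, then one poor one.
refund = Baseline(window=5, max_drop=0.15)
for s in (0.91, 0.92, 0.90, 0.91):
    refund.check(s)
drop = refund.check(0.68)
if drop is not None:
    print(f"Regression detected: score dropped {drop:.2f} from baseline")
```

A short window like this reacts quickly to sudden drops. Catching slow drift generally calls for a longer window or a fixed reference score, because a rolling mean drifts along with the scores it averages.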
No SDK. No code changes. Just point Agent Canary at your chatbot.
Register any HTTP endpoint as a target. Custom headers, auth tokens, and request formats are supported.
canary add-target --url "https://api.example.com/chat" --name "Support Bot"
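For a sense of what a canary request with an auth token and custom headers might look like on the wire, here is a rough Python sketch using only the standard library. The `build_canary_request` helper and the `{"message": ...}` body shape are assumptions made for illustration; your chatbot's request format will likely differ.

```python
import json
import urllib.request


def build_canary_request(url, query, token=None, extra_headers=None):
    """Build a POST request carrying one canary query (hypothetical helper).

    The {"message": ...} body shape is an assumption; real chatbots differ.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    headers.update(extra_headers or {})
    body = json.dumps({"message": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_canary_request(
    "https://api.example.com/chat",
    "What are your hours?",
    token="YOUR_TOKEN",  # placeholder, not a real credential
)
# Sending it is one more call: urllib.request.urlopen(req, timeout=10)
```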
Write test questions with expected answers. These are the "canaries in the coal mine" that detect quality changes.
canary add-canary --target "Support Bot" --query "What are your hours?" --expected "We are open 9-5 Monday to Friday"
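One common way to compare a response with an expected answer by meaning is cosine similarity between embedding vectors. The sketch below assumes that approach; this page does not say which scoring method Agent Canary uses. The `embed` argument stands for whatever embedding model you choose, and the vectors here are hand-written stand-ins, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_response(embed, response, expected):
    """Score a response against the expected answer using a caller-supplied embed()."""
    return cosine_similarity(embed(response), embed(expected))


# Hand-written stand-in vectors; a real embed() would call an embedding model.
fake_vectors = {
    "We are open 9-5 Monday to Friday": [0.9, 0.1, 0.3],
    "Our hours are 9 to 5 on weekdays": [0.8, 0.2, 0.3],
    "Refunds take 30 days": [0.1, 0.9, 0.2],
}
expected = "We are open 9-5 Monday to Friday"
close = score_response(fake_vectors.get, "Our hours are 9 to 5 on weekdays", expected)
far = score_response(fake_vectors.get, "Refunds take 30 days", expected)
print(f"paraphrase: {close:.2f}, unrelated: {far:.2f}")
```

The paraphrase shares almost no words with the expected answer yet scores far higher than the unrelated reply, which is the point of scoring on meaning instead of string matching.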
Agent Canary runs your test queries on a schedule and alerts you when quality drops below your threshold.
canary monitor --interval 5 --alert slack
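Conceptually, a monitor like this is a loop: run the check, compare the score with a threshold, alert if it falls short, wait, repeat. The sketch below illustrates that loop under stated assumptions. `run_check`, `send_alert`, and the 0.8 threshold are hypothetical, and `max_runs` and `sleep` are there only so the loop can be exercised without actually waiting.

```python
import time


def monitor(run_check, send_alert, interval_minutes=5, threshold=0.8,
            max_runs=None, sleep=time.sleep):
    """Call run_check() on a schedule and send_alert(score) when it is below threshold.

    Runs forever when max_runs is None. Returns the number of checks performed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        score = run_check()
        if score < threshold:
            send_alert(score)
        runs += 1
        # Skip the wait after the final run so a bounded loop returns promptly.
        if max_runs is None or runs < max_runs:
            sleep(interval_minutes * 60)
    return runs


# Dry run with canned scores and a fake sleep that just records the wait.
scores = iter([0.94, 0.91, 0.68])
alerts, waits = [], []
total = monitor(lambda: next(scores), alerts.append,
                interval_minutes=5, threshold=0.8,
                max_runs=3, sleep=waits.append)
print(f"{total} checks, alerts for scores: {alerts}")
```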
Start free. Scale when you need to.
Join the early access list. We'll onboard you personally and give you Pro features free during beta.