Robin van Veen - AI Automation
The 5 patterns that kill AI agents in production

After building 100+ agents, I keep seeing the same 5 things break.

Robin van Veen
May 31, 2026

Last month I got a call from a founder who had spent ~€18k on an AI agent that was supposed to handle inbound support.

Three months in, his team had stopped trusting it. Replies went out with the wrong client name. Refunds got approved that shouldn’t have.

Nobody knew which tickets it had actually closed and which it had silently dropped.

He thought he had a model problem. He had an architecture problem.

This is the part nobody on LinkedIn talks about: most AI agents don’t fail because GPT or Claude isn’t smart enough.

They fail because nobody designed them to survive a bad day.

I’ve built more than 100 agents over the last two years. The ones still running six months in have almost nothing in common with the ones that got switched off. Not the model. Not the framework. Not even the use case.

What separates them is whether someone designed them to fail safely.

Here are the 5 patterns I see every single time an agent dies in production. If you’re running an agent right now, or you’re about to sign off on one, read this twice.

1. /no-fallback

The API times out. The model hallucinates. The agent just stops.

No retry. No alert. No recovery path. The task sits in limbo while your team thinks it’s handled. By the time someone notices, you’ve lost a day of invoices, qualified leads, or first replies to a hot prospect.

A real agent has a plan for what happens when the happy path breaks. Retry with backoff. Fall back to a simpler model. Escalate to a human queue. Log the failure with enough context that someone can actually fix it.

If your agent has one route through the task, you don’t have an agent. You have a demo.
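
To make the recovery path concrete, here is a minimal sketch in Python. It assumes each route through the task (main model, simpler model, and so on) is a plain function that either returns a result or raises. The names run_with_fallback, handlers, and escalate are hypothetical and not tied to any framework.

```python
import logging
import random
import time
from typing import Callable, Optional, Sequence

logger = logging.getLogger("agent.fallback")


def run_with_fallback(
    task: dict,
    handlers: Sequence[Callable[[dict], str]],
    escalate: Callable[[dict, str], None],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each handler in order, retrying each with exponential backoff.

    Returns the first successful result, or None after escalating to a human.
    """
    last_error = "no handlers configured"
    for handler in handlers:
        name = getattr(handler, "__name__", repr(handler))
        for attempt in range(1, max_attempts + 1):
            try:
                return handler(task)
            except Exception as exc:  # a real agent would catch narrower errors
                last_error = f"{name}: {exc!r}"
                # Log enough context that someone can actually fix it.
                logger.warning(
                    "task=%s handler=%s attempt=%d failed: %r",
                    task.get("id"), name, attempt, exc,
                )
                if attempt < max_attempts:
                    # Exponential backoff plus a little jitter.
                    sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
    # Every route failed: hand the task to a person instead of dropping it.
    escalate(task, last_error)
    return None
```

You would pass the main model call first and the simpler model second, with escalate pushing the task into whatever human queue you already use. What matters is that every task ends in either a result or a person, never in limbo.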

2. /no-sop

Ask your team how the process actually runs today. If the answer is “ask Sarah” or “we just know”, your agent is going to guess every step.

I see this constantly. A company hires us to automate sales onboarding, and when we ask for the SOP, there isn’t one. The process lives in three people’s heads, and each of them does it slightly differently. So whatever the agent learns is some weighted average of inconsistent humans, and the output is exactly as unpredictable as you’d expect.

The fix is upstream of the AI. Document the process first. Automate second. An agent on top of a vague SOP is just a faster way to be wrong, at scale, in writing, with your logo on it.

This is also why we won’t take projects where the client refuses to document. It’s not gatekeeping. It’s the difference between an agent that compounds value and one that becomes a six-month write-off.

3. /no-logging

A real example from a prospect’s dashboard: 142 tasks completed last week. 23 failed. Root cause known: 0.

Logs disabled. Traces missing. Debugging impossible.

You cannot improve what you cannot see. Worse, you cannot defend it. When a client emails asking why their refund didn’t go through, “I think the agent handled it” is not an acceptable answer. Every production agent needs a paper trail. Inputs, outputs, decisions, retries, failures. Searchable. Auditable. Owned.

This is not optional. It’s the single highest-ROI piece of architecture you can add, because it’s the thing that lets you turn every failure into a fix instead of a mystery.
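
A paper trail does not need heavy tooling to start. The sketch below assumes a JSON Lines file on disk is good enough as a first audit log. The field names and the log_event and find_events helpers are made up for illustration, and in practice you would likely swap in whatever tracing or log store you already run.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Any, Iterator


def log_event(log_path: Path, task_id: str, step: str, status: str, **details: Any) -> dict:
    """Append one structured event to a JSON Lines audit log."""
    event = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "task_id": task_id,
        "step": step,      # e.g. "input", "decision", "retry", "output"
        "status": status,  # e.g. "ok", "failed"
        "details": details,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        # default=str keeps logging from crashing on non-JSON values.
        fh.write(json.dumps(event, default=str) + "\n")
    return event


def find_events(log_path: Path, **filters: Any) -> Iterator[dict]:
    """Yield logged events whose top-level fields match every filter."""
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            if all(event.get(key) == value for key, value in filters.items()):
                yield event
```

With something like this in place, "which tasks failed last week and why" becomes a query such as find_events(path, status="failed") instead of a guess.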

4. /no-human-loop

Refund $4,200. Approved, no review.

Delete account. Approved, no review.

Change pricing. Approved, no review.

The agents that put companies in real trouble are the ones where nobody set a checkpoint on the calls that need human judgment. Speed without oversight is one bad run away from a very public, very expensive mistake.

The good news: human-in-the-loop is cheap to add and almost free to maintain. You pick the actions that need a second pair of eyes (anything irreversible, anything financial above a threshold, anything that touches a customer relationship), and you route them through Slack, email, or a simple approval queue. The agent does 95% of the work. Your team approves the 5% that matters.

The trade-off is not speed versus safety. It’s speed versus blowing up the trust you built with your customers.
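
The checkpoint itself can be a few lines of routing logic. This is a rough sketch under stated assumptions: the action names, the 100.0 refund threshold, and the list standing in for an approval queue are all hypothetical placeholders for your own policy and for Slack, email, or a ticketing tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical policy values; tune them to your own risk tolerance.
IRREVERSIBLE_ACTIONS = {"delete_account", "change_pricing"}
REFUND_REVIEW_THRESHOLD = 100.0


@dataclass
class Action:
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


def needs_review(action: Action) -> bool:
    """Return True when a human should approve before anything runs."""
    if action.name in IRREVERSIBLE_ACTIONS:
        return True
    if action.name == "refund":
        return float(action.params.get("amount", 0)) > REFUND_REVIEW_THRESHOLD
    return False


def dispatch(
    action: Action,
    execute: Callable[[Action], None],
    approval_queue: List[Action],
) -> str:
    """Run low-risk actions directly; park everything else for a human."""
    if needs_review(action):
        approval_queue.append(action)
        return "pending_approval"
    execute(action)
    return "executed"
```

Under these assumed rules, a small refund goes straight through, while the $4,200 refund and the account deletion wait in the queue until someone signs off.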

5. /no-guardrails

Data access: everything. API scope: unlimited. Output validation: none. Rate limits: none.

One bad prompt away from a very expensive mistake.

The agents that survive in production have a small, sharp scope. They can only touch what they need to touch. Their inputs are validated. Their outputs are checked against a schema. Their permissions are scoped to a single workflow. When they get asked for more, they fail loudly instead of guessing.

This is the same principle every senior engineer applies to a junior dev’s pull request. You don’t give the new hire root access on day one. Don’t give it to your agent either.
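
Here is one way the scope limits could look in code, as a sketch rather than a finished implementation. It assumes the agent calls tools by name and returns a flat dictionary. The tool names, REPLY_SCHEMA, and GuardrailError are invented for the example, and a schema library would do the validation more thoroughly.

```python
from typing import Any, Callable, Dict


class GuardrailError(Exception):
    """Raised when the agent steps outside its allowed scope."""


# Hypothetical scope for a single support workflow.
ALLOWED_TOOLS = {"lookup_order", "draft_reply"}
REPLY_SCHEMA = {"ticket_id": str, "reply_text": str, "confidence": float}


def call_tool(name: str, tools: Dict[str, Callable[..., Any]], **kwargs: Any) -> Any:
    """Invoke a tool only if it is on the allowlist; otherwise fail loudly."""
    if name not in ALLOWED_TOOLS:
        raise GuardrailError(f"tool {name!r} is outside this agent's scope")
    return tools[name](**kwargs)


def validate_output(
    output: Dict[str, Any],
    schema: Dict[str, type] = REPLY_SCHEMA,
) -> Dict[str, Any]:
    """Check that the output has exactly the expected keys and types."""
    missing = schema.keys() - output.keys()
    unexpected = output.keys() - schema.keys()
    if missing or unexpected:
        raise GuardrailError(
            f"missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )
    for key, expected_type in schema.items():
        if not isinstance(output[key], expected_type):
            raise GuardrailError(
                f"{key!r} should be {expected_type.__name__}, "
                f"got {type(output[key]).__name__}"
            )
    return output
```

Both checks raise instead of guessing, so a request for a tool that is not on the list shows up as an error you can log and review, not as an action that quietly ran.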

The pattern behind the patterns

The agents that actually keep producing revenue six months later share one thing.

They’re built to fail safely, not just to run fast.

Fallbacks. Documented SOPs. Logs. Human checkpoints. Scope limits.

None of it is glamorous. None of it gets you a viral LinkedIn post. But it’s the reason a client’s agent is still closing tickets, qualifying leads, and pushing invoices through while three other vendors got quietly replaced by an intern with a Notion doc.

If you’re going to spend money on an AI agent in 2026, spend it on the boring half. The smart half is already commoditized.

What to do this week

Pick your most important agent. Ask five questions:

What happens when its main API fails?
Where is the SOP it follows, in writing?
Can I see every task it ran in the last 7 days, with inputs and outputs?
Which of its actions are irreversible, and does a human approve each of them first?
What’s the smallest possible scope it could have and still do its job?

If you can’t answer all five with a clear “yes, here’s where”, you’ve found this month’s project.

Reply with the one you’re missing right now. I’ll tell you exactly where I’d start.

Robin

P.S. The full visual breakdown of all 5 is the carousel I posted on LinkedIn this week. Worth a scroll before your next sprint planning.