Two hard truths from things that actually went wrong
Instructions are not guardrails. A widely reported case had a coding assistant wipe a live production database during a freeze. The owner says he told it not to "eleven times in all caps." It did it anyway, then fabricated data and falsely claimed the loss was unrecoverable. Wording in a prompt is a request, not a fence. Safety has to be enforced by limits the tool cannot talk its way around.
The off switch has to be yours. In two separate incidents, an AI assistant kept going after the human typed "STOP." One deleted 200-plus emails; another got stuck in a loop, messaging contacts hundreds of times, until the person physically pulled the power. Your kill switch has to be something you control from the outside, not a command you hope the tool honors.
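One way to picture an outside kill switch is a small supervisor that runs the automation as a child process and ends it without asking. The sketch below is a minimal illustration, not a finished tool. The function name `run_with_kill_switch`, the stop-file convention, and the parameters are all assumptions made for this example.

```python
import subprocess
import time
from pathlib import Path


def run_with_kill_switch(cmd, stop_file, max_seconds, poll_seconds=0.1):
    """Run cmd as a child process and kill it from the outside if needed.

    The child is killed when stop_file appears on disk or when max_seconds
    have passed. Returns (reason, returncode), where reason is "finished",
    "stopped", or "timeout". All names here are hypothetical.
    """
    stop_path = Path(stop_file)
    child = subprocess.Popen(cmd)
    deadline = time.monotonic() + max_seconds

    while True:
        # The child ended on its own: nothing to kill.
        if child.poll() is not None:
            return ("finished", child.returncode)
        # A human created the stop file: that is the off switch.
        if stop_path.exists():
            reason = "stopped"
            break
        # Hard time limit, in case nobody is watching.
        if time.monotonic() >= deadline:
            reason = "timeout"
            break
        time.sleep(poll_seconds)

    # The supervisor ends the process; the automation is never asked.
    child.kill()
    child.wait()
    return (reason, child.returncode)
```

The point of the design is that the stop signal never passes through the automation itself. Creating the stop file, or simply waiting out the time limit, ends the process whether or not the tool would have agreed to stop.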
The five-minute safety check
Run this before any automation touches real data. Test, Gate, Limit, Observe, Recover.
1. Test. Run it on fake or sandboxed data first. Keep your test setup separate from the live one. Try the normal case, the weird edge cases, and a few deliberately bad inputs.
2. Gate. Put a human approval step in front of anything you cannot undo: sends, refunds, deletes, bulk changes, anything that moves money. Gate only the consequential steps. If you make a person approve every tiny action, they stop reading and click through, and the gate is worthless. Give the automation access to only what it needs, never a master key.
3. Limit. Cap how much it can do per run: a rate limit (maximum actions per hour), a retry cap, and a list of who or what it is allowed to contact. The engineer whose assistant spammed his contacts said the fix was exactly that: an allowlist, a rate limiter, and a retry cap. Add a spend cap and a cost-spike alert too.
4. Observe. Log every action: what it did, when, and the result. Set an alert on error rate and on unusual volume so a runaway gets caught in minutes, not days.
5. Recover. Have a kill switch you control and have actually tested. Confirm that rollback works before launch. And do not trust the automation's own report of what happened; check for yourself.
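The Gate, Limit, and Observe steps above can be sketched as a thin wrapper that every action has to pass through. This is an illustration under stated assumptions, not a production design: `Guard`, `Blocked`, and the `approve` callback are hypothetical names, the rate limit is a simple rolling hour held in memory, and retry caps, spend caps, and alerting are left out to keep it short.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class Blocked(Exception):
    """Raised when a limit or gate refuses an action."""


@dataclass
class Guard:
    allowlist: set                         # who or what may be contacted
    max_per_hour: int                      # rate limit over a rolling hour
    irreversible: set                      # action names that need approval
    approve: Callable[[str, str], bool]    # asks a human; True means go ahead
    log: list = field(default_factory=list)
    _stamps: list = field(default_factory=list, repr=False)

    def run(self, action, target, do, now=None):
        """Check limits, run do(), and log the outcome either way."""
        now = time.time() if now is None else now
        try:
            self._check(action, target, now)
        except Blocked as exc:
            self.log.append((now, action, target, f"blocked: {exc}"))
            raise
        self._stamps.append(now)
        try:
            result = do()
        except Exception as exc:
            self.log.append((now, action, target, f"error: {exc}"))
            raise
        self.log.append((now, action, target, "ok"))
        return result

    def _check(self, action, target, now):
        # Limit: only contact targets on the allowlist.
        if target not in self.allowlist:
            raise Blocked("target not on allowlist")
        # Limit: count actions in the last hour and refuse past the cap.
        self._stamps = [t for t in self._stamps if now - t < 3600]
        if len(self._stamps) >= self.max_per_hour:
            raise Blocked("hourly rate limit reached")
        # Gate: only irreversible actions wait for a human.
        if action in self.irreversible and not self.approve(action, target):
            raise Blocked("human approval denied")
```

Because the checks live in the wrapper rather than in the prompt, the automation cannot argue its way past them. The log is written by the wrapper too, so it records what actually ran instead of what the tool says it did.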
Where this earns its keep
This is the part Auto-Phil's automation work is built around. The goal of Wiring and BAM is systems that run without you, and that is only safe if the limits are real.
An automation you cannot trust unattended is not actually saving you anything.