Treat agent errors as operational state: inspect failures with agent-error, decide whether to restart, and avoid hiding exceptions inside asynchronous state updates.
Agent failures are easy to miss if you think of agents as fire-and-forget task runners. An agent action can throw, and that failure changes how later actions behave.
The default posture should be explicit: decide what failure means for the state value, observe it, and either restart the agent or route the work to a more appropriate failure-handling system.
| Function | Use |
|---|---|
agent-error |
Inspect the exception associated with a failed agent. |
restart-agent |
Provide a new state and allow processing to continue. |
set-error-handler! |
Attach observation logic for failures. |
set-error-mode! |
Choose whether errors fail the agent or allow continuation. |
Do not let agent failures become invisible background noise. If an agent owns business-relevant state, its errors need metrics, logs, and a recovery decision.
1(def attempts (agent {:ok 0}))
2
3(send attempts
4 (fn [_]
5 (throw (ex-info "bad action" {}))))
6
7(await attempts)
8(some? (agent-error attempts))
9;; => true
After a failure, inspect the error before deciding what state is safe.
1(when-let [err (agent-error attempts)]
2 (println "agent failed:" (.getMessage err)))
3
4(restart-agent attempts {:ok 0})
Restarting is not the same as fixing the root cause. It is a deliberate state reset.
Sometimes an action can handle an expected error and return a valid next state.
1(defn record-result [state run!]
2 (try
3 (run!)
4 (update state :ok inc)
5 (catch Exception ex
6 (update state :errors
7 (fnil conj [])
8 (.getMessage ex)))))
Use this only when the action really knows how to produce a safe next state. If the state may be corrupted or incomplete, failing the agent is better than pretending success.
| Question | Why it matters |
|---|---|
| What happens when this action throws? | Async failure is easy to miss. |
| Is the next state still valid after a caught exception? | Returning bad state is worse than a visible failure. |
Who observes agent-error? |
Production systems need operational visibility. |
| Is restart safe? | Some state should be rebuilt from durable truth instead. |