durable execution means your agent's state checkpoints after each step. if a connection drops, a process restarts, or a tool times out, execution resumes from the last checkpoint instead of starting over.
why it matters
consider what happens during a typical agent task. a user asks for a research report. the agent searches sources, reads documents, extracts information, and starts writing. fifteen minutes in, a network blip drops the connection.
without durability: everything vanishes. the user starts over. the work is lost.
with durability: execution resumes from where it stopped. the agent continues with full context of what it was doing.
how it works
traditional agent loops run as long-running processes with state in memory. if the process dies, everything is lost.
durable execution inverts this. the runtime is event-driven, not process-driven:
- execute: the agent performs one action
- persist: the result and updated state are saved
- yield: control returns to the runtime
the next step only begins after the previous step is durably stored.
what gets checkpointed
- conversation history
- agent memory and working state
- completed tool call results
- current position in multi-step plans
- any state you explicitly store
automatic, not opt-in
on inference.sh, durable execution is how the runtime works. there's nothing to enable. every agent automatically gets:
- checkpointing after each step
- transparent resume on any failure
- state persistence across process boundaries
- full execution history for debugging