There’s a decent chance you have a cron job somewhere that looks like this:
```
0 3 * * * /usr/local/bin/run_ai_pipeline.sh >> /var/log/ai.log 2>&1
```
It worked fine when “AI pipeline” meant “run one script that calls an API and uploads a file.”
But now that pipeline is a small universe:
- Fetch data from a dozen sources
- Chunk and embed documents
- Fan out queries to an LLM (with rate limits and timeouts)
- Call tools, write results to databases, send notifications
- Maybe involve a human who approves or edits output
…and somewhere around step 7, the network flakes, your container restarts, or OpenAI returns a 500.
Cron has no idea.
It doesn’t know which step you were on. It doesn’t know what already succeeded. It doesn’t know that you just re-ran the same expensive LLM call three times.
As AI workloads get more complex, that “best effort” model stops being cute and starts burning money, time, and trust. This is exactly the gap Temporal’s Durable Execution model is built to fill (temporal.io)—and why it’s quietly becoming the runtime backbone for LLM pipelines and agent orchestration.
This post is about going beyond cron: what durable execution really is, how Temporal works under the hood, and how it changes the way you design AI systems.
1. Cron is great… until you add LLMs
Cron is beautifully dumb.
It answers exactly one question: “When should I run this command?” Everything else—state, errors, retries, partial progress, rate limits—is your problem.
A lot of AI architectures today still look like this:
- Cron fires a script.
- Script pulls “jobs” from a queue.
- For each job:
  - Call an LLM.
  - Call some tools.
  - Save stuff in a DB.
  - Hope nothing explodes mid-way.
Here’s what goes wrong once LLMs enter the picture:
- LLM calls are expensive — If you crash after 20 minutes of multi-step reasoning, you don’t want to “just start over.”
- Pipelines are multi-step and stateful — You can’t just say “this job failed”; you need to know which steps completed.
- Failures are normal, not exceptional — Rate limiting, flaky APIs, transient DB issues. You need retries with semantics, not a `for` loop with `sleep(5)`.
- Pipelines can run for hours or days — Waiting for a human-in-the-loop, backfills over millions of documents, long research tasks.
- Agents are loops, not DAGs — An agent might decide its next tool at runtime. That’s not a simple “run step 3 after step 2” anymore.
You can try to solve all of this with hand-rolled state machines, idempotency keys, and scattered checkpointing. Many teams do. And then they slowly reinvent a workflow engine.
Temporal’s pitch is basically: let’s give you that engine as a programming model, instead of as a tangle of queues and custom glue. (temporal.io)
2. Durable Execution in one sentence
Let’s define the core idea:
Durable Execution means your function can outlive the process that runs it—and still behave as if it ran once, in one place, without losing its mind.
Temporal’s docs describe a Workflow Execution as a “durable, reliable, and scalable function execution,” and treat it as the main unit of execution in an application. (Temporal)
What does that really mean?
Normally, when you call a function:
- Its local variables live on a stack in memory.
- If the process crashes, poof—that stack is gone.
- To recover, you need external state: DB rows, logs, maybe manual intervention.
With durable execution:
- The runtime records a history of events: “Timer started”, “Activity completed with result X”, “Signal received with payload Y.” (Temporal)
- Your workflow function is written so its behavior is fully determined by that history.
- On failure, Temporal replays the history into your workflow code, re-creating its state (locals, progress) and continuing as if nothing happened.
Think of it like a game that stores not just a saved screenshot, but every input you ever pressed. Replaying the inputs recreates the exact game state.
This is crucial for AI pipelines:
- The decision “Do we need another research iteration?” is part of the workflow’s logic and is replayable.
- The actual LLM call is a side effect whose result is recorded in history and never re-executed on replay.
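To make that split concrete, here's a minimal sketch (the `runResearchStep` activity is hypothetical, used purely for illustration) showing which lines replay and which run exactly once:

```ts
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities";

// `runResearchStep` is a hypothetical activity for illustration.
const { runResearchStep } = proxyActivities<typeof activities>({
  startToCloseTimeout: "5 minutes",
});

export async function researchWorkflow(topic: string): Promise<string[]> {
  const findings: string[] = [];
  // Workflow logic: on replay, this loop re-executes deterministically.
  for (let i = 0; i < 3; i++) {
    // Activity: the LLM call runs once; on replay its result is read
    // back from history instead of hitting the API again.
    const step = await runResearchStep(topic, findings);
    findings.push(step);
    if (step.includes("DONE")) break; // a replayable decision
  }
  return findings;
}
```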
3. Temporal in three moving parts
Temporal gives you durable execution as a service. Conceptually, it has three roles:
- **Temporal Service (cluster)**
  - Stores workflow histories durably.
  - Generates tasks (e.g., “run this workflow code”, “run this activity”).
  - Manages timers, retries, and task queues. (temporal.io)
- **Workers (your code)**
  - Regular processes you run (Docker, Kubernetes, whatever).
  - Host your Workflows (orchestration logic) and Activities (side-effecting work).
  - Poll Temporal for tasks and push results back. (Temporal)
- **Clients (also your code)**
  - Start workflows, send signals, query status, etc. (Temporal)
The key abstraction is:
- Workflows = deterministic orchestration logic (no direct HTTP calls, no random `Date.now()`).
- Activities = anything that can fail or be non-deterministic: HTTP calls, DB writes, LLM calls. (Temporal)
When a workflow schedules an activity (“run this LLM call”), Temporal:
- Writes an event “Activity A scheduled” into history.
- Enqueues an activity task to a Task Queue.
- Some worker picks it up, runs your activity code (e.g., an OpenAI call), and returns the result.
- Temporal writes “Activity A completed with result R” into history.
If the worker dies halfway through?
- Temporal will deliver the activity task to another worker (with retries governed by your policy).
- If the Temporal cluster itself restarts, history is persisted in its backing store. (Temporal Assets)
From your workflow’s point of view, you await an activity and either get a result or a failure—no manual polling, no custom retry code.
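As a sketch of what that looks like from the workflow side (borrowing the `runLLMAnalysis` activity we'll define in section 5):

```ts
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities";

const { runLLMAnalysis } = proxyActivities<typeof activities>({
  startToCloseTimeout: "10 minutes",
});

export async function analysisStep(jobId: string): Promise<string> {
  // Behind this one `await`, Temporal records history events along the
  // lines of: ActivityTaskScheduled, ActivityTaskStarted,
  // ActivityTaskCompleted (with the result payload).
  // On replay, the `await` resolves straight from history; the activity
  // function itself is not executed again.
  const analysis = await runLLMAnalysis(jobId, []);
  return analysis.summary;
}
```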
4. Deterministic workflows vs stochastic models
At first glance, this sounds incompatible with LLMs.
LLMs are stochastic: you call them twice with the same prompt and parameters, and you might get different text. That’s basically the opposite of determinism.
Temporal solves this tension by drawing a hard line:
- Workflows must be deterministic.
- Activities are allowed to be non-deterministic; their results are recorded. (Temporal)
On the first run:
- Your workflow calls an activity like `runLLMAnalysis(prompt)`.
- The activity hits the OpenAI API and returns a string.
- Temporal stores that output in workflow history.
On replay:
- The workflow does not call the API again.
- Temporal “replays” the history, feeding the recorded output back into the workflow at the same point.
So as long as your workflow logic only uses:
- Its input arguments,
- Recorded activity results,
- Deterministic APIs provided by the Temporal SDK (e.g., workflow-specific `now()`, `sleep()`),
…it will follow the exact same control flow on replay. (Temporal)
Meanwhile, your LLMs stay non-deterministic—but only on the first execution of each step.
This gives you a nice mental rule:
Workflows own decisions. Activities own side effects.
For AI pipelines, that means:
- The workflow decides when to call the model, which tools to use, and when to stop iterating.
- Each LLM call is a durable step whose result is saved, so a deploy or crash doesn’t silently redo work.
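In the TypeScript SDK, that rule looks roughly like this (a sketch: `sleep` comes from `@temporalio/workflow`, while the `callModel` activity is hypothetical):

```ts
import { proxyActivities, sleep } from "@temporalio/workflow";
import type * as activities from "./activities";

const { callModel } = proxyActivities<typeof activities>({
  startToCloseTimeout: "2 minutes",
});

export async function decideAndCall(prompt: string): Promise<string> {
  // OK: deterministic workflow APIs (a durable timer, not setTimeout).
  await sleep("30 seconds");

  // NOT OK inside a workflow: Math.random(), direct fetch()/DB calls,
  // or reading state that differs between workers.

  // OK: the non-determinism lives in the Activity; its result is
  // recorded in history and reused on replay.
  return callModel(prompt);
}
```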
5. A durable AI pipeline in Temporal (TypeScript example)
Let’s build a simplified Temporal-powered AI pipeline in TypeScript:
Goal: Given a `jobId`, run a research pipeline:
- Fetch documents.
- Generate embeddings and store them.
- Ask an LLM to produce an analysis.
- Store the final result.
We’ll show:
- Activities that talk to the outside world (HTTP, DB, LLMs).
- A workflow that orchestrates these activities reliably.
5.1. Activities: all the messy parts
```ts
// src/activities.ts
// These run in a normal Node.js context.

export interface SourceDoc {
  id: string;
  content: string;
}

export interface EmbeddingResult {
  docId: string;
  vector: number[];
}

export interface AnalysisResult {
  jobId: string;
  summary: string;
  reasoning: string;
}

export async function fetchDocuments(jobId: string): Promise<SourceDoc[]> {
  // Call your own APIs/DBs/cloud storage here.
  // Any network failure will be retried according to the Activity retry policy.
  console.log(`[activities] Fetching docs for job ${jobId}`);
  // Placeholder: in real life you'd pull from S3, Postgres, etc.
  return [
    { id: "doc-1", content: "First document text..." },
    { id: "doc-2", content: "Second document text..." },
  ];
}

export async function embedDocuments(
  docs: SourceDoc[]
): Promise<EmbeddingResult[]> {
  console.log(`[activities] Embedding ${docs.length} docs`);
  // You might batch into your vector DB here, or call an embeddings API.
  // This is intentionally non-deterministic (remote API call).
  return docs.map((d) => ({
    docId: d.id,
    vector: [Math.random(), Math.random(), Math.random()], // placeholder!
  }));
}

export async function runLLMAnalysis(
  jobId: string,
  docs: SourceDoc[]
): Promise<AnalysisResult> {
  console.log(`[activities] Running LLM analysis for job ${jobId}`);
  // Call your favorite LLM here.
  // For example, using OpenAI's SDK (pseudo-code):
  //
  // const response = await openai.chat.completions.create({
  //   model: 'gpt-4.1',
  //   messages: [...],
  // });
  //
  // return { ...based on response... };
  return {
    jobId,
    summary: "Fake summary from LLM.",
    reasoning: "Fake chain-of-thought or tool usage (not shown to users).",
  };
}

export async function storeResults(result: AnalysisResult): Promise<void> {
  console.log(`[activities] Storing result for job ${result.jobId}`);
  // Write to DB, send notifications, etc.
}
```
5.2. Workflow: the reliable conductor
Temporal workflows run in a special isolated environment for determinism; you can’t just import the OpenAI SDK and start calling it. Instead, you proxy activities and orchestrate them:
```ts
// src/workflows.ts
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities";

export interface PipelineInput {
  jobId: string;
}

export interface PipelineOutput {
  jobId: string;
  docCount: number;
}

const { fetchDocuments, embedDocuments, runLLMAnalysis, storeResults } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: "10 minutes",
    retry: {
      maximumAttempts: 5,
      backoffCoefficient: 2.0,
    },
  });

export async function aiPipelineWorkflow(
  input: PipelineInput
): Promise<PipelineOutput> {
  const { jobId } = input;

  // Step 1: Fetch documents (Activity).
  const docs = await fetchDocuments(jobId);

  // Step 2: Embed them (Activity; might call vector DB).
  const embeddings = await embedDocuments(docs);
  // You might store embeddings inside embedDocuments, or in another Activity.

  // Step 3: Run LLM analysis (Activity).
  const analysis = await runLLMAnalysis(jobId, docs);

  // Step 4: Store final results (Activity).
  await storeResults(analysis);

  // Step 5: Return minimal workflow result.
  return {
    jobId,
    docCount: docs.length,
  };
}
```
There are a few big wins hiding in this simple code:
- If the process running this workflow crashes after `embedDocuments` succeeds but before `storeResults`, Temporal will replay history, re-create the workflow’s state, and continue from the `await storeResults(...)` line.
- The LLM call happens once; its output is part of the history. On replay, no extra tokens are burned.
- Retries, timeouts, and backoff are declarative: you configure them once, where you create the activity proxies.
5.3. Worker: connecting your code to Temporal
Finally, a worker binds workflows and activities to a task queue:
```ts
// src/worker.ts
import { Worker } from "@temporalio/worker";
import * as activities from "./activities";

async function run() {
  const worker = await Worker.create({
    // Temporal uses this to load your workflow code in an isolated runtime.
    workflowsPath: require.resolve("./workflows"),
    activities,
    taskQueue: "ai-pipeline", // name used by clients to start workflows
  });
  await worker.run(); // blocks until the process is stopped
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```
Temporal’s TypeScript docs and tutorials walk through this full setup in detail—creating a project, running a dev server with `temporal server start-dev`, and starting workers connected to a task queue. (Learn Temporal)
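For completeness, here's a minimal client that starts the workflow. This is a sketch based on the `@temporalio/client` API; connection options depend on your deployment (the defaults below assume a local dev server):

```ts
// src/client.ts
import { Connection, Client } from "@temporalio/client";
import { aiPipelineWorkflow } from "./workflows";

async function main() {
  const connection = await Connection.connect(); // localhost:7233 by default
  const client = new Client({ connection });

  const handle = await client.workflow.start(aiPipelineWorkflow, {
    taskQueue: "ai-pipeline",         // must match the worker's task queue
    workflowId: "ai-pipeline-job-42", // a stable ID; also enables dedup
    args: [{ jobId: "job-42" }],
  });

  console.log(`Started workflow ${handle.workflowId}`);
  console.log("Result:", await handle.result()); // waits for completion
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```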
6. Why durable execution is such a good fit for AI pipelines
AI pipelines are textbook use cases for durable workflows. If you list out the pain points, Temporal maps to them almost directly:
6.1. Expensive steps
LLM calls, vector indexing, and large data movement are costly. You want:
- Effectively-once semantics for external side effects, as seen by your business logic (activities may be retried under the hood, so they should be idempotent).
- The ability to retry failed steps without losing already-completed work.
Temporal achieves this by:
- Treating activities as individually retriable steps with configurable retry policies.
- Persisting activity results so they’re not recomputed on replay. (Temporal Assets)
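In the TypeScript SDK, those retry semantics are declared where you proxy the activities. A sketch (the option names follow the SDK's retry policy; the values and the `InvalidPromptError` type are illustrative):

```ts
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities";

const { runLLMAnalysis } = proxyActivities<typeof activities>({
  startToCloseTimeout: "10 minutes",
  retry: {
    initialInterval: "1 second", // delay before the first retry
    backoffCoefficient: 2.0,     // exponential backoff between attempts
    maximumInterval: "1 minute", // cap on the delay between attempts
    maximumAttempts: 5,
    // Skip retries for failures that will never succeed,
    // e.g. an ApplicationFailure thrown with this type:
    nonRetryableErrorTypes: ["InvalidPromptError"],
  },
});
```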
6.2. Long-running jobs
It’s perfectly normal for:
- An agent to work for hours.
- A data backfill to run for days.
- A workflow to wait for a human approval for weeks.
In Temporal, timers and sleep calls are server-side, not process-local:
- `workflow.sleep('3 days')` stores a timer in Temporal’s backend, not in memory.
- Your workers can restart, deploy, scale horizontally—all without losing this “sleep.” (Temporal Assets)
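A sketch of what that looks like in workflow code, assuming a hypothetical `sendReminder` activity:

```ts
import { proxyActivities, sleep } from "@temporalio/workflow";
import type * as activities from "./activities";

const { sendReminder } = proxyActivities<typeof activities>({
  startToCloseTimeout: "1 minute",
});

export async function reminderWorkflow(userId: string): Promise<void> {
  // A durable, server-side timer: the worker can be redeployed or the
  // workflow evicted from memory, and this still fires ~3 days from now.
  await sleep("3 days");
  await sendReminder(userId);
}
```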
6.3. Human-in-the-loop and tools
AI apps often need:
- “Ask a user to confirm this action.”
- “Wait for a human editor to approve the draft.”
- “Call tool X, then Y, unless user cancels.”
Temporal’s Signals/Queries and event history model fit this nicely:
- A workflow can sleep indefinitely waiting for a signal (e.g., “user_approved”).
- When the signal arrives, it becomes an event in history and the workflow resumes. (Temporal)
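A sketch of a human-approval workflow using the TypeScript SDK's `defineSignal`, `setHandler`, and `condition` (the `publishDraft` activity is hypothetical):

```ts
import {
  proxyActivities,
  defineSignal,
  setHandler,
  condition,
} from "@temporalio/workflow";
import type * as activities from "./activities";

const { publishDraft } = proxyActivities<typeof activities>({
  startToCloseTimeout: "1 minute",
});

export const userApproved = defineSignal("user_approved");

export async function approvalWorkflow(draftId: string): Promise<void> {
  let approved = false;
  setHandler(userApproved, () => {
    approved = true;
  });

  // Durably wait up to 7 days for a human to approve.
  const ok = await condition(() => approved, "7 days");
  if (ok) {
    await publishDraft(draftId);
  }
  // Otherwise: escalate, expire, or notify; still just workflow logic.
}
```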
6.4. Observability and debugging
Debugging an agent that calls multiple tools and models is… not fun.
Temporal gives you:
- A full event history: which activities ran, their inputs, outputs, retries. (Temporal)
- A Web UI where you can see workflow status, task queues, and timing.
- The ability to replay workflows locally for debugging.
Instead of diffing random log lines, you can inspect a structured timeline of your AI pipeline.
6.5. Evolution and versioning
AI stacks evolve constantly:
- You change the system prompt.
- You add a new step to the pipeline.
- You swap one model for another.
Temporal has explicit guidance and features for workflow versioning—rolling out new behavior while allowing old workflows to finish with their original logic. (Temporal)
That’s much saner than trying to keep a pile of “v2”, “v3”, “v3_final” scripts straight in cron.
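One concrete mechanism in the TypeScript SDK is `patched()`, which records a marker in history so in-flight executions keep their original behavior. A sketch (the `summarize`/`summarizeV2` activities are hypothetical):

```ts
import { patched, proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities";

const { summarize, summarizeV2 } = proxyActivities<typeof activities>({
  startToCloseTimeout: "5 minutes",
});

export async function summaryWorkflow(docId: string): Promise<string> {
  // New executions take the new branch; executions started before the
  // deploy replay deterministically with their original logic.
  if (patched("use-v2-summarizer")) {
    return summarizeV2(docId);
  }
  return summarize(docId);
}
```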
7. Agents, loops, and unknown control flow
Cron is fundamentally about known schedules. Traditional DAG orchestrators (Airflow, etc.) are about known graphs.
But AI agents often look more like this:
```ts
while (!goalReached) {
  const plan = await llm.plan(currentState);
  const toolResult = await runTool(plan.nextAction);
  currentState = await llm.summarize({ currentState, toolResult });
}
```
The number of loop iterations is unknown ahead of time. The sequence of tools is decided at runtime.
Temporal is surprisingly good at this, because workflows are just code:
- You can implement loops, dynamic branches, and recursion directly.
- Each loop iteration can:
  - Call one or more LLMs (as activities),
  - Call tools,
  - Decide what to do next based on the recorded results.
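Here's a sketch of that loop as a Temporal workflow, with hypothetical `planNextAction`, `runTool`, and `summarizeState` activities standing in for your model and tool calls:

```ts
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities";

const { planNextAction, runTool, summarizeState } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: "5 minutes",
  });

export async function agentWorkflow(goal: string): Promise<string> {
  let state = goal;
  // The number of iterations is unknown up front; this is just code.
  for (let step = 0; step < 50; step++) { // hard cap as a safety net
    const plan = await planNextAction(state); // LLM call (Activity)
    if (plan.done) {
      return state;
    }
    const toolResult = await runTool(plan.nextAction); // tool call (Activity)
    state = await summarizeState(state, toolResult);   // LLM call (Activity)
  }
  return state;
}
```

For genuinely long-lived agents, you'd typically use Temporal's continue-as-new feature to roll over into a fresh execution and keep the event history bounded.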
Temporal and the broader ecosystem have started leaning into this pattern:
- Temporal’s own blog discusses how “AI apps and agents are distributed systems on steroids,” and why durable execution is a perfect fit for them. (temporal.io)
- Integrations with frameworks like the OpenAI Agents SDK and Pydantic AI wrap agents with Temporal workflows so they can survive failures, restarts, and long-running interactions. (temporal.io)
From a mental-model standpoint:
Agents choose the next step; Temporal guarantees each step actually happens (once) and is remembered.
8. Beyond Cron: how the mental model changes
Let’s contrast how you think about an AI pipeline with Cron/queues vs Temporal.
| Concern | Cron + scripts/queues | Temporal Durable Workflows |
|---|---|---|
| Scheduling | OS-level cron, stateless | Temporal Schedules / external triggers |
| Progress tracking | Ad-hoc DB flags, logs | Workflow event history |
| Retries | Hand-written loops, backoff, idempotency everywhere | Declarative retry policies per Activity |
| Long-running waits | `sleep` in process, hacked heartbeats | Server-side timers, `workflow.sleep` |
| Crashes & deploys | You restart jobs manually or accept partial failures | Workflows resume from last persisted event |
| Observability | Grep logs | Web UI, history, replay |
| Non-deterministic work (LLMs) | Called wherever; risk of duplicate calls | Isolated in Activities with durable results |
| Agents / dynamic loops | Custom state machine code | Native control flow in workflows + durable execution |
Cron doesn’t become useless—you might still use it to kick off a new workflow every night for a reporting job—but it’s no longer the source of truth for your system’s behavior.
Instead, you design your AI system as a set of long-lived, reliable functions (workflows) that orchestrate side-effecting operations (activities).
9. Getting started: a practical path
If this is all new, it can sound like a huge rewrite. It doesn’t have to be.
Here’s a pragmatic on-ramp:
- **Run Temporal locally**
  - Install the CLI and start a dev server (one command: `temporal server start-dev`). (Learn Temporal)
- **Wrap one part of your pipeline**
  - Pick a reliability-critical job (e.g., “process nightly docs”).
  - Turn its orchestration into a Temporal workflow.
  - Keep your existing code as activities—HTTP calls, DB writes, LLM calls, etc.
- **Use Temporal as your “AI cron”** (see the sketch after this list). Either:
  - Keep your old cron, but have it start a Temporal workflow instead of running the whole pipeline, or
  - Use Temporal’s native schedules/cron support to trigger workflows on a cadence.
- **Lean into workflow semantics**
  - Gradually move more logic into workflows:
    - Human approvals via signals.
    - Complex retry policies.
    - Sub-workflows for per-document processing.
- **Try an agent pattern**
  - Take a simple agent loop and adapt it into a workflow: the planning and decision logic stays in the workflow, while model/tool calls are activities.
  - Or experiment with an existing integration (e.g., Temporal + OpenAI Agents or Pydantic AI). (temporal.io)
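For the “AI cron” step, here's a minimal sketch of “cron, but durable,” using the client's `cronSchedule` option (Temporal Schedules are the newer, more flexible alternative):

```ts
import { Connection, Client } from "@temporalio/client";
import { aiPipelineWorkflow } from "./workflows";

async function main() {
  const client = new Client({ connection: await Connection.connect() });

  // Same spirit as `0 3 * * *` in crontab, but every run is a durable
  // workflow with history, retries, and a UI you can inspect.
  await client.workflow.start(aiPipelineWorkflow, {
    taskQueue: "ai-pipeline",
    workflowId: "nightly-ai-pipeline",
    args: [{ jobId: "nightly" }],
    cronSchedule: "0 3 * * *",
  });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```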
Within a couple of iterations, you’ll notice a shift: you’re no longer asking “Did the cron job run?” but “What’s the state of that workflow?”—and Temporal can answer that precisely, even if your infrastructure has been restarted a dozen times in between.
10. Key takeaways & further reading
Let’s recap the big ideas:
- **Cron is about time, not correctness.** It’s fine for simple tasks, but it has no concept of partial progress, retries, or long-running stateful workflows.
- **Durable Execution** turns “a function call” into “a durable, replayable execution” that can survive crashes, deploys, and network failures while behaving as if it ran once, in one place. (Temporal)
- **Temporal gives you durable execution as a programming model**, with:
  - Deterministic workflows for orchestration,
  - Activities for side effects and non-deterministic operations,
  - Event histories, retries, and timers built in. (Temporal)
- **AI pipelines and agents map almost perfectly onto workflows**, because they are:
  - Multi-step, long-running, and failure-prone,
  - Expensive to recompute,
  - Often interactive and tool-driven. (temporal.io)
- **You don’t have to rewrite everything at once.** Start by wrapping your existing pipeline in a single workflow and grow from there.
If you want to go deeper, good next reads include:
- Temporal docs on Workflows, Activities, and event history for a deeper look at determinism and replay. (Temporal)
- Temporal’s “Durable Execution meets AI” and “Durable AI agent” resources for AI-specific patterns and examples. (temporal.io)
- The TypeScript SDK tutorials if you’re a Node/TypeScript shop and want to get hands-on quickly. (Learn Temporal)
The punchline: as AI systems evolve from “one-off API call” to “always-on, multi-step, agentic workflows,” the old cron-plus-scripts architecture creaks under the weight.
Temporal’s durable execution model gives you something closer to a language runtime for workflows—one where time, failures, and long-lived state are first-class concerns. For reliable AI in production, that’s starting to look less like an option and more like table stakes.