Open any "agent patterns" explainer and you get the same bulleted trio: ReAct, Plan-and-Execute, Reflexion. Pick one, the framing implies, the way you'd pick a sorting algorithm. It's a tidy listicle and it quietly misleads, because these three don't sit on the same shelf. Two of them are opposite ends of a single dial. The third is measuring something else entirely.
The dial is commitment: how much does the agent decide before it looks at the world again?
ReAct: decide one step at a time
ReAct (Yao et al., 2022) interleaves reasoning and acting in a loop: a Thought, then an Action, then an Observation from the environment, then the next Thought conditioned on what just came back. The reasoning trace lets the model track and update its plan and handle exceptions; the actions let it pull in outside information. Crucially, it never commits to executing a multi-step plan. Whatever plan appears in the trace is provisional, and the agent re-decides at every step.
That makes ReAct robust to surprise: if a tool returns garbage or a page isn't what the model expected, the next thought sees it and adapts. The cost is right there in the loop. Every step is an LLM call, the full thought/action/observation history keeps growing in the context window, and agents that re-decide forever can drift off-task or fall into repetition loops — the classic "it tried the same failing search four times" failure.
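The loop described above can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: `TOOLS`, `scripted_model`, and `react_loop` are hypothetical names, and the "model" is a scripted stand-in for an LLM call so the example runs offline.

```python
# Hypothetical tool registry; a real agent would wrap search, code execution, etc.
TOOLS = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "no result"),
}

PREFIX = "Observation: "

def scripted_model(history):
    """Stand-in for an LLM call: returns (thought, action, argument).

    It only looks at the latest history entry, which is enough to show the
    core idea: each decision is conditioned on what just came back.
    """
    last = history[-1]
    if last.startswith(PREFIX) and "no result" not in last:
        return ("I have what I need.", "finish", last[len(PREFIX):])
    if "no result" in last:
        return ("That query failed; rephrase it.", "lookup", "capital of France")
    return ("I should look this up.", "lookup", "France capital city")

def react_loop(question, model, max_steps=5):
    """Run Thought -> Action -> Observation until the model finishes."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):  # step cap guards against endless re-deciding
        thought, action, arg = model(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        observation = TOOLS[action](arg)  # act, then look at the world again
        history.append(f"Action: {action}[{arg}]")
        history.append(f"{PREFIX}{observation}")
    return None, history  # gave up: step budget exhausted

answer, history = react_loop("What is the capital of France?", scripted_model)
```

In this run the first lookup misses, the next thought sees the miss and rephrases, and the third thought finishes. Note the costs in miniature: three model calls for one answer, and a `history` list that only grows.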
Plan-and-Execute: commit, then run
The other end of the dial decides the whole plan first. The lineage runs through Plan-and-Solve prompting (Wang et al., 2023), which split zero-shot reasoning into two phases, first devising a plan that divides the task into subtasks and then carrying out those subtasks, explicitly to cut the missing-step errors that plain chain-of-thought made. The agentic version (popularized in LangChain's "Plan-and-Execute" template) does the same at the tool level: a planner emits the full list of steps up front, then an executor runs them.
Deciding up front buys you things ReAct can't. You make far fewer planning calls: one up front instead of one LLM call per step. And once the steps are laid out, independent steps can run in parallel, which is exactly what LLMCompiler (Kim et al., 2023) industrializes: its planner emits a DAG of tasks with dependencies so the runtime fans out parallel function calls instead of waiting on each in turn. ReWOO (Xu et al., 2023) pushes the same instinct further, decoupling reasoning from observations so the planner doesn't pay tokens to re-read every intermediate result.
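To make the parallelism concrete, here is a rough sketch of executing a plan expressed as a dependency graph. It is not LLMCompiler's API: `PLAN` and `run_plan` are hypothetical, the plan is hard-coded rather than emitted by a planner, and tasks run in simple "waves" of everything whose dependencies are already satisfied.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical plan: task name -> (dependency names, function of their results).
PLAN = {
    "fetch_a": ([], lambda: 2),
    "fetch_b": ([], lambda: 3),
    "combine": (["fetch_a", "fetch_b"], lambda a, b: a * b),
}

def run_plan(plan):
    """Execute a DAG of tasks, running each wave of ready tasks in parallel."""
    results = {}
    waves = []
    pending = dict(plan)
    with ThreadPoolExecutor() as pool:
        while pending:
            # A task is ready once every dependency has a result.
            ready = [name for name, (deps, _) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in plan")
            futures = {}
            for name in ready:
                deps, fn = pending[name]
                futures[name] = pool.submit(fn, *[results[d] for d in deps])
            for name, future in futures.items():
                results[name] = future.result()  # wait for the whole wave
                del pending[name]
            waves.append(sorted(ready))
    return results, waves

results, waves = run_plan(PLAN)
```

Here `fetch_a` and `fetch_b` share the first wave and `combine` waits for both. Waiting on whole waves is a simplification: a finer-grained scheduler could dispatch each task the moment its own dependencies finish.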
The axis they share
Here's the catch the listicle hides. A plan made up front assumes the world at execution time matches your model of it at planning time. The moment reality deviates (an API errors, a result contradicts an assumption), a static plan executes confidently into the wall. So production plan-and-execute systems almost always bolt on a re-plan step: execute a batch, observe, revise the plan, repeat.
Re-planning is just ReAct with a coarser clock. Shrink the batch toward one step and plan-and-execute becomes ReAct; grow ReAct's lookahead and it becomes plan-and-execute. They're the same dial.
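One way to see the dial is to write a single loop with the batch size as a parameter. This is a toy sketch under loud assumptions: `run_agent`, `remaining_planner`, and `run_step` are hypothetical, the "planner" just lists the steps not yet done, and nothing ever fails, so the only thing that varies is how often the agent re-decides.

```python
def remaining_planner(goal_steps, done):
    """Stand-in planner: plan every step that has not been completed yet."""
    return [step for step in goal_steps if step not in done]

def run_step(step):
    """Stand-in executor: pretend the step ran and return it as the result."""
    return step

def run_agent(goal_steps, batch_size, planner, execute):
    """Plan, execute up to `batch_size` steps, then look again and re-plan."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    done = []
    planner_calls = 0
    while len(done) < len(goal_steps):
        plan = planner(goal_steps, done)  # re-decide from the current state
        planner_calls += 1
        for step in plan[:batch_size]:    # commit to this many steps
            done.append(execute(step))
    return done, planner_calls

steps = ["a", "b", "c", "d", "e", "f"]
_, react_like = run_agent(steps, 1, remaining_planner, run_step)         # 6 planner calls
_, batched = run_agent(steps, 3, remaining_planner, run_step)            # 2 planner calls
_, plan_once = run_agent(steps, len(steps), remaining_planner, run_step) # 1 planner call
```

With `batch_size=1` the loop re-decides after every step, which is ReAct's shape; with `batch_size=len(steps)` it plans once and runs to completion. Everything in between is plan-and-execute with re-planning.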
That reframes the choice. You aren't picking a pattern; you're choosing how often to re-decide, and what sets that knob is your environment's surprise rate weighed against your cost budget. Stable, decomposable, latency-sensitive work leans toward up-front planning. High-surprise environments lean toward step-at-a-time. This is also the point where teams reach for more orchestration than a single loop provides, and the same calculus governs the multi-agent versus single-agent choice: add structure only where the environment forces it.
Reflexion is on a different axis
Reflexion (Shinn et al., 2023) is filed next to the other two, but it isn't a planning style at all. It operates across attempts. After a trial, the agent gets a feedback signal — scalar or free-form — verbally reflects on what went wrong, and writes that reflection into an episodic memory buffer. On the next attempt, the reflection conditions its behavior. The paper calls this "verbal reinforcement learning": it reinforces the agent through language, with no weight updates.
It is genuinely effective when the conditions hold — the authors report 91% pass@1 on HumanEval, against 80% for the GPT-4 baseline they compare to. But read the prerequisite: Reflexion needs retries and a signal worth reflecting on. HumanEval has unit tests; the agent knows objectively when it failed. Strip away that verifiable success signal and reflection just narrates plausible-sounding stories into memory, and you've added latency and tokens to learn nothing. Reflexion wraps a base loop (often a ReAct loop) — it's a layer, not a substitute.
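The shape of that outer loop, and its dependence on a verifiable signal, can be sketched as follows. Everything here is a hypothetical stand-in rather than the paper's code: `scripted_actor` and `scripted_reflect` replace the LLM calls, and `run_tests` plays the role of the unit tests that tell the agent objectively whether it failed.

```python
def run_tests(candidate):
    """Verifiable signal: returns (passed, feedback) from real checks."""
    cases = [((2, 3), 5), ((-1, 1), 0)]
    for args, expected in cases:
        got = candidate(*args)
        if got != expected:
            return False, f"add{args} returned {got}, expected {expected}"
    return True, "all tests passed"

def scripted_actor(memory):
    """Stand-in for the LLM actor: behaves differently once a reflection exists."""
    if any("subtract" in note for note in memory):
        return lambda a, b: a + b
    return lambda a, b: a - b  # first attempt contains a bug

def scripted_reflect(feedback):
    """Stand-in for the LLM self-reflection call."""
    return f"Last attempt failed ({feedback}); I seem to subtract instead of add."

def reflexion_loop(actor, evaluate, reflect, max_trials=3):
    """Retry across trials, carrying verbal reflections in an episodic memory."""
    memory = []
    for trial in range(1, max_trials + 1):
        candidate = actor(memory)            # attempt conditioned on past reflections
        passed, feedback = evaluate(candidate)
        if passed:
            return candidate, trial, memory
        memory.append(reflect(feedback))     # no weight updates, only text
    return None, max_trials, memory

solution, trials_used, memory = reflexion_loop(scripted_actor, run_tests, scripted_reflect)
```

The first trial fails its test, the reflection lands in `memory`, and the second trial passes. Swap `run_tests` for an evaluator that cannot actually tell success from failure and the loop still runs, but the notes written to `memory` no longer carry information.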
The decision
| Pattern | ReAct | Plan-and-Execute | Reflexion |
| --- | --- | --- | --- |
| Core loop | Thought → Action → Observation, repeated one step at a time | Plan all steps up front, then execute (re-plan on deviation) | Run a trial, reflect on the outcome in words, store it, retry |
| Commits to a plan? | No: re-decides every step | Yes: full plan up front (production adds re-planning) | N/A: wraps a base loop across attempts |
| LLM-call cost | High: one LLM call per step | Lower: few planner calls, steps can run in parallel | Adds a reflection call per failed trial, on top of the base loop |
| Failure mode | Drift, looping, context bloat from full history | Stale plan when reality deviates from the plan-time model | Needs a real signal; reflects into noise if the reward is bad or absent |
| Needs a reward signal? | No | No | Yes |
| Best when | High-surprise environments, cost is secondary to robustness | Stable, decomposable tasks; latency/cost matter; steps parallelize | You can retry and you have a verifiable success signal to learn from |
So stop asking "which of the three." Ask two questions.
How stable is my environment? That sets your point on the ReAct↔Plan-and-Execute commitment dial: surprise pushes you toward re-deciding often, stability toward planning once and parallelizing.
Do I get a verifiable success signal, and can I retry? If yes, add Reflexion (or any reflect-retry loop) on top of whatever base loop you chose. If no, skip it: reflection without a real signal is theater.
The three patterns were never interchangeable. Two are a dial; one is a layer.



