Chapter 5 of 25

What is an LLM agent architecture?

Created May 27, 2026 Updated May 27, 2026

An LLM agent is an LLM placed inside a loop that lets it pick actions, call tools, observe results, and decide what to do next — until some stopping rule fires. An agent architecture is the shape of that loop: how planning, acting, observing, remembering, and verifying are arranged.

The minimal ingredients:

LLM as decision-maker — chooses the next action from the available tools and the current state.
Tool interface — the things the agent can call (search, code execution, retrieval, APIs, other agents).
Memory / state — what the loop carries forward between steps.
Stopping rule — when the loop ends.
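
A minimal sketch of how these four ingredients fit together in one loop. Everything here is hypothetical and for illustration only: `fake_llm` stands in for a real model call, and the `TOOLS` registry, `State` class, and `run_agent` function are invented names, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Callable

# Tool interface (hypothetical): plain functions from a string argument to a string result.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}


@dataclass
class State:
    """Memory carried between steps: the task plus every observation so far."""
    task: str
    observations: list[str] = field(default_factory=list)


def fake_llm(state: State) -> tuple[str, str]:
    """Stand-in for a model call; returns (action, argument).

    A real agent would prompt an LLM with the state and parse its reply.
    """
    if not state.observations:
        return ("add", state.task)              # first step: call a tool
    return ("finish", state.observations[-1])   # afterwards: stop with the result


def run_agent(task: str, max_steps: int = 5) -> str:
    state = State(task=task)
    for _ in range(max_steps):                  # stopping rule 1: step budget
        action, arg = fake_llm(state)           # decision-maker picks the next action
        if action == "finish":                  # stopping rule 2: the model declares it is done
            return arg
        if action not in TOOLS:
            # Feed the error back as an observation instead of crashing the loop.
            state.observations.append(f"error: unknown tool {action!r}")
            continue
        state.observations.append(TOOLS[action](arg))  # act, then record the observation
    return "stopped: step budget exhausted"


print(run_agent("2+3"))
```

Swapping `fake_llm` for a real model call changes nothing else in the loop, which is the point: the architecture is the loop, not the model.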

The central design choice is the level of autonomy. More autonomous architectures (open-ended ReAct loops with broad tool access, Tree-of-Thought search, multi-agent systems) can solve harder problems but fail in more ways. More constrained ones (fixed plans, short step budgets, narrow tool sets) are easier to debug, evaluate, and rate-limit, but they top out at simpler tasks. Every decision below picks a point on that axis.

Architecture isn't one decision — it's several layered choices. These are not all the same kind of thing:

Control-flow patterns — ReAct (interleaved reasoning + acting), Plan-and-Execute (plan first, run later), Tree-of-Thought (branch + evaluate), simple chain / linear workflow.
Tool / action interfaces — function calling (typed JSON schemas, the de facto standard), CodeAct (LLM emits code to execute), MCP (Model Context Protocol — portable tool registry), human-in-the-loop checkpoints.
Verification loops — Reflection, Self-Refine, critic agents that re-read the agent's own output before continuing.
Decomposition strategy — single agent vs multi-agent with role specialization (planner, researcher, coder, critic).

ReAct is a control-flow pattern. Function calling is a tool-invocation mechanism. CodeAct is an action surface. Reflection is a verification loop. Multi-agent is a decomposition strategy. They're orthogonal — a real production agent typically picks one from each layer.
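
To make the "typed JSON schemas" idea concrete, here is a hedged sketch of a tool definition and a check on a model-emitted call. The `search` tool, the call shape (`{"name": ..., "arguments": {...}}`), and the `validate_call` helper are assumptions made for illustration; real providers each define their own wire format, and a production system would typically use a full JSON Schema validator rather than this hand-rolled check.

```python
import json

# Hypothetical tool definition in the JSON-schema style used by function calling.
SEARCH_TOOL = {
    "name": "search",
    "description": "Search a document index.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "top_k": {"type": "integer"},
        },
        "required": ["query"],
    },
}

# Only the two primitive types used above are supported in this sketch.
PY_TYPES = {"string": str, "integer": int}


def validate_call(raw: str, tool: dict) -> dict:
    """Parse a model-emitted call and check it against the tool's schema.

    Checks the tool name, required keys, unknown keys, and primitive types.
    Returns the validated arguments or raises ValueError.
    """
    call = json.loads(raw)
    if call.get("name") != tool["name"]:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    schema = tool["parameters"]
    for key in schema["required"]:
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in schema["properties"]:
            raise ValueError(f"unexpected argument: {key}")
        expected = PY_TYPES[schema["properties"][key]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"argument {key} should be of type {expected.__name__}")
    return args


good = '{"name": "search", "arguments": {"query": "agent loops", "top_k": 3}}'
print(validate_call(good, SEARCH_TOOL))
```

The same validation step works regardless of which control-flow pattern drives the loop, which illustrates why the tool interface is a separate layer from ReAct or Plan-and-Execute.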

Beyond the design patterns, this is what you actually wire up to build one: the engineering layer that decides whether an agent ships.

Tool interface. Clean schemas, input validation, explicit error contracts. Each tool a clear (input, output, errors) triple. MCP is the emerging standard for exposing tools to agents in a portable way.
State and memory. Short-term scratchpad (within one run) versus long-term persistence (across sessions — usually a vector DB plus a doc store). What's kept, what's deliberately forgotten between turns.
Stopping rules and budgets. Step limit, token budget, latency cap, cost cap. The most common production failure isn't a bad plan — it's an agent that never stops.
Observability. Every step logged with inputs, outputs, tool calls, latency, cost. Many agent failures only surface as patterns across many runs, so single-run debugging is not enough. LangSmith, Langfuse, Arize Phoenix, and OpenTelemetry-based stacks all target this.
Evaluation harness. Per-trajectory metrics — success rate, cost-per-task, hallucinated-tool-call rate — not per-step. Building the eval harness is typically half the work.
Framework choice. OpenAI Responses API / Agents SDK (the older Assistants API is deprecated and scheduled to shut down in 2026), Anthropic's Claude Agent SDK, LangGraph (graph of steps), AutoGen (multi-agent conversation), CrewAI (role-based teams). Each picks defaults for the items above; the right framework is usually the one whose defaults match your constraints rather than the one with the most features.
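
Two of the items above, budgets and per-trajectory evaluation, can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the `Budget`, `Step`, and `Trajectory` types, the `exceeded` and `summarize` helpers, and the sample numbers are all hypothetical, not taken from any framework listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_steps: int
    max_tokens: int
    max_cost_usd: float


@dataclass
class Step:
    tool: str        # name of the tool the model tried to call
    tokens: int
    cost_usd: float


@dataclass
class Trajectory:
    steps: list[Step]
    success: bool


def exceeded(steps: list[Step], budget: Budget) -> Optional[str]:
    """Return the name of the first limit reached, or None if within budget."""
    if len(steps) >= budget.max_steps:
        return "steps"
    if sum(s.tokens for s in steps) >= budget.max_tokens:
        return "tokens"
    if sum(s.cost_usd for s in steps) >= budget.max_cost_usd:
        return "cost"
    return None


def summarize(runs: list[Trajectory], known_tools: set[str]) -> dict[str, float]:
    """Per-trajectory metrics: aggregate over whole runs, not single steps."""
    n = len(runs)
    calls = [s for r in runs for s in r.steps]
    # A call to a tool that is not in the registry counts as hallucinated.
    bad = sum(1 for s in calls if s.tool not in known_tools)
    return {
        "success_rate": sum(r.success for r in runs) / n,
        "cost_per_task": sum(s.cost_usd for s in calls) / n,
        "hallucinated_tool_call_rate": bad / len(calls) if calls else 0.0,
    }


runs = [
    Trajectory([Step("search", 100, 0.25), Step("calc", 50, 0.25)], success=True),
    Trajectory([Step("browse", 200, 0.5)], success=False),  # "browse" is not registered
]
report = summarize(runs, known_tools={"search", "calc"})
print(report)
print(exceeded(runs[0].steps, Budget(max_steps=5, max_tokens=150, max_cost_usd=1.0)))
```

Calling `exceeded` at the top of every loop iteration gives the agent a hard stop that does not depend on the model deciding it is finished, and `summarize` only makes sense once every step has been logged, which is why observability comes before evaluation in practice.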

Full breakdown — implementation patterns for each architecture, failure modes, when to choose which, plus the practical engineering layer (memory stores, tool registries, evaluation harnesses): see LLM Agent Architectures.