Chapter 5 of 25
What is an LLM agent architecture?
Created May 27, 2026 Updated May 27, 2026
An LLM agent is an LLM placed inside a loop that lets it pick actions, call tools, observe results, and decide what to do next — until some stopping rule fires. An agent architecture is the shape of that loop: how planning, acting, observing, remembering, and verifying are arranged.
The minimal ingredients:
- LLM as decision-maker — chooses the next action from the available tools and the current state.
- Tool interface — the things the agent can call (search, code execution, retrieval, APIs, other agents).
- Memory / state — what the loop carries forward between steps.
- Stopping rule — when the loop ends.
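The four ingredients above fit in a few lines. The following is a minimal sketch, not a reference implementation: `scripted_policy` is a hypothetical stand-in for the LLM call, the `TOOLS` registry and its two tools are invented for illustration, and the state is just a list of strings.

```python
from typing import Callable, Optional

# Tool interface (hypothetical): each tool maps a string argument to a string result.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

def scripted_policy(state: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: picks (action, argument) from the current state.

    A real agent would send the state to a model here; this version is
    hard-coded so the loop can run without any API access.
    """
    if not any(line.startswith("observation:") for line in state):
        return ("add", "2+3")
    # Once an observation exists, finish with the latest one as the answer.
    return ("finish", state[-1].removeprefix("observation: "))

def run_agent(
    task: str,
    policy: Callable[[list[str]], tuple[str, str]],
    max_steps: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Run the decide / act / observe loop until the policy finishes or the budget runs out."""
    state = [f"task: {task}"]            # memory: what the loop carries forward
    for _ in range(max_steps):           # stopping rule 1: step budget
        action, arg = policy(state)      # decision-maker chooses the next action
        if action == "finish":           # stopping rule 2: the policy declares it is done
            return arg, state
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"error: unknown tool {action!r}"
        state.append(f"action: {action}({arg})")
        state.append(f"observation: {observation}")
    return None, state                   # budget exhausted without an answer
```

Calling `run_agent("add two numbers", scripted_policy)` runs one tool call and then finishes. A policy that never returns `"finish"` is cut off by `max_steps`, which is the simplest form of the stopping rule.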
The central design choice is the level of autonomy. More autonomous architectures (open-ended ReAct loops with broad tool access, Tree-of-Thought search, multi-agent systems) can solve harder problems but fail in more ways. More constrained ones (fixed plans, short step budgets, narrow tool sets) are easier to debug, evaluate, and rate-limit, but they top out at simpler tasks. Every decision below picks a point on that axis.
Architecture isn't one decision; it's several layered choices, and the layers are not all the same kind of thing:
- Control-flow patterns — ReAct (interleaved reasoning + acting), Plan-and-Execute (plan first, run later), Tree-of-Thought (branch + evaluate), simple chain / linear workflow.
- Tool / action interfaces — function calling (typed JSON schemas, the de facto standard), CodeAct (LLM emits code to execute), MCP (Model Context Protocol — portable tool registry), human-in-the-loop checkpoints.
- Verification loops — Reflection, Self-Refine, critic agents that re-read the agent's own output before continuing.
- Decomposition strategy — single agent vs multi-agent with role specialization (planner, researcher, coder, critic).
ReAct is a control-flow pattern. Function calling is a tool-invocation mechanism. CodeAct is an action surface. Reflection is a verification loop. Multi-agent is a decomposition strategy. They're orthogonal — a real production agent typically picks one from each layer.
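One way to make the layering concrete is to treat an architecture as a record with one field per layer. This is only an illustrative sketch: the enum and class names are invented here, the option lists simply mirror the bullets above, and the `NONE` verification option is an added assumption for agents that skip that layer.

```python
from dataclasses import dataclass
from enum import Enum

class ControlFlow(Enum):
    REACT = "react"
    PLAN_AND_EXECUTE = "plan_and_execute"
    TREE_OF_THOUGHT = "tree_of_thought"
    LINEAR_CHAIN = "linear_chain"

class ToolInterface(Enum):
    FUNCTION_CALLING = "function_calling"
    CODEACT = "codeact"
    MCP = "mcp"
    HUMAN_CHECKPOINT = "human_checkpoint"

class Verification(Enum):
    NONE = "none"                # assumption: not every agent has a verification loop
    REFLECTION = "reflection"
    SELF_REFINE = "self_refine"
    CRITIC_AGENT = "critic_agent"

class Decomposition(Enum):
    SINGLE_AGENT = "single_agent"
    MULTI_AGENT = "multi_agent"

@dataclass(frozen=True)
class AgentArchitecture:
    """One choice per layer; the fields can be varied independently."""
    control_flow: ControlFlow
    tool_interface: ToolInterface
    verification: Verification
    decomposition: Decomposition

    def describe(self) -> str:
        parts = (self.control_flow, self.tool_interface,
                 self.verification, self.decomposition)
        return " + ".join(p.value for p in parts)

# Example: a ReAct loop over function calling, with a reflection pass, as a single agent.
arch = AgentArchitecture(
    ControlFlow.REACT,
    ToolInterface.FUNCTION_CALLING,
    Verification.REFLECTION,
    Decomposition.SINGLE_AGENT,
)
```

Swapping `Verification.REFLECTION` for `Verification.CRITIC_AGENT` changes one field and leaves the other three untouched, which is what orthogonality means in practice.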
Beyond the design patterns, there is the engineering layer you actually wire up to build one, and it decides whether an agent ships:
- Tool interface. Clean schemas, input validation, explicit error contracts. Each tool a clear (input, output, errors) triple. MCP is the emerging standard for exposing tools to agents in a portable way.
- State and memory. Short-term scratchpad (within one run) versus long-term persistence (across sessions, usually a vector DB plus a doc store). What's kept, what's deliberately forgotten between turns.
- Stopping rules and budgets. Step limit, token budget, latency cap, cost cap. The most common production failure isn't a bad plan — it's an agent that never stops.
- Observability. Every step logged with inputs, outputs, tool calls, latency, cost. Many agent failures only become visible as patterns across many runs. LangSmith, LangFuse, Arize Phoenix, OpenTelemetry-based stacks all target this.
- Evaluation harness. Per-trajectory metrics — success rate, cost-per-task, hallucinated-tool-call rate — not per-step. Building the eval harness is typically half the work.
- Framework choice. OpenAI Responses API / Agents SDK (the older Assistants API is being deprecated in 2026), Anthropic's Claude Agent SDK, LangGraph (graph of steps), AutoGen (multi-agent conversation), CrewAI (role-based teams). Each picks defaults for the items above; the right framework is usually the one whose defaults match your constraints rather than the one with the most features.
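Two items from this list, the (input, output, errors) tool contract and per-trajectory metrics, can be sketched together. Everything named here (`ToolResult`, `word_count`, `REGISTRY`, `call_tool`, `Trajectory`, `summarize`) is hypothetical and not taken from any of the frameworks above; the sketch also assumes a tool call counts as hallucinated when its name is missing from the registry.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolResult:
    """Explicit contract: exactly one of output / error is expected to be set."""
    output: Optional[str] = None
    error: Optional[str] = None

def word_count(args: dict) -> ToolResult:
    """Example tool: validates its input, then returns a result or a typed error."""
    text = args.get("text")
    if not isinstance(text, str):
        return ToolResult(error="invalid_input: 'text' must be a string")
    return ToolResult(output=str(len(text.split())))

REGISTRY: dict[str, Callable[[dict], ToolResult]] = {"word_count": word_count}

def call_tool(name: str, args: dict) -> ToolResult:
    """Dispatch a tool call; unknown names become errors instead of exceptions."""
    tool = REGISTRY.get(name)
    if tool is None:
        return ToolResult(error=f"unknown_tool: {name}")
    return tool(args)

@dataclass
class Trajectory:
    """One full agent run, which is the unit the metrics are computed over."""
    succeeded: bool
    cost_usd: float
    tool_calls: list[tuple[str, dict]]

def summarize(trajectories: list[Trajectory]) -> dict[str, float]:
    """Aggregate per-trajectory metrics across runs."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    n = len(trajectories)
    total_calls = sum(len(t.tool_calls) for t in trajectories)
    hallucinated = sum(
        1 for t in trajectories for name, _ in t.tool_calls if name not in REGISTRY
    )
    return {
        "success_rate": sum(t.succeeded for t in trajectories) / n,
        "cost_per_task": sum(t.cost_usd for t in trajectories) / n,
        "hallucinated_tool_call_rate": hallucinated / total_calls if total_calls else 0.0,
    }
```

Returning errors as data rather than raising keeps a failed tool call inside the loop as an observation the agent can react to, and it makes failures easy to count in the harness.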
Full breakdown — implementation patterns for each architecture, failure modes, when to choose which, plus the practical engineering layer (memory stores, tool registries, evaluation harnesses): see LLM Agent Architectures.