Reasoning & Planning
Prompt Chaining
A sequence of LLM calls where each pass narrows, transforms, or verifies the output of the previous. One model, one thread — no coordination overhead.
- Each step has a single well-defined transformation (extract → classify → respond)
- Simpler than multi-agent: no state synchronisation, no handoffs
- Fail fast: if step N produces bad output, the chain halts before wasting downstream calls
- Natural fit for structured extraction, multi-step reasoning, and document pipelines
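The extract → classify → respond flow above can be sketched as a short Python loop. `call_llm` is a stand-in for a real model call (an assumption, not any provider's API); each step is scripted here so the control flow is runnable as-is.

```python
# Minimal prompt-chaining sketch with a stubbed model call.
def call_llm(prompt: str) -> str:
    # Placeholder: canned behaviour per step, for illustration only.
    if prompt.startswith("Extract"):
        return "refund request"
    if prompt.startswith("Classify"):
        return "billing"
    return "Routing to the billing team for a refund."

def run_chain(message: str) -> str:
    """extract -> classify -> respond, halting fast on bad output."""
    steps = [
        lambda text: call_llm(f"Extract the customer's intent: {text}"),
        lambda text: call_llm(f"Classify this intent into a department: {text}"),
        lambda text: call_llm(f"Draft a reply for the {text} department."),
    ]
    out = message
    for i, step in enumerate(steps):
        out = step(out)
        if not out.strip():  # fail fast: halt before wasting downstream calls
            raise RuntimeError(f"step {i} produced no output; chain halted")
    return out

reply = run_chain("I was double-charged last month.")
```

Each step sees only the previous step's output — no shared state, no coordination.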
ReAct — Reason + Act
The backbone of most agents. Interleaves reasoning traces with tool calls: think, call a tool, observe the result, think again.
- Thought — internal reasoning about what to do next
- Action — call a tool or produce output
- Observation — result returned from the tool, injected into context
- Repeat until task complete or max steps reached
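The thought/action/observation cycle can be sketched as a loop over a transcript. The "model" here is a scripted policy (an assumption for illustration); a real agent would ask an LLM to choose the next action given the transcript so far.

```python
# Bare-bones ReAct loop with a stubbed decision function.
def tool_lookup_population(city: str) -> str:
    data = {"Paris": "about 2.1 million"}  # toy knowledge base
    return data.get(city, "unknown")

def model_decide(transcript: list[str]) -> tuple[str, str]:
    # Scripted stand-in for the LLM: act once, then finish.
    if not any(line.startswith("Observation:") for line in transcript):
        return ("act", "Paris")
    return ("finish", "Paris has about 2.1 million residents.")

def react(question: str, max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        transcript.append("Thought: decide whether I need a tool.")  # Thought
        kind, payload = model_decide(transcript)
        if kind == "finish":
            return payload
        observation = tool_lookup_population(payload)        # Action
        transcript.append(f"Observation: {observation}")     # Observation
    return "max steps reached without an answer"

answer = react("How many people live in Paris?")
```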
Plan-and-Execute
Separates planning from execution. A planner decomposes the goal upfront; an executor works through steps, optionally with its own ReAct loop.
- Planner — decomposes the goal into an ordered list of steps
- Executor — runs each step; may have an inner ReAct loop per step
- Replanner — optional: revises the plan when a step fails or context changes
- More predictable than pure ReAct; less adaptive to mid-run surprises
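A minimal planner/executor/replanner sketch, with all three components stubbed: the planner returns a fixed step list (standing in for an LLM planning call), and a failed step triggers a revise-and-continue replan.

```python
# Plan-and-Execute sketch: decompose upfront, replan on failure.
def plan(goal: str) -> list[str]:
    # Stand-in for an LLM planner decomposing the goal.
    return [f"research {goal}", f"draft {goal}", f"polish {goal}"]

def execute_step(step: str) -> str:
    if step.startswith("draft") and "v2" not in step:
        raise RuntimeError("draft failed")       # simulate a mid-run surprise
    return f"done: {step}"

def replan(failed_step: str, remaining: list[str]) -> list[str]:
    # Stand-in for an LLM replanner revising the failed step.
    return [failed_step + " v2"] + remaining

def plan_and_execute(goal: str) -> list[str]:
    steps, results = plan(goal), []
    while steps:
        step, steps = steps[0], steps[1:]
        try:
            results.append(execute_step(step))
        except RuntimeError:
            steps = replan(step, steps)          # revise plan, keep going
    return results

log = plan_and_execute("report")
```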
Tree of Thoughts
Branching + backtracking reasoning. Agent explores multiple candidate paths simultaneously, scores each branch, and prunes dead ends.
- Branch — from each state, generate N candidate next thoughts
- Score — evaluate each branch with a critic or heuristic
- Prune — discard low-scoring branches early to focus compute
- Backtrack — if a branch dead-ends, return to the last branch point
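A toy branch/score/prune search, written as a beam search over strings: the "thought generator" and "critic" are plain functions here (assumptions for illustration; in practice both are LLM calls), and keeping a beam of alternatives is what makes backtracking possible when the top branch dead-ends.

```python
# Toy Tree-of-Thoughts: branch, score, prune, repeat.
TARGET = "cat"

def branch(state: str) -> list[str]:
    # Generate N candidate next thoughts (here: append one letter).
    return [state + ch for ch in "abct"]

def score(state: str) -> int:
    # Critic/heuristic: leading characters matching the target.
    return sum(1 for a, b in zip(state, TARGET) if a == b)

def tree_of_thoughts(beam_width: int = 2, max_depth: int = 3) -> str:
    frontier = [""]
    for _ in range(max_depth):
        candidates = [c for s in frontier for c in branch(s)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]   # prune low-scoring branches
        for state in frontier:
            if state == TARGET:
                return state
    return max(frontier, key=score)          # best effort after budget

result = tree_of_thoughts()
```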
State Machine
Agent operates through explicit states and labeled transitions. Current state determines which actions are available — more structured than free-form ReAct.
- States — discrete phases with defined entry and exit conditions
- Transitions — labeled edges triggered by conditions or tool results
- Guard clauses — prevent invalid transitions from the current state
- Useful when business logic has defined phases (draft → review → deploy)
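The draft → review → deploy example can be expressed as a transition table with guard clauses; the states and actions below mirror that example and are otherwise illustrative.

```python
# Minimal agent state machine with guarded transitions.
TRANSITIONS = {
    ("draft", "submit"): "review",
    ("review", "approve"): "deploy",
    ("review", "reject"): "draft",
}

class AgentStateMachine:
    def __init__(self) -> None:
        self.state = "draft"

    def available_actions(self) -> list[str]:
        # Current state determines which actions are legal.
        return [a for (s, a) in TRANSITIONS if s == self.state]

    def fire(self, action: str) -> str:
        if (self.state, action) not in TRANSITIONS:  # guard clause
            raise ValueError(f"'{action}' is invalid from '{self.state}'")
        self.state = TRANSITIONS[(self.state, action)]
        return self.state

sm = AgentStateMachine()
sm.fire("submit")
final = sm.fire("approve")
```

An agent wrapped in this machine can only be offered `available_actions()` at each turn, which rules out invalid moves by construction.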
Grounding & Action
Tool Use / Function Calling
Agent operates on a typed schema of tools. The model selects a tool and fills in parameters; parameters are validated before execution; results are injected back as observations.
- Schema — each tool has a name, description, and typed parameter list
- Selection — model chooses based on current reasoning state
- Validation — parameters checked against schema before execution
- Injection — result returned to context as a structured observation
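The select → validate → execute → inject cycle, sketched with a toy schema format. The schema shape and the model's JSON reply are assumptions for illustration; real providers each define their own function-calling wire format.

```python
# Tool-use sketch: validate model-chosen parameters, execute, inject result.
import json

TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {"city": str},
        "fn": lambda city: f"Sunny in {city}",
    },
}

def validate(tool_name: str, args: dict) -> None:
    schema = TOOLS[tool_name]["parameters"]
    for name, typ in schema.items():
        if name not in args or not isinstance(args[name], typ):
            raise TypeError(f"parameter '{name}' missing or not {typ.__name__}")

def handle_model_reply(reply: str) -> dict:
    call = json.loads(reply)              # model's tool selection
    validate(call["tool"], call["args"])  # check against schema before execution
    result = TOOLS[call["tool"]]["fn"](**call["args"])
    return {"role": "tool", "name": call["tool"], "content": result}

# Simulated model output choosing a tool and filling parameters:
observation = handle_model_reply('{"tool": "get_weather", "args": {"city": "Oslo"}}')
```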
Sandboxed Execution
Code runs in an isolated environment with no access to host filesystem, network, or env vars. Output is safely captured and returned as an observation.
- Isolation — separate container, subprocess, or WASM runtime
- Resource limits — CPU, memory, and time caps prevent runaway processes
- Capture — stdout, stderr, exit code returned as structured output
- Teardown — environment destroyed after each execution
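A sketch of the capture and teardown side only: run code in a subprocess with a time cap and return stdout/stderr/exit code as a structured observation. A bare subprocess is NOT real isolation; production sandboxes use containers, gVisor, or WASM runtimes for the filesystem/network/env-var isolation described above.

```python
# Capture-side sketch of sandboxed execution (not real isolation).
import subprocess
import sys

def run_sandboxed(code: str, timeout_s: float = 5.0) -> dict:
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, no user site
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr,
                "exit_code": proc.returncode}
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": "timed out", "exit_code": -1}
    # Teardown is implicit here: the subprocess and its state are gone.

obs = run_sandboxed("print(2 + 2)")
```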
Agentic RAG
Agent steers retrieval — decides what to fetch, evaluates chunk relevance, refines its query, and iterates until the retrieved context is sufficient to generate a grounded answer.
- Query — agent generates a targeted query, not just the raw user message
- Evaluate — agent scores chunk relevance; low scores trigger re-query
- Refine — query is reformulated based on what was missing
- Iterate — repeats until context is sufficient; distinct from static RAG
Citation / Attribution
Every claim the agent makes is linked to a retrieved source. A verification pass checks citations are accurate. Ungrounded claims are flagged as uncertain.
- Source IDs — retrieved chunks tagged with origin (doc, url, chunk index)
- Inline citations — agent required to cite [n] after each factual statement
- Verification — second pass checks citations against the source material
- Essential for research agents where hallucination cost is high
Memory & Knowledge
In-Context Memory
The active prompt window — conversation history, chain-of-thought traces, and working state. Zero latency, bounded by context length, lost when the session ends.
- Conversation history — all prior turns visible in the current window
- Working set — current task state, intermediate values, scratch reasoning
- Bounded — attention degrades for tokens far from the present
- Ephemeral — lost on session end; must be persisted externally to survive
Context Distillation
When the context window fills up, the agent compresses older history into a dense summary block and continues from there — without losing the thread.
- Trigger — token count approaching the model's context limit
- Summarize — agent compresses older messages into a denser summary block
- Replace — original messages removed; summary takes their place
- Different from RAG — this manages growing in-context state, not external retrieval
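The trigger → summarize → replace cycle can be sketched as below. Token counting and the summarizer are stand-ins (word-count proxy and truncation respectively); a real agent would use the model's tokenizer and an LLM summary call.

```python
# Context-distillation sketch: compress old turns when nearing the limit.
TOKEN_LIMIT = 50  # illustrative, far below any real context window

def count_tokens(messages: list[str]) -> int:
    return sum(len(m.split()) for m in messages)  # crude proxy

def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM call compressing older history into one block.
    return "SUMMARY: " + " | ".join(m[:20] for m in messages)

def maybe_distill(messages: list[str], keep_recent: int = 2) -> list[str]:
    if count_tokens(messages) <= TOKEN_LIMIT:
        return messages                           # trigger not hit
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent              # summary replaces old turns

history = [f"turn {i}: " + "lorem " * 10 for i in range(8)]
compact = maybe_distill(history)
```

Recent turns survive verbatim; only the older span is collapsed, so the agent keeps the thread.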
Semantic Memory
Agent-curated, structured fact storage. Not passive retrieval — the agent writes and updates entries. Two models: a single profile JSON or a collection of narrow documents.
- Profile model — single JSON document updated in place (well-scoped facts)
- Collection model — many narrow documents, each representing one entity or fact
- Agent-curated — the agent decides what to store and when to update
- Queryable — retrieved by semantic search or exact key lookup
Episodic Memory
A persistent log of past runs and their outcomes. Before acting, the agent retrieves relevant episodes to avoid known failure modes and reuse successful strategies.
- Write — after each run, record what happened and what worked
- Read — at task start, retrieve relevant past episodes by similarity
- Decay — old or low-relevance memories pruned over time
- Update — memories revised when new evidence contradicts them
Procedural Memory
The agent's operating instructions stored as updateable text. After a run, the agent evaluates its own process and proposes rule changes — writing better procedures over time.
- Instructions — operating procedures stored as editable rules or system prompt
- Learn — agent evaluates its own process and proposes improvements
- Write back — approved rules update the instruction store
- Version — previous procedures retained for rollback
External Memory (RAG)
Documents chunked, embedded, and retrieved by nearest-neighbor search. Static retrieval: one pass, then generate. Decouples knowledge from context length.
- Embed — documents chunked and embedded into a vector store offline
- Query — agent generates a retrieval query from current context
- Retrieve — top-k nearest chunks fetched by cosine similarity
- Augment — chunks injected into prompt before generation (no iteration)
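Embed → query → retrieve → augment, sketched with toy 3-dimensional vectors standing in for a real embedding model (an assumption; production systems use learned embeddings and an approximate-nearest-neighbor index).

```python
# Static RAG sketch: one retrieval pass, then build the prompt.
import math

STORE = [  # (chunk text, precomputed toy embedding)
    ("Cats are small domesticated felines.", [0.9, 0.1, 0.0]),
    ("The Eiffel Tower is in Paris.",        [0.0, 0.2, 0.9]),
    ("Dogs are loyal companions.",           [0.7, 0.3, 0.1]),
]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    ranked = sorted(STORE, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]         # top-k nearest chunks

def augment(question: str, query_vec: list[float]) -> str:
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}"  # one pass, no iteration

prompt = augment("Where is the Eiffel Tower?", [0.0, 0.1, 1.0])
```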
Memory Write Strategies
A meta-pattern: when and how memories are committed. Hot path writes inline during execution; background defers to an async task. The choice trades latency for consistency.
- Hot path — write inline during the run; immediate availability, adds latency
- Background — defer write to an async task; no latency, eventual consistency
- Choose hot path when — the agent needs to read its own writes in the same session
- Choose background when — memory is for future sessions, throughput matters more
Quality & Verification
Reflection / Self-Critique
Agent evaluates its own output against a set of criteria and retries if the score falls below a threshold. The same model generates and critiques.
- Generate — produce an initial output
- Critique — score output against criteria (correctness, completeness, safety)
- Refine — revise based on critique, repeat until threshold met
- Max iterations — hard cap prevents infinite loops
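The generate → critique → refine loop, with both roles played by deterministic stand-ins so the sketch runs; in a real agent both are calls to the same model with different prompts.

```python
# Reflection sketch: same "model" generates and critiques.
THRESHOLD = 0.9
MAX_ITERATIONS = 4  # hard cap prevents infinite loops

def generate(draft, note):
    if draft is None:
        return "Paris is the capital."
    return draft + " It is the capital of France."  # revise per critique

def critique(output: str):
    # Score against a completeness criterion (illustrative).
    if "France" in output:
        return 1.0, "ok"
    return 0.5, "missing: which country?"

def reflect_loop() -> str:
    draft, note = None, None
    for _ in range(MAX_ITERATIONS):
        draft = generate(draft, note)
        score, note = critique(draft)
        if score >= THRESHOLD:
            return draft
    return draft  # best effort when the cap is hit

final_answer = reflect_loop()
```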
Evaluator-Optimizer
Two separate LLMs: one generates, one evaluates. The evaluator returns structured feedback; the generator revises. Not a self-critique loop — separate roles, separate prompts.
- Generator — produces a candidate output
- Evaluator — separate model scores it and returns structured critique
- Optimizer — generator revises using evaluator's specific feedback
- Key distinction from Reflection — evaluator is a separate LLM, not self-critique
Debate / Critique
Two agents argue opposing positions; a judge evaluates the exchange and synthesises a resolution. Surfaces assumptions a single agent misses.
- Proposer — generates a solution or claim
- Critic — argues against it, finds flaws and edge cases
- Judge — evaluates the exchange, picks winner or synthesises
- Useful for verification, adversarial testing, and high-stakes decisions
Guardrails
Validation layer on every input and output. Passive mode checks rules and flags; active mode blocks and escalates to human or replanning. Always on — not event-driven.
- Passive — regex, schema, or classifier checks on every pass; flag violations
- Active — violations trigger human escalation or agent replanning
- Layered — input guardrails before the model, output guardrails after
- Distinct from Approval Gate — always running, not triggered by action type
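Layered passive/active guardrails sketched with regex checks; the rules themselves are illustrative assumptions, and real deployments combine regex, schemas, and classifier models.

```python
# Guardrails sketch: every pass is checked; active rules block.
import re

RULES = [
    ("pii_email", re.compile(r"[\w.]+@[\w.]+"), "active"),             # block
    ("all_caps_shouting", re.compile(r"^[A-Z\s!]{10,}$"), "passive"),  # flag
]

def check(text: str) -> dict:
    flags, blocked = [], False
    for name, pattern, mode in RULES:
        if pattern.search(text):
            flags.append(name)
            if mode == "active":
                blocked = True  # active mode: escalate or replan
    return {"text": text, "flags": flags, "blocked": blocked}

# Input guardrail before the model, output guardrail after:
inbound = check("Contact me at jane@example.com")
outbound = check("here is your answer")
```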
Multi-Agent Coordination
Pipeline
Output of one agent becomes input to the next. Each stage has a single well-defined transformation. Simple, auditable — a failed stage halts the pipeline.
- Each agent has one transformation: spec → code → tests → review → docs
- No agent needs awareness of stages before or after it
- Simple and auditable — clear data lineage through each stage
- Failed stage halts the pipeline; the artifact doesn't advance
Fan-out / Parallel
Orchestrator decomposes a task into N independent chunks and dispatches them simultaneously. Collects results when all complete, then reduces.
- Decompose — identify chunks that can run independently
- Dispatch — all chunks sent simultaneously to N agents
- Wait — collect when all complete (or a quorum)
- Reduce — merge results, resolve conflicts, synthesise final output
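Decompose → dispatch → wait → reduce with a thread pool; each "agent" is a plain function here (an assumption for illustration), where real subagents would be separate LLM sessions or processes.

```python
# Fan-out sketch: independent chunks processed in parallel, then reduced.
from concurrent.futures import ThreadPoolExecutor

def summarize_chunk(chunk: str) -> str:
    # Stand-in for an independent subagent working on one chunk.
    return chunk.upper()

def fan_out(task: str) -> str:
    chunks = task.split(". ")                  # decompose into independent parts
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        results = list(pool.map(summarize_chunk, chunks))  # dispatch + wait all
    return " / ".join(results)                 # reduce into a final output

merged = fan_out("first part. second part. third part")
```

`pool.map` preserves input order, so the reduce step sees results in the order the chunks were dispatched.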
Routing / Triage
Classify the incoming request and dispatch to the right specialist or workflow branch. Pure dispatch logic — once routed, the classifier is done.
- Classify — identify the intent, domain, or urgency of the request
- Route — select the appropriate specialist agent or workflow branch
- Stateless — each request classified and routed independently
- No coordination — distinct from orchestration; the router doesn't synthesise
Orchestrator + Subagents
One coordinator agent holds overall context and delegates focused subtasks to specialist subagents. Orchestrator synthesises outputs into a final result.
- Orchestrator — plans, delegates, holds context, aggregates
- Subagents — specialist roles: coder, researcher, reviewer, tester
- Context passing — orchestrator decides what each subagent needs
- Result synthesis — orchestrator merges outputs into final answer
Hierarchical Agent
Multi-level delegation tree: manager → supervisors → workers. Each tier receives only the context it needs. Scales to problems that exceed a single context window.
- Manager — holds top-level goal and overall context
- Supervisors — hold team-level context, break goal into subtasks
- Workers — receive focused subtask input only; no awareness of broader goal
- Context scoping — prevents pollution; each tier gets exactly what it needs
Handoff
Explicit transfer of control and conversation state from one specialist to the next. State packet travels with the handoff — each agent is fully active one at a time.
- State packet — context, history, and intent transferred explicitly
- Specialist — receiving agent activates with full context; prior agent deactivates
- Chain — can hand off multiple times (triage → billing → senior support)
- Unlike orchestration — agents are sequential, not concurrent
Swarm / Decentralised
Agents communicate peer-to-peer with no central coordinator. Self-select tasks based on availability and signals. Resilient — no single point of failure.
- Peer-to-peer — agents communicate directly, no orchestrator
- Self-selection — agents pick up tasks based on local signals
- Emergent — overall behaviour arises from local decisions
- Resilient — any node can drop without halting the system
Event-Driven / Actor Model
Each agent has an inbox queue and processes messages independently. Agents emit events and continue without waiting. No blocking — fully async.
- Actors — each agent has an inbox and processes messages independently
- Async — agents emit events and continue without waiting for replies
- Decoupled — agents don't know who consumes their output events
- Inbox depth — a backpressure signal; deep queues indicate overload
Control Flow & Reliability
Retry with Fallback
On tool failure, classify the error: transient errors get retried with backoff; structural errors trigger a fallback strategy. Avoids hitting the same wall repeatedly.
- Classify — transient (network, rate limit) vs structural (wrong key, unsupported op)
- Backoff — exponential: 1s, 2s, 4s for transient errors only
- Fallback chain — ordered list of alternative strategies
- Dead letter — if all fallbacks fail, surface to human or dead-letter queue
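Classify → backoff → fallback chain → dead letter, sketched below. The error taxonomy and fallback strategies are illustrative; the backoff delays are kept tiny so the sketch runs quickly.

```python
# Retry-with-fallback sketch: transient errors retry, structural errors fall through.
import time

class TransientError(Exception): pass   # network, rate limit
class StructuralError(Exception): pass  # wrong key, unsupported op

def with_retry(fn, retries: int = 3, base_delay: float = 0.01):
    for attempt in range(retries):
        try:
            return fn()
        except TransientError:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        except StructuralError:
            break                                    # retrying won't help
    return None

def run_with_fallbacks(strategies) -> str:
    for strategy in strategies:        # ordered fallback chain
        result = with_retry(strategy)
        if result is not None:
            return result
    return "DEAD_LETTER"               # surface to a human / dead-letter queue

calls = {"n": 0}
def flaky_primary():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return "primary succeeded"

result = run_with_fallbacks([flaky_primary, lambda: "fallback succeeded"])
```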
Checkpoint / Approval Gate
Agent runs autonomously until it reaches an irreversible or high-blast-radius action. At the gate it surfaces its plan to a human and waits for approval.
- Classify — reversible vs irreversible, local vs shared system
- Gate triggers — file deletion, push to remote, external API side effects
- Present context — show plan and reasoning before asking
- Resume or replan — on approval continue; on denial replan or stop
Confidence-Gated Autonomy
Agent scores its own confidence before acting. Above threshold: proceed autonomously. Below threshold: escalate to human. Thresholds vary by action type.
- Score — agent rates confidence 0–1 before each action
- Threshold — tuned per action type (0.95 for destructive, 0.60 for safe reads)
- Escalate — below threshold → ask human; above → proceed
- Calibrate — track actual accuracy vs confidence over time
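The score → threshold → escalate/proceed gate, with per-action thresholds matching the numbers above. The confidence scores themselves would come from the agent's self-rating; here they are plain inputs for illustration.

```python
# Confidence gate sketch: thresholds tuned per action type.
THRESHOLDS = {"destructive": 0.95, "safe_read": 0.60}

def gate(action_type: str, confidence: float) -> str:
    threshold = THRESHOLDS[action_type]
    return "proceed" if confidence >= threshold else "escalate_to_human"

decisions = [
    gate("safe_read", 0.70),    # above 0.60 threshold
    gate("destructive", 0.90),  # below 0.95 threshold
]
```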
Speculative Execution
Spawn N agents with different strategies in parallel. The first to succeed wins; remaining branches are cancelled. Trades compute for latency.
- Branch — spawn N agents with different strategies simultaneously
- Race — first successful result wins; remaining branches cancelled
- Merge — alternative: collect all results and combine instead of racing
- Useful when the correct approach is uncertain upfront
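Branch → race → cancel with futures: spawn strategies in parallel and take the first completed result. The strategies here are toy functions with different delays (an illustrative assumption); note that cancelling an already-running thread is best-effort only.

```python
# Speculative execution sketch: first strategy to finish wins.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
import time

def slow_but_sure():
    time.sleep(0.2)
    return "slow answer"

def fast_heuristic():
    time.sleep(0.01)
    return "fast answer"

def speculate(strategies):
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = [pool.submit(s) for s in strategies]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for f in pending:
            f.cancel()  # best-effort cancellation of losing branches
        return next(iter(done)).result()

winner = speculate([slow_but_sure, fast_heuristic])
```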
Prompt Caching
Cache repeated long prefixes (system prompt, tool definitions, document context) across calls. The provider skips re-encoding the prefix — significant latency and cost reduction.
- Identify — system prompt, tool schemas, and document context are cacheable
- Cache — provider stores the prefix KV state after first call
- Reuse — subsequent calls skip re-encoding the cached prefix
- Impact — meaningful latency and token cost reduction for long-context agents
Coding Agent Patterns
Code → Test → Fix Loop
The tightest feedback loop in coding agents. Write code, run the test suite in a sandbox, parse failure messages, apply targeted patches, repeat.
- Write — generate implementation from spec
- Execute — run test suite in sandbox; parse stderr, assertions, stack traces
- Patch — targeted edits based on failure messages; not full rewrites
- Repeat — until all tests pass or max iterations hit
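The write → execute → patch → repeat loop, sketched with the "test suite" as a plain function and the patch step scripted — stand-ins for sandboxed pytest runs and LLM-generated targeted edits.

```python
# Code -> test -> fix sketch: parse failures, patch, rerun.
def run_tests(code: str) -> list[str]:
    # Stand-in for running the suite in a sandbox and parsing failures.
    namespace = {}
    exec(code, namespace)
    add = namespace["add"]
    failures = []
    if add(2, 3) != 5:
        failures.append("add(2, 3) should be 5")
    return failures

def patch(code: str, failures: list[str]) -> str:
    # Targeted edit based on the failure message, not a full rewrite.
    return code.replace("a - b", "a + b")

def code_test_fix(code: str, max_iterations: int = 3) -> str:
    for _ in range(max_iterations):
        failures = run_tests(code)
        if not failures:
            return code
        code = patch(code, failures)
    raise RuntimeError("max iterations hit with failing tests")

buggy = "def add(a, b):\n    return a - b\n"
fixed = code_test_fix(buggy)
```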
Linter-in-the-Loop
Static analysis runs after code generation and before execution. Lint errors are parsed — file, line, rule, message — and fed back as observations for targeted fixes.
- Write — generate code
- Lint — run static analyzer (ESLint, mypy, clippy) before execution
- Parse — extract file, line, rule, message from lint output
- Fix — targeted edits per error; re-lint to verify fixes are clean
Scaffolded Execution
Agent writes code into an isolated runtime, executes it, and reads stdout/stderr/exit code as structured observations. The environment itself becomes a tool.
- Scaffold — set up isolated runtime (container, WASM, subprocess)
- Write — agent writes code to a file in the sandbox
- Execute — run the file; capture stdout, stderr, exit code
- Observe — output fed back as structured tool result; sandbox torn down
Git-Aware Agent
Agent reads git log, diff, and blame as context before acting. Understands what changed and why — not just the current state — by mining commit history and authorship.
- git log — understand recent trajectory and intent behind changes
- git diff — see what changed since last commit; understand current state
- git blame — trace a line back to its author, commit, and reason
- Branch context — main vs feature vs hotfix informs risk level of proposed changes