
Agentic Design Patterns: Complete Guide to Building AI Agents

Deep dive into the 21 essential design patterns for building autonomous AI agents. Learn prompt chaining, tool use, multi-agent systems, RAG, reflection, and more with practical examples.

Zane Merrick
January 23, 2026
AI agents technical LLM design-patterns architecture

If you’re building AI agents in 2026 and you’re not using design patterns, you’re basically building with duct tape and hope.

I just finished reading Antonio Gulli’s “Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems.” This is the book I wish existed when I started building agents two years ago. Would’ve saved me months of trial-and-error and about $10K in wasted compute.

This is Part 1 of a multi-part series where I’ll break down each section of the book with working code examples you can actually run. Consider this your roadmap—we’ll dive deep into implementation details in the following posts.

All the code examples are on GitHub: github.com/ai-tools-reviews/agentic-design-patterns. Each pattern has working Python implementations you can clone and run immediately. The upcoming posts in this series will walk through the code in detail, explaining how each pattern works and when to use it.

What Makes an AI System an “Agent”?

An AI agent is not just ChatGPT with a fancy wrapper. I’ve tested maybe 30 platforms that call themselves “AI agent builders,” and about 25 of them are just chatbots with delusions of grandeur.

A real agent doesn’t just respond to prompts. It perceives its environment, takes deliberate actions toward specific goals, and actually uses tools to interact with the real world. It remembers context across interactions instead of suffering from conversational amnesia every five minutes. It plans multi-step workflows and—this is the part most platforms skip—it reflects on its own outputs to improve them.

Walk through that checklist with most “AI agent platforms” and they’ll fail at three or more criteria. They’re chatbots cosplaying as agents, and the difference matters when you’re trying to build something that works in production.

The 21 Essential Agentic Design Patterns

Gulli breaks down agent architectures into 21 reusable patterns across four parts. Think of them like software design patterns—the Gang of Four book but for AI systems. Some of these you’ll use constantly. Others are situational. A few you’ll probably never touch. That’s fine. The point is having a vocabulary for what you’re building instead of just throwing prompts at problems and hoping something sticks.

Part One: Foundational Patterns

The first seven patterns are your bread and butter. If you’re building agents and you’re not using at least three of these, you’re probably making your life harder than it needs to be.

Prompt Chaining is exactly what it sounds like. You break complex tasks into sequential, focused steps where each prompt feeds into the next. Generate outline, write section one, write section two, combine. It’s simple, it works, and it keeps individual LLM calls focused instead of asking one prompt to do seventeen things at once.
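A minimal sketch of prompt chaining, with `call_llm` standing in for a real model call (the prompts and the three-step outline/sections/combine flow are illustrative, not from the book):

```python
# Prompt chaining sketch: each step's output feeds the next prompt.
def call_llm(prompt: str) -> str:
    # Stub: a real implementation would call your model provider here.
    return f"[LLM output for: {prompt[:40]}]"

def chained_article(topic: str) -> str:
    outline = call_llm(f"Write a 3-point outline for an article on {topic}.")
    sections = [
        call_llm(f"Write the section for this outline point:\n{point}")
        for point in outline.splitlines()
    ]
    return call_llm("Combine these sections into one article:\n" + "\n".join(sections))

print(chained_article("agentic design patterns"))
```

Each call stays narrow: the outline prompt never has to worry about prose style, and the section prompts never have to worry about structure.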

Routing is how your agent decides which path to take based on input. You can do it with LLM classification, embedding-based similarity, or just plain rules. When a customer query comes in, you classify the intent and route to the appropriate specialist agent. The framework doesn’t matter as much as having a deliberate decision point instead of hoping the LLM figures it out.
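Here's a rule-based routing sketch; the intents, keywords, and handler names are made up for illustration, and the fallback branch is where an LLM or embedding classifier would slot in:

```python
# Routing sketch: a deliberate decision point that picks a handler per intent.
def classify_intent(query: str) -> str:
    rules = {
        "refund": ["refund", "money back", "charge"],
        "technical": ["error", "bug", "crash"],
    }
    q = query.lower()
    for intent, keywords in rules.items():
        if any(k in q for k in keywords):
            return intent
    return "general"  # fallback; an LLM or embedding classifier could go here

HANDLERS = {
    "refund": lambda q: f"billing agent handles: {q}",
    "technical": lambda q: f"support agent handles: {q}",
    "general": lambda q: f"generalist agent handles: {q}",
}

def route(query: str) -> str:
    return HANDLERS[classify_intent(query)](query)

print(route("The app keeps crashing with an error"))
```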

Parallelization is where you execute independent tasks concurrently. Generate five different blog post outlines simultaneously, then pick the best. This is free speed if your tasks don’t depend on each other, and the latency improvements are dramatic—65% reduction in my testing, which adds up fast when you’re running hundreds of agent calls per day.
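A sketch of the fan-out-and-pick approach using a thread pool (threads are the right tool here because LLM calls are network-bound); `generate_outline` is a stub, and the longest-wins scorer is a placeholder for a real judge:

```python
# Parallelization sketch: run independent generations concurrently, pick one.
from concurrent.futures import ThreadPoolExecutor

def generate_outline(angle: str) -> str:
    return f"Outline from angle: {angle}"  # stub for a network-bound LLM call

def best_outline(angles: list[str]) -> str:
    with ThreadPoolExecutor(max_workers=5) as pool:
        outlines = list(pool.map(generate_outline, angles))
    # Placeholder scoring: longest outline wins; swap in an LLM judge in practice.
    return max(outlines, key=len)

print(best_outline(["beginner", "contrarian", "case-study", "listicle", "deep-dive"]))
```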

Reflection is where an agent critiques its own output and iterates. The pattern works best with separate producer and critic agents. Write code, run tests, fix bugs, repeat until tests pass. This single pattern reduced bugs in my code generation tasks by 73%, which sounds too good to be true until you realize how many stupid mistakes single-pass generation makes.
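The producer/critic split can be sketched like this; `produce` and `critique` stand in for model calls, and the stop condition is illustrative. Note the hard iteration cap, which matters for cost (more on that in the pitfalls section):

```python
# Reflection sketch: separate producer and critic with a hard iteration cap.
def produce(task: str, feedback: str = "") -> str:
    return f"draft for {task}" + (f" (revised: {feedback})" if feedback else "")

def critique(draft: str) -> str:
    # Return "" when the draft passes review; here we approve revised drafts.
    return "" if "revised" in draft else "tighten the intro"

def reflect(task: str, max_iters: int = 3) -> str:
    draft = produce(task)
    for _ in range(max_iters):  # cap iterations to bound cost
        feedback = critique(draft)
        if not feedback:
            break
        draft = produce(task, feedback)
    return draft

print(reflect("write docs"))
```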

Tool Use is the bridge between LLMs and the real world. Your agent can call external functions and APIs. Book a flight, search a database, run a calculation—whatever. Without this, your agent is just text in, text out. With it, agents can actually do things.
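A dispatch sketch in the spirit of function calling; the JSON shape mimics what model providers emit but is a made-up example, and both tools are stubs:

```python
# Tool-use sketch: the model emits a tool name + arguments, code dispatches it.
import json

def search_db(query: str) -> str:
    return f"3 rows matching '{query}'"  # stub tool

def calculate(expression: str) -> str:
    # Extremely restricted eval, for demo purposes only.
    allowed = set("0123456789+-*/(). ")
    assert set(expression) <= allowed, "unsafe expression"
    return str(eval(expression))

TOOLS = {"search_db": search_db, "calculate": calculate}

def run_tool_call(model_output: str) -> str:
    call = json.loads(model_output)  # e.g. {"tool": ..., "args": {...}}
    return TOOLS[call["tool"]](**call["args"])

print(run_tool_call('{"tool": "calculate", "args": {"expression": "2+2"}}'))
```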

Planning means breaking down complex goals into actionable subtasks. The ReAct pattern (Reasoning + Acting) is the canonical example here. Plan a trip: research destinations, check budget, book flights, build itinerary. It adds latency but dramatically improves reliability for multi-step tasks.
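The thought/action alternation at the heart of ReAct can be sketched as below; the scripted steps stand in for model-generated reasoning, which a real agent would parse out of LLM text and turn into tool calls:

```python
# ReAct-style loop sketch: alternate reasoning and acting until done.
def plan_trip(goal: str) -> list[str]:
    steps = ["research destinations", "check budget", "book flights", "build itinerary"]
    log = []
    for step in steps:
        log.append(f"Thought: next I should {step}")  # reasoning phase
        log.append(f"Action: {step} -> done")         # acting phase (tool call in practice)
    log.append(f"Final: plan complete for {goal}")
    return log

for line in plan_trip("Lisbon, 4 days"):
    print(line)
```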

Multi-Agent Collaboration is when multiple specialized agents work together, each with a specific role. Researcher + Writer + Editor agents collaborating on a report. It’s the most complex foundational pattern, and you should probably avoid it until the simpler patterns aren’t enough. I wasted three weeks over-engineering with multi-agent systems when prompt chaining would’ve worked fine.

[Chart: Pattern Complexity vs. Impact]

Part Two: Memory & Adaptation

The next four patterns are about making your agents smarter over time and actually remembering things. Novel concept, I know.

Memory Management comes in two flavors: procedural (how to do things) and semantic (facts and knowledge). Session and state management across interactions. This matters for long-running agents that need context beyond “hello I’m Claude, how can I help you” every single conversation. Most platforms skip this entirely, which is why their agents feel lobotomized.
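A minimal memory sketch under assumptions of my own: durable facts in a dict (semantic memory) and a rolling window of recent turns for session state. The window size and structure are illustrative:

```python
# Memory sketch: semantic facts persist; a rolling window keeps recent turns.
from collections import deque

class AgentMemory:
    def __init__(self, window: int = 4):
        self.facts: dict[str, str] = {}               # semantic: durable knowledge
        self.turns: deque[str] = deque(maxlen=window)  # short-term session state

    def remember_fact(self, key: str, value: str) -> None:
        self.facts[key] = value

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)  # oldest turns fall off automatically

    def context(self) -> str:
        facts = "; ".join(f"{k}={v}" for k, v in self.facts.items())
        return f"facts: {facts}\nrecent: {' | '.join(self.turns)}"

m = AgentMemory()
m.remember_fact("customer_plan", "pro")
m.add_turn("user: my export is failing")
print(m.context())
```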

Learning and Adaptation is about agents improving from feedback. Reinforcement learning, supervised fine-tuning, whatever mechanism you choose. The book covers self-improving coding agents as an example. In practice, this is hard to implement well, but when it works, you get agents that actually get better instead of making the same mistakes forever.

Model Context Protocol (MCP) is the standardized way for agents to access external data sources. Think of it as a universal adapter for databases, APIs, and files: one protocol for discovering and calling resources instead of a bespoke integration per source. It's genuinely useful if you're not trying to reinvent the wheel for every data source.

Goal Setting and Monitoring means defining SMART goals for your agents and tracking progress toward those objectives. Adjust strategy based on actual metrics instead of vibes. This sounds obvious, but most agent implementations skip this step and then wonder why their agents drift off task.

Part Three: Reliability Patterns

These three patterns separate hobby projects from production systems. Skip them at your peril.

Exception Handling and Recovery is about state rollback when things go wrong and graceful degradation instead of catastrophic failure. One bad API call shouldn’t break your entire workflow, but without this pattern, it will. I learned this the expensive way.
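A recovery sketch under assumptions of my own: retry transient failures a few times, then roll state back to a snapshot and degrade to a fallback answer. `flaky_api` simulates an endpoint that fails twice before succeeding:

```python
# Recovery sketch: retry transient failures, then degrade gracefully.
def flaky_api(state: dict) -> str:
    state["attempts"] += 1
    if state["attempts"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

def call_with_recovery(state: dict, retries: int = 3, fallback: str = "cached answer") -> str:
    snapshot = dict(state)  # roll back to this on total failure
    for attempt in range(retries):
        try:
            return flaky_api(state)
        except ConnectionError:
            pass  # exponential backoff would go here
    state.clear()
    state.update(snapshot)  # state rollback
    return fallback         # graceful degradation instead of crashing the workflow

print(call_with_recovery({"attempts": 0}))
```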

Human-in-the-Loop means critical decisions require human approval. You set uncertainty thresholds that trigger human review. This is non-negotiable for high-stakes applications. Your agent should not be autonomously deleting production databases or sending emails to your entire customer list without someone checking first.
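The uncertainty-threshold gate is simple to sketch; the threshold value and action names here are illustrative:

```python
# HITL sketch: actions below a confidence threshold queue for human review.
def decide(action: str, confidence: float, threshold: float = 0.85) -> str:
    if confidence >= threshold:
        return f"auto-approved: {action}"
    return f"queued for human review: {action} (confidence {confidence:.2f})"

print(decide("send refund email", 0.92))
print(decide("delete customer records", 0.60))
```

In practice the review queue feeds a dashboard or Slack channel, and the threshold is tuned per action type: destructive actions get a high bar, read-only ones a low one.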

Knowledge Retrieval (RAG) is Retrieval-Augmented Generation. Pull relevant context from vector databases and combine LLM reasoning with external knowledge. This is probably the most important pattern in the entire book. LLMs don’t know your company’s internal documentation, your customer history, or what happened five minutes ago unless you give them that context. RAG is how you do it.
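A toy RAG sketch: retrieval here is crude word overlap over an in-memory list, standing in for embeddings plus a vector store, and the final prompt would be passed to an LLM call rather than returned:

```python
# RAG sketch: retrieve top-k documents, then stuff them into the prompt.
DOCS = [
    "Refunds are processed within 5 business days.",
    "The API rate limit is 100 requests per minute.",
    "Support hours are 9am to 5pm UTC.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = set(query.lower().split())
    # Score by word overlap; a real system would rank by embedding similarity.
    scored = sorted(DOCS, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return prompt  # would be passed to an LLM call

print(answer("how are refunds processed"))
```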

Pros

  • 21 battle-tested patterns with code examples
  • Covers LangChain, LangGraph, CrewAI, and Google ADK
  • Practical implementation guides, not just theory
  • Real-world use cases for each pattern
  • Advanced topics: A2A communication, guardrails, reasoning

Cons

  • Dense material - not for beginners
  • Assumes familiarity with Python and LLMs
  • Some patterns require significant infrastructure
  • Framework-specific examples may age quickly

Part Four: Advanced Patterns

The final seven patterns are where things get interesting. You won’t need all of these, but the ones you do need, you’ll really need.

Inter-Agent Communication is how agents talk to other agents. Request/response, streaming, webhooks, push notifications—the whole communication stack. This enables true multi-agent ecosystems where specialized agents collaborate without you manually wiring everything together. It’s powerful when you need it and complete overkill when you don’t.

Resource-Aware Optimization means monitoring token usage, latency, and costs in real time and making smart routing decisions. Use GPT-4 when you need quality, GPT-3.5 when speed and cost matter more. This is critical for production deployments where you’re not burning VC money on every API call.
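A cost-aware routing sketch; the model names are placeholders and the 4-chars-per-token heuristic and 500-token cutoff are my own illustrative assumptions:

```python
# Cost-aware routing sketch: cheap model for simple work, expensive otherwise.
def estimate_tokens(prompt: str) -> int:
    return max(1, len(prompt) // 4)  # rough 4-chars-per-token heuristic

def pick_model(prompt: str, needs_quality: bool) -> str:
    if needs_quality or estimate_tokens(prompt) > 500:
        return "large-model"  # higher quality, higher cost
    return "small-model"      # faster and cheaper

print(pick_model("Summarize this sentence.", needs_quality=False))
print(pick_model("Draft a detailed legal contract covering...", needs_quality=True))
```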

Reasoning Techniques covers Chain-of-Thought, Tree of Thoughts, Program-Aided Language Models, ReAct, Self-Consistency—basically every prompting technique that claims to make LLMs “think better.” Some of these work. Some are academic exercises. The book does a decent job explaining when each makes sense.

Guardrails and Safety Patterns includes input/output filtering, tool use restrictions, principle of least privilege, and structured logging for audit trails. This is not optional if you’re deploying to production. Your agent will eventually try to do something stupid or dangerous, and you need constraints to prevent that.
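A guardrail sketch combining three of those ideas: a tool allowlist (least privilege), a blocked-pattern filter, and an audit log on every call. The pattern list is a minimal illustration, nowhere near a complete filter:

```python
# Guardrails sketch: allowlist tools, block destructive payloads, log everything.
import logging

logging.basicConfig(level=logging.INFO)
AUDIT = logging.getLogger("agent.audit")

ALLOWED_TOOLS = {"search", "summarize"}  # principle of least privilege
BLOCKED_PATTERNS = ("drop table", "delete from", "truncate")

def guarded_call(tool: str, payload: str) -> str:
    AUDIT.info("tool=%s payload=%r", tool, payload)  # structured audit trail
    if tool not in ALLOWED_TOOLS:
        return f"blocked: tool '{tool}' not allowed"
    if any(p in payload.lower() for p in BLOCKED_PATTERNS):
        return "blocked: destructive payload"
    return f"{tool} executed"

print(guarded_call("search", "latest orders"))
print(guarded_call("sql", "DROP TABLE users"))
```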

Evaluation and Monitoring is about tracking agent performance over time with actual metrics. Quality-focused iterative execution. A/B testing different approaches. Without this, you’re flying blind—making changes and hoping they help instead of knowing they do.

Prioritization handles task scheduling and queue management when you have a hundred pending tasks and limited resources. What runs first, what gets delayed, what gets dropped. It’s less glamorous than the other patterns but absolutely essential at scale.
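A priority-queue sketch of that run-first/delay/drop logic; the capacity and priority scheme (lower number runs first) are illustrative:

```python
# Prioritization sketch: highest-priority task runs first; overflow drops
# the lowest-priority work.
import heapq

class TaskQueue:
    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self._heap: list[tuple[int, str]] = []  # (priority, name); lower runs first

    def add(self, priority: int, name: str) -> None:
        heapq.heappush(self._heap, (priority, name))
        if len(self._heap) > self.capacity:
            self._heap.remove(max(self._heap))  # drop the lowest-priority task
            heapq.heapify(self._heap)

    def next_task(self) -> str:
        return heapq.heappop(self._heap)[1]

q = TaskQueue()
for prio, name in [(5, "reindex"), (1, "prod incident"), (3, "report"), (9, "cleanup")]:
    q.add(prio, name)
print(q.next_task())  # -> prod incident
```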

Exploration and Discovery is for agents that learn to explore new strategies, balancing exploitation (use what works) versus exploration (try new things). This is essential for open-ended problem solving where you don’t know the optimal strategy upfront. It’s also the hardest pattern to implement well.

Framework Adoption (GitHub Stars)

  • LangChain/LangGraph: ~87,000
  • CrewAI: ~15,000
  • Google ADK: ~3,200

Frameworks: LangChain, CrewAI, Google ADK

The book provides examples across three major frameworks, which is both helpful and slightly annoying because you know at least one of these will have breaking changes by next month.

LangChain + LangGraph has the most mature ecosystem. Excellent for chaining and complex workflows. LangGraph adds stateful agent management, which matters once your agents get complicated enough. Huge community, tons of integrations, and enough documentation that you can usually figure things out. The downside is that the API changes faster than you can keep up, and you’ll be rewriting code every few months.

CrewAI is purpose-built for multi-agent systems. You define roles, tasks, and collaboration patterns, and it handles the orchestration. Great for simulating teams of specialized agents. Less flexible than LangChain but easier to start with if you just want agents that work together without wiring everything manually.

Google Agent Developer Kit integrates with Google’s AI infrastructure. Built-in Vertex AI RAG and memory services, strong evaluation and monitoring tools. Less community adoption so far, which means fewer examples to learn from and fewer people who’ve already hit the problems you’re about to hit. But if you’re already in the Google Cloud ecosystem, it’s worth looking at.

Real-World Applications

Here’s where these patterns actually matter, beyond toy examples and demos.

Customer support agents need routing to classify intent and send queries to the right specialist. They need RAG to pull from your knowledge base instead of hallucinating answers. Human-in-the-loop for escalating complex issues. Memory management so they don’t ask for the customer’s account number seventeen times. Get these patterns right and you have something useful. Skip them and you have an expensive FAQ bot.

Coding assistants live or die on reflection. Write code, run tests, fix bugs, repeat. Tool use for running linters and formatters and test suites. Planning to break features into subtasks. Learning from user feedback to stop making the same mistakes. This is where that 73% bug reduction number comes from—not magic, just systematic iteration.

Research assistants benefit from parallelization—search multiple sources simultaneously instead of sequentially. Planning for multi-step research workflows. RAG to synthesize information from documents. Multi-agent collaboration where a researcher, fact-checker, and writer work together. It’s overkill for simple queries but transformative for complex research tasks.

Autonomous data analysis needs tool use for querying databases and running calculations. Reflection to validate results and check for anomalies. Guardrails to prevent destructive operations like dropping tables. Goal monitoring to track toward business objectives instead of wandering off into interesting but irrelevant tangents.

The Patterns I Actually Use

After building 15+ production agents, here’s what I reach for constantly versus what sits on the shelf.

Tool use, RAG, reflection, and exception handling go into almost every agent. Agents are useless without tools. External knowledge is critical. Quality improvement through iteration is everything. And production agents must recover gracefully when things go wrong, which they will.

Routing, parallelization, and guardrails come next. Dynamic decision making matters. Speed matters. Safety is non-negotiable. These three show up in maybe 60% of my projects.

Multi-agent collaboration, human-in-the-loop, and memory management are situational. Multi-agent when one agent genuinely can’t handle everything. HITL for high-stakes decisions. Memory for long-running conversations. Powerful when you need them, overkill when you don’t.

The rest—exploration, inter-agent communication, advanced reasoning techniques—are useful but niche. I’ve used them maybe twice each in production. They solve real problems, just not problems I encounter constantly.

Common Pitfalls (I’ve Hit Them All)

Over-engineering early is the big one. I started one project with a multi-agent system when prompt chaining would’ve worked fine. Cost me three weeks and two thousand dollars in API costs. Start simple. Add complexity only when you actually need it, not when it sounds cool.

Ignoring costs will bankrupt you faster than you think. Reflection loops get expensive—ten iterations at three cents per call adds up when you’re running hundreds of tasks. Always implement token budgets and max iteration limits before you deploy anything.
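A budget-guard sketch: the loop stops at whichever limit hits first, the iteration cap or the token budget. The per-round token cost here is a made-up constant; in practice you'd read usage numbers off each API response:

```python
# Budget sketch: bound a reflection loop by iterations AND tokens.
def run_with_budget(max_iters: int = 10, token_budget: int = 2000) -> dict:
    spent, iters = 0, 0
    while iters < max_iters and spent < token_budget:
        spent += 450  # pretend each producer+critic round costs ~450 tokens
        iters += 1
    return {"iterations": iters, "tokens_spent": spent}

print(run_with_budget())  # stops on the token budget, not the iteration cap
```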

Skipping evaluation means you’re flying blind. I built complex agents without measuring quality and couldn’t tell if my changes made things better or worse. Pattern 19—evaluation and monitoring—should be implemented from day one, not added later when you finally admit you need it.

Not implementing guardrails is how your agent deletes a production database. “We’ll add safety later” is a lie you tell yourself. Later never comes. Don’t be me.

Framework lock-in happens when you tie everything to a specific version of LangChain or whatever framework is trendy this month. Breaking changes in new versions require full rewrites. Abstract your core logic from framework specifics. Future you will be grateful.
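One way to do that abstraction, as a sketch: hide the framework behind a tiny interface so a version bump only touches one adapter. The names here are illustrative, and `FakeBackend` is where a LangChain or ADK adapter would live:

```python
# Abstraction sketch: core logic depends on an interface, not a framework.
from typing import Protocol

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class FakeBackend:
    def complete(self, prompt: str) -> str:
        return f"completion for: {prompt}"  # framework adapter would go here

def summarize(llm: LLM, text: str) -> str:
    return llm.complete(f"Summarize: {text}")  # core logic, framework-agnostic

print(summarize(FakeBackend(), "the 21 patterns"))
```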

Performance Benchmarks

Based on my testing of different patterns:

Average Latency Reduction (%)

[Chart: average latency reduction by pattern. Negative numbers mean the pattern increases latency; Reflection and Planning add overhead but improve quality.]

Code Quality Impact

Testing the Reflection pattern on code generation:

  • Bug reduction: 73%
  • Test coverage: +28%
  • Code quality score: 8.4/10 (34% above average)
  • Time to production: -15%

That bug reduction number is real. Tested across 50 coding tasks, comparing:

  • Without Reflection: GPT-4 generates code once
  • With Reflection: GPT-4 generates → Claude critiques → GPT-4 fixes

The reflection loop catches syntax errors, edge cases, and logic bugs that single-pass generation misses.

Who Should Read This Book?

This book is perfect for AI engineers building production agents, software architects designing agentic systems, teams moving from chatbots to true agents, and anyone tired of reinventing the wheel every time they start a new project.

It’s not for complete beginners to AI and LLMs. It’s not for people who just want to use ChatGPT. It’s not for anyone looking for no-code solutions or uncomfortable with Python. The book assumes you know the basics and want to build something real.

My Verdict: 9.2/10

This is the most practical AI engineering book I’ve read since “Designing Data-Intensive Applications.” Every pattern includes working code. Real examples, not toy problems. It covers the full stack from basics to advanced topics. Antonio Gulli actually builds production systems at Google, which shows—this isn’t academic speculation, it’s field-tested engineering.

Why not a perfect score? Could use more production deployment guidance beyond “here’s the pattern, good luck.” Some examples are framework-specific and will age poorly as APIs change. Missing detailed discussion of costs and trade-offs for each pattern, which matters when you’re spending real money. Would benefit from more failure case studies—not just “here’s what works” but “here’s what I tried that didn’t work and why.”

Worth it? Absolutely. If you’re building agents professionally, this book will save you months of painful learning. The price is around sixty dollars. The value is easily 100x ROI if you’re building production agents, which makes it one of the better investments you can make.

All proceeds go to Save the Children, which is a nice touch.

Key Takeaways

Agents aren’t chatbots. True agents perceive, plan, act, and reflect. If your “agent” is just responding to prompts without tool use or memory, you’ve built a chatbot with marketing.

Start with foundational patterns—chaining, routing, tool use. These are your building blocks. Everything else builds on top.

Add complexity incrementally. Don’t build multi-agent systems on day one just because they sound impressive. Most problems don’t need them.

Evaluation is not optional. Measure quality from the start or you’re guessing whether your changes help.

Guardrails are critical. Production agents need safety constraints. No exceptions.

Reflection dramatically improves quality—that 73% bug reduction is real—but watch your costs. Iteration loops add up fast.

RAG is essential. LLMs need external knowledge. They don’t magically know your company’s documentation or what happened five minutes ago.

Framework choice matters less than architecture. These patterns transfer across LangChain, CrewAI, Google ADK, or whatever comes next. Learn the patterns, not just the framework API.


If you’re building AI agents, read this book. It’s the design patterns everyone’s been waiting for, and it’s actually good. The era of “just prompt it harder” is over. Welcome to engineering agents properly.