---
title: "Agent Architecture"
type: concept
tags: [planning, reasoning, tool-use, memory, multi-agent]
created: 2025-01-01
updated: 2025-07-14
status: complete
---
Agent Architecture
Agent architecture describes the structural design of an AI agent system, specifying how components such as perception, reasoning, memory, tool use, and action-selection are organized and interact.
Overview
Agent architecture is the blueprint that determines how an AI agent thinks, remembers, acts, and communicates. A well-designed architecture integrates a foundation model as the core reasoning engine with surrounding components that extend its capabilities: memory systems for context retention, tool interfaces for environmental interaction, and orchestration logic for managing multi-step task execution.
The field has converged on several recurring structural patterns. The ReAct Framework introduced a widely adopted loop of interleaved reasoning and acting steps. Plan-and-Execute architectures separate high-level goal decomposition from low-level execution. More recent designs add self-refinement loops, where agents evaluate and revise their own outputs before committing to an action.
Google Cloud's component model for agent architecture identifies four core building blocks that every agent defines: a persona (role, personality, and communication style), memory (short-term, long-term, episodic, and consensus), tools (external resources and APIs), and the model (the LLM foundation that processes language and generates reasoning). These components interact within an operational loop of observe → reason → plan → act → self-refine.
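The four components can be sketched as a simple data model. This is an illustrative sketch only, not part of any Google Cloud API; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    role: str    # e.g. "research assistant"
    style: str   # communication style, e.g. "concise"

@dataclass
class Memory:
    short_term: list = field(default_factory=list)   # current working context
    long_term: dict = field(default_factory=dict)    # persistent facts
    episodic: list = field(default_factory=list)     # traces of past tasks

@dataclass
class Agent:
    persona: Persona
    memory: Memory
    tools: dict   # tool name -> callable
    model: str    # identifier of the underlying LLM
```

In a real system the `model` field would be a client for an LLM endpoint and `tools` would carry schemas the model can read; here they are reduced to their structural roles.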
Architecture choices directly determine an agent's capability ceiling, latency profile, cost, and safety properties. Single-agent architectures optimize for simplicity and well-defined tasks; multi-agent architectures built from specialized sub-agents can tackle more complex problems at the cost of coordination overhead. See Multi-Agent Coordination and Agent Orchestration for the governance layer that sits above individual agent designs.
How It Works
A standard agent architecture consists of the following layers:
┌──────────────────────────────────────────────────┐
│ GOAL / TASK INPUT │
└───────────────────────┬──────────────────────────┘
│
┌───────────────────────▼──────────────────────────┐
│ REASONING ENGINE (LLM) │
│ - Persona definition │
│ - Instruction following │
│ - Chain-of-thought / planning │
└──────┬────────────────┬────────────────┬─────────┘
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ MEMORY │ │ TOOLS │ │ ACTION │
│ Short-term │ │ APIs, DBs │ │ Execution │
│ Long-term │ │ Code, Search│ │ & Output │
│ Episodic │ │ External sys│ └─────────────┘
│ Consensus │ └─────────────┘
└─────────────┘
Core loop:
- The agent receives a goal or task
- The LLM reasons over available context (memory + observations)
- The agent selects and executes a tool or action
- Observations from the action feed back into memory and the next reasoning step
- The loop continues until the goal is achieved or a stopping criterion is met
- Self-refinement may trigger re-evaluation of outputs before they are committed
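The core loop above can be sketched in a few lines of Python. The `reason` function stands in for an LLM call and the decision format is hypothetical; a real implementation would parse structured model output.

```python
def run_agent(goal, reason, tools, max_steps=10):
    """Minimal observe -> reason -> act loop (illustrative sketch)."""
    memory = [f"goal: {goal}"]
    for _ in range(max_steps):
        decision = reason(memory)            # LLM call in a real system
        if decision["type"] == "final":
            return decision["answer"]        # stopping criterion met
        tool = tools[decision["tool"]]       # agent selects a tool
        observation = tool(decision["input"])
        memory.append(f"observed: {observation}")  # feed back into context
    return None  # step budget exhausted without reaching the goal
```

The `max_steps` cap is one form of stopping criterion; production systems typically add cost budgets and self-refinement checks before an action is committed.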
Key Properties / Characteristics
- Persona-driven behavior: Each agent defines a role and communication style that shapes how it interprets goals and formats outputs
- Four-component structure: Persona, memory, tools, and model are the canonical building blocks per Google Cloud's framework
- Observe-reason-plan-act loop: The core operational cycle shared across most modern agent architectures
- Modularity: Components can be swapped or upgraded independently (e.g., changing the LLM without changing the memory system)
- Scalability via composition: Single agents can be composed into multi-agent networks, with each agent potentially using a different foundation model optimized for its role
- Tool extensibility: Architecture is open-ended; new capabilities are added by registering new tools rather than retraining the model
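The tool-extensibility property can be made concrete with a small registry: new capabilities are added by registering callables with descriptions the model can read, with no retraining involved. The class and method names below are illustrative.

```python
class ToolRegistry:
    """Hypothetical registry: tools are plain callables plus a description."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn, description=""):
        # Adding a capability is just adding an entry; the model is untouched.
        self._tools[name] = {"fn": fn, "description": description}

    def describe(self):
        # The descriptions are what the reasoning engine sees when choosing a tool.
        return {name: t["description"] for name, t in self._tools.items()}

    def call(self, name, *args, **kwargs):
        return self._tools[name]["fn"](*args, **kwargs)
```

Swapping the LLM or the memory backend leaves this registry untouched, which is the modularity property in practice.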
Variants & Related Approaches
- ReAct Framework: Interleaved reasoning traces and tool-calling actions in a single context window; the dominant pattern for single-agent systems
- Plan-and-Execute: A planner agent decomposes the goal into a sequence of steps; an executor agent carries them out; reduces error propagation by separating concerns
- Reflection architectures: Add an explicit self-critique step where the agent evaluates its own output and revises before acting
- Surface vs. background agents: Surface agents interact conversationally with users; background agents run autonomously on event-driven queues — a distinction based on interaction mode rather than internal architecture
- Single-agent vs. multi-agent: Single agents use one foundation model; multi-agent systems allow different models per agent, matched to task requirements
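The Plan-and-Execute pattern reduces to two cooperating roles. In the sketch below both `planner` and `executor` are plain functions standing in for what would be separate LLM calls (possibly to different models); the separation of concerns, not the function bodies, is the point.

```python
def plan_and_execute(goal, planner, executor):
    """Illustrative Plan-and-Execute skeleton."""
    steps = planner(goal)   # high-level decomposition into a step sequence
    results = []
    for step in steps:
        # Each step sees prior results, so errors surface per-step
        # rather than propagating silently through one long trace.
        results.append(executor(step, results))
    return results
```

Because planning and execution are distinct calls, a failed step can trigger re-planning without discarding the whole trace, which is how this pattern limits error propagation.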
Strengths & Limitations
Strengths
- Modular design allows incremental capability expansion without full system redesign
- LLM-as-brain abstraction means advances in foundation models directly improve agent performance
- Tool-use extensibility means agents can interact with virtually any external system
- Multi-agent composition enables specialization and parallelism beyond any single agent's capacity
Limitations
- Context window limits constrain how much short-term memory and reasoning history the LLM can utilize
- Tool selection errors compound across multi-step tasks, leading to error cascades
- Persona and instruction design significantly affects reliability — poorly specified agents behave inconsistently
- Multi-agent architectures introduce coordination complexity and consensus memory consistency challenges
- Latency and cost scale with the number of LLM calls in the reasoning loop
Notable Uses / Applications
- Google Cloud Vertex AI Agent Builder: Builds agents using the four-component architecture; supports both single and multi-agent configurations
- Google Agent Development Kit (ADK): Open-source Python SDK implementing multi-agent orchestration with memory and tool integration
- Code agents: Specialized architectures for code generation, testing, and debugging workflows
- Security agents: Architectures tuned for threat detection and response across the security lifecycle
Source Material
- What are AI agents? Definition, examples, and types | Google Cloud — Source for the four-component architecture model (persona, memory, tools, model), single vs. multi-agent distinctions, surface vs. background agent types, and operational loop.
- Yao et al. 2022 — ReAct — Foundational paper establishing the Reason+Act loop as a core architectural pattern.
Related Pages
- Is a type of: Agentic AI
- Implements: ReAct Framework
- Uses / Depends on: Tool Use, Memory Systems
- Governed by: Agent Orchestration
- Scales to: Multi-Agent Coordination
- Safety considerations: AI Safety and Alignment
Open Questions
- How should persona specifications be formally validated to ensure consistent, safe agent behavior?
- What are the optimal context management strategies when short-term memory approaches context window limits?
- How do different foundation models perform as the reasoning engine within the same architectural shell?
- Is the four-component model (persona, memory, tools, model) sufficient, or are additional structural elements needed for more complex agentic systems?
- How should architecture evolve to support real-time, streaming task execution?
Page type: concept | Status: complete