AI Agent Architecture

Last Updated : 28 Jan, 2026

AI agents are autonomous systems that execute complex tasks on behalf of a user. They retrieve additional information, recall historical interactions and programmatically invoke external tools in order to plan, decide what to do next and take action. An AI agent can:

**Observe:** Perceives its environment (data, messages, user queries, sensor values).
**Reason:** Analyses the situation and plans what to do next based on constraints or heuristics.
**Act:** Takes an action, such as interacting with the environment, answering a question or calling a function.
**Learn (optional):** Learns from its own mistakes, providing better output over time.
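The observe, reason and act steps above can be sketched as a small loop. This is only an illustration: a hand-written rule stands in for the LLM, and the `ThermostatAgent` class, the `run` helper and the environment dictionary are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Toy agent that nudges a room temperature toward a target."""
    target: float
    history: list = field(default_factory=list)  # record of past actions

    def observe(self, environment: dict) -> float:
        # Observe: read the relevant signal from the environment.
        return environment["temperature"]

    def reason(self, temperature: float) -> str:
        # Reason: choose an action with a simple heuristic (an LLM would go here).
        if temperature < self.target - 1:
            return "heat"
        if temperature > self.target + 1:
            return "cool"
        return "idle"

    def act(self, action: str, environment: dict) -> None:
        # Act: change the environment and remember what was done.
        delta = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
        environment["temperature"] += delta
        self.history.append(action)


def run(agent: ThermostatAgent, environment: dict, max_steps: int = 10) -> dict:
    # Repeat observe -> reason -> act until the agent decides nothing is needed.
    for _ in range(max_steps):
        action = agent.reason(agent.observe(environment))
        if action == "idle":
            break
        agent.act(action, environment)
    return environment


agent = ThermostatAgent(target=21.0)
env = run(agent, {"temperature": 17.0})
print(env["temperature"], agent.history)
```

The loop stops as soon as the temperature is within one degree of the target. The `history` list is where an optional learning step could look at past actions.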

Figure: Agent structure

Workflows vs. Agents

The following table highlights the key differences between workflows and agents.

Criteria	Workflows	Agents
Definition	Predefined, rule-based sequence of steps	Autonomous systems that decide the steps
Control	High human control	Shared control between human and system
Flexibility	Low: fixed execution path	High: complex branches and loops
Best suited for	Repeatable, deterministic processes	Open-ended, complex problem solving
Examples	ETL jobs, data validation	AI coding, research agents
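The difference in control flow can be illustrated with a short sketch. The workflow always runs the same steps in the same order, while the agent asks a policy what to do next. All names here (`validate`, `total`, `simple_policy` and so on) are hypothetical, and the policy is a plain function standing in for an LLM.

```python
def validate(rows: list) -> list:
    # Drop rows with a missing amount.
    return [r for r in rows if r.get("amount") is not None]


def total(rows: list) -> float:
    return sum(r["amount"] for r in rows)


def workflow(rows: list) -> float:
    # Fixed path: every run performs the same steps in the same order.
    return total(validate(rows))


def agent(rows: list, choose_step) -> float:
    # Flexible path: choose_step decides the next step from the current state.
    state = {"rows": rows, "validated": False, "result": None}
    while True:
        step = choose_step(state)
        if step == "validate":
            state["rows"], state["validated"] = validate(state["rows"]), True
        elif step == "total":
            state["result"] = total(state["rows"])
        else:  # "done"
            return state["result"]


def simple_policy(state: dict) -> str:
    # Stand-in for an LLM deciding what to do next.
    if not state["validated"]:
        return "validate"
    if state["result"] is None:
        return "total"
    return "done"


rows = [{"amount": 10}, {"amount": None}, {"amount": 5}]
print(workflow(rows), agent(rows, simple_policy))
```

Both paths reach the same answer here. The point is that the agent's sequence of steps is chosen at run time rather than written into the code.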

Key components of an Agent

1. LLM (Large Language Model)

An agent requires an LLM to function. The LLM can be thought of as the brain of the agent: it analyses, plans and decides the next action to take. A stronger LLM generally leads to better outcomes, but not always.

A bigger LLM typically produces better outputs at the cost of increased latency.
In some cases, a smaller language model can outperform a larger one on niche tasks.
Examples of popular open-source LLMs are: Llama 8B, GPT-OSS 20B and Qwen 2.5.

2. Working Memory

Working memory, or contextual memory, stores information about the steps already taken. It acts as the agent's short-term memory, helping it remember context and provide accurate answers. For example, if you ask "What are my current sales in 2025?" and follow up with "give top 10", the model can infer that you mean "top 10 sales in 2025".

**Context retention:** Stores information from previous steps, messages or actions so the agent can maintain continuity across a task or conversation.
**State tracking:** Helps the agent keep track of what has already been done, what data is available and what needs to happen next.
**Improved reasoning:** Enables follow-up questions and implicit references (e.g., understanding that “give top 10” refers to sales in 2025), leading to more accurate and relevant responses.
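One simple way to picture working memory is a bounded buffer of recent messages that is prepended to each new question. The sketch below assumes a hypothetical `WorkingMemory` class. Real frameworks offer richer options such as summarization or token-based trimming.

```python
from collections import deque


class WorkingMemory:
    """Keeps the most recent messages so follow-ups can be read in context."""

    def __init__(self, max_messages: int = 6):
        # A bounded deque drops the oldest message once the limit is reached.
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def build_prompt(self, new_question: str) -> str:
        # Prepend stored turns so the model sees the earlier context.
        lines = [f"{m['role']}: {m['content']}" for m in self.messages]
        lines.append(f"user: {new_question}")
        return "\n".join(lines)


memory = WorkingMemory(max_messages=4)
memory.add("user", "What are my current sales in 2025?")
memory.add("assistant", "Here is a summary of 2025 sales.")
prompt = memory.build_prompt("Give top 10")
print(prompt)
```

Because the earlier turns travel with the follow-up, the model has what it needs to read "Give top 10" as a question about 2025 sales.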

3. Retrieval

Retrieval allows an agent to access information beyond what is stored inside the language model, enabling accurate and up-to-date responses.

**External knowledge access:** Fetches relevant data from documents, databases, APIs or search systems when needed.
**Contextual relevance:** Retrieves only the most relevant information to the task, reducing noise and improving efficiency.
**Grounded outputs:** Ensures responses are based on real, verifiable data rather than assumptions or hallucinations.
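As a rough illustration of retrieval, the sketch below ranks a few documents by word overlap with the query and places the best match into the prompt. The `tokenize` and `retrieve` helpers and the sample documents are hypothetical. Production systems usually rely on embeddings or a search index rather than raw word overlap.

```python
def tokenize(text: str) -> set:
    # Lowercase words with surrounding punctuation stripped.
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, documents: list, top_k: int = 1) -> list:
    """Rank documents by the number of words they share with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop irrelevant documents
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 7 days depending on region.",
]
question = "How long do refunds take?"
context = retrieve(question, docs)

# Ground the answer by placing the retrieved text in the prompt.
grounded_prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
print(grounded_prompt)
```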

4. Tools

Tools enable an agent to take actions and interact with external systems, extending its capabilities beyond reasoning and text generation.

**Action execution:** Allows the agent to perform tasks such as calling APIs, running code, querying databases or triggering workflows.
**System interaction:** Enables integration with external services like CRMs, analytics platforms, browsers or operating systems.
**Task completion:** Helps the agent move from planning to execution, making it capable of completing real-world tasks rather than only providing suggestions.
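Tool use is often implemented as a registry of functions plus a dispatcher that executes whatever call the model requests. The sketch below assumes the model emits its request as a JSON string. The tool names, the JSON shape and `execute_tool_call` are hypothetical choices for this example, not a specific library's format.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would call a weather API.
    return f"Sunny in {city}"


def add(a: float, b: float) -> float:
    return a + b


# Registry mapping tool names to callables.
TOOLS = {"get_weather": get_weather, "add": add}


def execute_tool_call(raw_call: str):
    """Parse a JSON tool call and dispatch it to the matching function."""
    call = json.loads(raw_call)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**args)


# Stand-in for what an LLM might emit when it decides to use a tool.
result = execute_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
print(result)
```

Returning an error message for unknown tools, instead of raising, gives the agent a chance to read the failure and try a different call.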

Figure: AI Agent

Single vs. multi-agent AI pattern

This pattern describes how intelligence is organized within a system. A single-agent AI consists of one autonomous decision-maker that perceives its environment, reasons and acts independently to optimize a defined objective. In contrast, multi-agent AI involves multiple autonomous agents operating within a shared environment, where overall system behavior emerges from their interactions.

**Single-agent AI**

Centralized reasoning and control, which simplifies design, training and debugging.
Best suited for well-defined tasks with limited interaction dynamics (e.g., single-player games, standalone optimization).
Limited adaptability in highly dynamic or adversarial environments.

**Multi-agent AI**

Figure: Single vs. multi-agent AI

Decentralized decision-making with agents that may coordinate, negotiate or compete.
Effective for complex systems requiring scalability, robustness or modeling of social/strategic interactions (e.g., traffic systems, markets, swarm robotics).
Introduces parallelism, allowing agents to work simultaneously, which can shorten the time needed to solve a problem.

Architecture Patterns

1. Prompt Chaining

Prompt chaining is a technique where a complex task is broken into multiple smaller prompts and the output of one prompt becomes the input to the next. Instead of asking the LLM to do everything at once, you guide it step by step.

Improves accuracy by handling one reasoning step at a time.
Increases control and transparency over the model’s thinking.
Works well for multi-step tasks like analysis, planning and generation.

Figure: Prompt chaining
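A minimal sketch of prompt chaining follows. To stay runnable without a model, it uses a hypothetical `fake_llm` function that returns rule-based text. In practice, each step would be a real LLM call.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; behaves differently per task."""
    if prompt.startswith("Extract keywords:"):
        text = prompt.split(":", 1)[1]
        words = [w.strip(".,").lower() for w in text.split()]
        return ", ".join(w for w in words if len(w) > 6)  # keep longer words
    if prompt.startswith("Write a title using:"):
        return "Notes on " + prompt.split(":", 1)[1].strip()
    return prompt


def run_chain(text: str) -> str:
    # Step 1: a narrow prompt that only extracts keywords.
    keywords = fake_llm(f"Extract keywords: {text}")
    # Step 2: the output of step 1 becomes the input of the next prompt.
    return fake_llm(f"Write a title using: {keywords}")


title = run_chain("Agents combine retrieval with planning.")
print(title)
```

Since each step has a single job, the intermediate value (`keywords` here) can be inspected or validated before it is passed on.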

2. Routing pattern

Routing is a pattern where an input is analyzed first and then directed to the most appropriate prompt, tool or agent instead of using a single fixed response path.

Selects the best handler based on intent, type or complexity.
Improves efficiency by avoiding unnecessary steps or tools.
Common in agent systems, customer support bots and workflows.

Figure: Routing workflow
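Routing can be sketched as a classifier followed by a lookup of the matching handler. The keyword-based `classify` function and the three handlers below are hypothetical. An LLM or a trained intent model would normally do the classification.

```python
def classify(query: str) -> str:
    """Hypothetical intent classifier based on keyword matching."""
    text = query.lower()
    if any(word in text for word in ("refund", "invoice", "charge")):
        return "billing"
    if any(word in text for word in ("error", "crash", "bug")):
        return "technical"
    return "general"


# Each route maps to its own handler (a prompt, tool or agent in a real system).
HANDLERS = {
    "billing": lambda q: f"[billing agent] {q}",
    "technical": lambda q: f"[technical agent] {q}",
    "general": lambda q: f"[general agent] {q}",
}


def route(query: str) -> str:
    # Analyze the input first, then send it to the matching handler.
    return HANDLERS[classify(query)](query)


print(route("I want a refund"))
print(route("The app shows an error"))
```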

3. Parallelization

Parallel execution is a pattern where multiple tasks or prompts are run at the same time and their results are later combined to produce a final output.

Reduces latency by processing independent steps simultaneously.
Improves coverage by exploring multiple approaches at once.
Common in evaluation, retrieval and multi-agent systems.

Figure: Parallelization
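The sketch below runs two independent tasks at the same time with Python's standard `ThreadPoolExecutor` and then merges their results. The `summarize` and `count_words` functions are hypothetical stand-ins for independent LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize(text: str) -> str:
    # Hypothetical stand-in for an LLM call: returns the first sentence.
    return text.split(".")[0].strip()


def count_words(text: str) -> int:
    return len(text.split())


def run_parallel(text: str) -> dict:
    tasks = {"summary": summarize, "word_count": count_words}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # Submit every independent task before waiting on any of them.
        futures = {name: pool.submit(fn, text) for name, fn in tasks.items()}
        # Aggregate: collect each result into one combined output.
        return {name: future.result() for name, future in futures.items()}


report = run_parallel("Agents plan and act. They also use tools.")
print(report)
```

Threads suit this pattern when the work is I/O-bound, as remote model calls typically are. The tasks must not depend on each other's output.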

4. Orchestrator-worker pattern

The orchestrator-worker pattern uses a central controller (the orchestrator) to plan the work, delegate subtasks to workers (tools or agents) and coordinate their results to achieve a larger goal efficiently.

Breaks complex problems into manageable subtasks.
Controls execution order, dependencies and data flow.
Common in agent systems, workflows and enterprise automation.

Figure: Orchestrator-worker
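A compact sketch of the idea: the orchestrator produces a plan, hands each subtask to a worker and passes results along. The `plan` function and the `research` and `write` workers are hypothetical. In a real system, an LLM would usually generate the plan, and the workers would be tools or agents.

```python
def plan(goal: str) -> list:
    """Hypothetical planner that breaks the goal into ordered subtasks."""
    return [
        {"worker": "research", "input": goal},
        {"worker": "write", "input": None},  # None means: use the previous result
    ]


# Workers are simple functions here; each handles one kind of subtask.
WORKERS = {
    "research": lambda topic: f"facts about {topic}",
    "write": lambda notes: f"Report based on {notes}",
}


def orchestrate(goal: str) -> str:
    result = None
    for task in plan(goal):
        # The orchestrator controls execution order and data flow between workers.
        payload = task["input"] if task["input"] is not None else result
        result = WORKERS[task["worker"]](payload)
    return result


print(orchestrate("solar power"))
```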

5. Reflection Pattern

The reflection pattern allows a model or agent to review its own outputs, evaluate quality and make improvements before producing the final response.

Identifies errors, gaps or inconsistencies in reasoning.
Improves reliability through self-correction loops.
Common in agent systems, long-form generation and planning tasks.

Figure: Reflection pattern
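The self-correction loop can be sketched as generate, critique and revise, repeated until the critique finds nothing to fix. The `generate` and `critique` functions below are hypothetical rule-based stand-ins for what would usually be two LLM calls.

```python
def generate(task: str, feedback: str = "") -> str:
    # Hypothetical generator: the first draft has flaws that feedback can fix.
    draft = "agents observe, reason and act"
    if "capital" in feedback:
        draft = draft[0].upper() + draft[1:]
    if "period" in feedback:
        draft += "."
    return draft


def critique(draft: str) -> str:
    """Return feedback, or an empty string when the draft passes."""
    issues = []
    if not draft[0].isupper():
        issues.append("start with a capital letter")
    if not draft.endswith("."):
        issues.append("end with a period")
    return "; ".join(issues)


def reflect(task: str, max_rounds: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if not feedback:  # nothing left to fix
            break
        draft = generate(task, feedback)  # revise using the critique
    return draft


final = reflect("Describe what an agent does")
print(final)
```

The `max_rounds` cap makes the loop stop even if the critic is never satisfied.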

Implementation

We can implement a simple workflow using LangChain; for more complex workflows, LangGraph is used.

Step 1: Download & Import the necessary libraries

We will begin by installing and importing the required packages for our implementation.

```python
# Notebook syntax; in a terminal, run the command without the leading "!".
!pip install langchain transformers torch accelerate langchain_community

import torch
from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
```

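Step 2: Initializing an LLM

We will create a Hugging Face text-generation pipeline and wrap it in a LangChain LLM for easier access.

```python
# Build a Hugging Face pipeline around a small instruction-tuned model.
pipe = pipeline(
    "text2text-generation",
    model="google/flan-t5-base",
    max_new_tokens=128,
)

# Wrap the pipeline so it can be used as a LangChain LLM.
llm = HuggingFacePipeline(pipeline=pipe)
```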

Step 3: Building Prompt and Creating a chain

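We will build a prompt using PromptTemplate and use LangChain's LCEL (LangChain Expression Language) to compose the prompt and the LLM into a chain.

```python
# The {question} placeholder is filled in when the chain is invoked.
prompt = PromptTemplate.from_template(
    "Explain this question clearly: {question}"
)

# LCEL: pipe the formatted prompt into the LLM.
chain = prompt | llm
```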

Step 4: Invoke the LLM

We will invoke the chain with a custom query to test its output.

```python
# Pass the value for the template's "question" variable.
print(chain.invoke({"question": "What is computer science?"}))
```

**Output:**

Computer science is the study of how computers compute, including algorithms, systems and information processing.
