Building AI Agents from Scratch


Artificial intelligence agents have moved from research labs to everyday applications, powering everything from chatbots to autonomous drones. Yet many developers still wonder how to build an agent from the ground up without relying on black‑box services. In this guide we’ll demystify the process, walk through the core building blocks, and deliver three hands‑on code examples you can run today. By the end, you’ll have a solid mental model and a reusable skeleton for your own AI agents.

Understanding What an AI Agent Is

An AI agent is any software entity that perceives its environment through inputs, processes those inputs, and takes actions that influence the environment. The classic definition—perceive‑think‑act—covers everything from a thermostat adjusting temperature to a conversational assistant scheduling meetings.

Agents differ from simple scripts because they maintain state, make decisions based on goals or learned policies, and often operate continuously. This distinction matters when you design your system: you need a loop that constantly gathers observations, updates internal models, and issues commands.

Key Characteristics

  • Autonomy: The agent decides when and how to act without external prompting.
  • Goal‑oriented behavior: It strives to achieve one or more objectives, which can be explicit (e.g., maximize reward) or implicit (e.g., follow conversational etiquette).
  • Adaptability: Learning agents improve over time, while rule‑based agents may simply follow static logic.

With these traits in mind, let’s break down the architecture that supports them.

Core Components of an AI Agent

Every functional agent consists of four core modules: Sensors, Perception, Decision Engine, and Actuators. Think of them as a pipeline where data flows from raw inputs to concrete actions.

Sensors collect raw data from the environment—text, images, telemetry, or API responses. In a chatbot, the sensor is the HTTP request that carries the user’s message. In a robot, it could be a LIDAR scan.

Perception transforms raw sensor data into a structured representation the agent can reason about. This step often involves parsing, feature extraction, or embedding generation using models like BERT or CLIP.

Decision Engine is the brain. It may be a simple rule set, a planning algorithm, a reinforcement‑learning policy, or a large language model (LLM). The engine consumes the perception output and produces an action plan.

Actuators execute the chosen actions—sending a reply, moving a motor, or invoking another service. They close the loop by affecting the environment, which the sensors will observe in the next cycle.

Data Flow Diagram (textual)


User Input  →  Sensor  →  Perception  →  Decision Engine  →  Actuator  →  Environment

Understanding this flow helps you decide where to plug in custom logic or third‑party APIs.
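
To make the pipeline concrete, here is a minimal sketch of how the four modules can be expressed as a reusable base class. The method names mirror the examples later in this guide, but the class itself is illustrative rather than a fixed API.

from abc import ABC, abstractmethod

class Agent(ABC):
    """Skeleton for the sense -> perceive -> decide -> act loop."""

    @abstractmethod
    def sensor(self):
        """Gather raw data from the environment."""

    @abstractmethod
    def perception(self, raw):
        """Turn raw data into a structured representation."""

    @abstractmethod
    def decision_engine(self, observation):
        """Choose an action based on the observation and any internal state."""

    @abstractmethod
    def actuator(self, action):
        """Execute the action, affecting the environment."""

    def step(self):
        """One pass through the perceive-think-act loop."""
        raw = self.sensor()
        observation = self.perception(raw)
        action = self.decision_engine(observation)
        self.actuator(action)

The concrete agents below use the same method names, though they are written as standalone classes so each example stays self‑contained.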

Building a Minimal Reflex Agent in Python

Let’s start with the simplest possible agent: a reflex agent that reacts to specific keywords. This example shows the full loop without any external dependencies, making it ideal for learning or rapid prototyping.

Step‑by‑Step Walkthrough

  1. Define a Sensor that reads user input from the console.
  2. Implement a Perception function that tokenizes the input.
  3. Create a DecisionEngine that matches tokens against a rule table.
  4. Write an Actuator that prints a response.

The agent runs inside an infinite while loop, breaking only when the user types “exit”.


import re

class ReflexAgent:
    def __init__(self):
        # Simple rule table: pattern → response
        self.rules = {
            r'\bhello\b': "Hi there! How can I help you today?",
            r'\bweather\b': "I’m not a meteorologist, but you can check weather.com.",
            r'\btime\b': "It’s time to write some code! 🕒",
        }

    def sensor(self):
        """Capture raw user input."""
        return input("> ").strip().lower()

    def perception(self, raw):
        """Basic tokenization using regex."""
        return re.findall(r'\b\w+\b', raw)

    def decision_engine(self, tokens):
        """Match tokens against rule patterns."""
        text = ' '.join(tokens)
        for pattern, response in self.rules.items():
            if re.search(pattern, text):
                return response
        return "I’m not sure how to respond to that."

    def actuator(self, response):
        """Output the response to the console."""
        print(response)

    def run(self):
        print("ReflexAgent ready. Type 'exit' to quit.")
        while True:
            raw = self.sensor()
            if raw == "exit":
                print("Goodbye!")
                break
            tokens = self.perception(raw)
            response = self.decision_engine(tokens)
            self.actuator(response)

if __name__ == "__main__":
    agent = ReflexAgent()
    agent.run()

This compact script demonstrates the perception‑action loop in under 50 lines. Although limited, you can extend it by adding more patterns, integrating a sentiment analyzer, or swapping the rule table for a small decision tree.

Pro tip: Keep your rule table in a separate JSON file. That way you can update responses without touching the code, and you can load language‑specific rule sets at runtime.
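
As a minimal sketch of that idea (the file name rules.json and its format are assumptions, not part of the example above), the rule table can be loaded at startup like this:

import json

def load_rules(path="rules.json"):
    """Load a pattern -> response table from a JSON file.

    Expected shape: {"\\bhello\\b": "Hi there!", ...}
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# In ReflexAgent.__init__ you would then write:
#     self.rules = load_rules()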

From Reflex to Goal‑Based: Adding Memory and Objectives

Real‑world agents rarely act purely reflexively. They need memory (state) and a notion of goals—whether that’s completing a task, maximizing user satisfaction, or minimizing energy consumption. We’ll upgrade the reflex agent into a goal‑based agent that schedules meetings using a simple calendar model.

Design Overview

  • State Store: A dictionary that tracks pending tasks and confirmed meetings.
  • Goal Definition: “Schedule a meeting when the user provides a date, time, and participants.”
  • Planner: A function that checks if enough information is present; if not, it asks follow‑up questions.

The agent will still use console I/O, but now it will remember context across turns.


import re

class GoalBasedAgent:
    def __init__(self):
        self.state = {
            "date": None,
            "time": None,
            "participants": [],
        }
        self.goal = "schedule_meeting"

    def sensor(self):
        return input("> ").strip()

    def perception(self, raw):
        # Extract potential date, time, and names
        date_match = re.search(r'\b(\d{4}-\d{2}-\d{2})\b', raw)
        time_match = re.search(r'\b(\d{1,2}:\d{2})\b', raw)
        names = re.findall(r'@(\w+)', raw)  # e.g., @alice @bob

        return {
            "date": date_match.group(1) if date_match else None,
            "time": time_match.group(1) if time_match else None,
            "participants": names,
        }

    def update_state(self, info):
        for key, value in info.items():
            if value:
                if key == "participants":
                    self.state[key].extend(value)
                else:
                    self.state[key] = value

    def planner(self):
        # A slot is missing when it is still None or an empty list
        missing = [k for k, v in self.state.items() if not v]
        if not missing:
            return "confirm"
        return missing[0]  # ask about the first missing piece

    def decision_engine(self, missing):
        prompts = {
            "date": "When should the meeting take place? Please use YYYY‑MM‑DD.",
            "time": "What time works best? Use HH:MM (24‑hour).",
            "participants": "Who should attend? Mention them with @username.",
        }
        if missing == "confirm":
            return f"Meeting scheduled on {self.state['date']} at {self.state['time']} with {', '.join(self.state['participants'])}."
        return prompts[missing]

    def actuator(self, response):
        print(response)

    def run(self):
        print("GoalBasedAgent ready. Type 'reset' to start over, 'exit' to quit.")
        while True:
            raw = self.sensor()
            if raw.lower() == "exit":
                print("Goodbye!")
                break
            if raw.lower() == "reset":
                self.state = {"date": None, "time": None, "participants": []}
                print("State cleared. Let's start again.")
                continue

            info = self.perception(raw)
            self.update_state(info)

            missing = self.planner()
            response = self.decision_engine(missing)
            self.actuator(response)

if __name__ == "__main__":
    agent = GoalBasedAgent()
    agent.run()

Notice how the agent now retains information across turns, asks targeted follow‑up questions, and finally confirms the meeting. This pattern—stateful perception + planner + goal check—is the backbone of many production assistants.

Pro tip: When building larger agents, persist the state to a lightweight database (SQLite, TinyDB) so that a crash doesn’t wipe the conversation history.
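
Here is a minimal sketch of that idea using the standard-library sqlite3 module; the database file name, table name, and schema are illustrative. The agent's state dictionary is serialized to JSON under a conversation ID so it survives a crash or restart.

import json
import sqlite3

class StateStore:
    """Persist an agent's state dict between runs."""

    def __init__(self, path="agent_state.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS state (conversation_id TEXT PRIMARY KEY, data TEXT)"
        )

    def save(self, conversation_id, state):
        self.conn.execute(
            "INSERT OR REPLACE INTO state VALUES (?, ?)",
            (conversation_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, conversation_id, default=None):
        row = self.conn.execute(
            "SELECT data FROM state WHERE conversation_id = ?", (conversation_id,)
        ).fetchone()
        return json.loads(row[0]) if row else (default or {})

# GoalBasedAgent could call store.save("demo", self.state) after each turn
# and seed itself with store.load("demo") on startup.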

Leveraging Large Language Models as Decision Engines

Large language models (LLMs) have become the de facto decision engine for many modern agents. They excel at interpreting ambiguous language, generating creative responses, and even performing rudimentary reasoning. Below we’ll integrate OpenAI’s Chat Completions API (using the gpt‑4o‑mini model) into a chatbot skeleton.

We’ll keep the sensor and actuator simple (console I/O) and let the LLM handle perception, planning, and action generation in a single API call. The code demonstrates how to structure prompts, manage token limits, and handle streaming responses for a responsive UI.

Prerequisites

  • Python 3.9+ installed.
  • OpenAI Python client (`pip install openai`).
  • An API key stored in the environment variable OPENAI_API_KEY.

import os
import openai

class LLMChatAgent:
    def __init__(self, model="gpt-4o-mini"):
        self.client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        self.model = model
        self.history = []  # List of dicts: {"role": "system"/"user"/"assistant", "content": "..."}
        self.system_prompt = (
            "You are a helpful AI assistant. Keep answers concise, "
            "use markdown when appropriate, and ask clarifying questions if needed."
        )
        self.history.append({"role": "system", "content": self.system_prompt})

    def sensor(self):
        return input("> ").strip()

    def update_history(self, role, content):
        self.history.append({"role": role, "content": content})
        # Keep the system prompt plus the last 20 messages (10 exchanges)
        # to stay within token limits
        if len(self.history) > 21:
            self.history = [self.history[0]] + self.history[-20:]

    def decision_engine(self, user_input):
        self.update_history("user", user_input)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.history,
            temperature=0.7,
            stream=True  # Stream for real‑time feel
        )
        collected = ""
        for chunk in response:
            delta = chunk.choices[0].delta
            if delta.content:  # delta is an object; content is None when there is no text
                print(delta.content, end="", flush=True)
                collected += delta.content
        print()  # Newline after streaming
        self.update_history("assistant", collected)

    def run(self):
        print("LLMChatAgent ready. Type 'exit' to quit.")
        while True:
            user_input = self.sensor()
            if user_input.lower() == "exit":
                print("Goodbye!")
                break
            self.decision_engine(user_input)

if __name__ == "__main__":
    agent = LLMChatAgent()
    agent.run()

In this example the LLM acts as a unified perception‑decision‑action module. The history list preserves context, while the streaming API gives the illusion of a real‑time conversation.

Pro tip: Use temperature=0 for near‑deterministic answers (e.g., when the agent must follow strict policies) and higher values for creative brainstorming sessions.

Real‑World Use Cases and Scaling Considerations

Now that you have three concrete implementations, let’s map them to real‑world scenarios and discuss how to scale each.

1. Customer Support Chatbot (LLM‑backed)

  • Why LLM? Natural language understanding, multi‑turn context, and the ability to generate helpful articles on the fly.
  • Scaling: Deploy the agent behind an API gateway, use request‑level caching for repeated queries, and enforce rate limits to control cost.
  • Compliance: Strip personally identifiable information (PII) before sending data to the LLM.

2. Automated Meeting Scheduler (Goal‑Based)

  • Why Goal‑Based? The task has a clear objective (schedule a meeting) and requires stateful interaction.
  • Scaling: Store calendar data in a relational database, expose the agent as a microservice, and integrate with calendar APIs (Google Calendar, Outlook).
  • Extensibility: Add conflict‑resolution logic or natural‑language date parsing (e.g., the dateparser library; see the sketch after this list).

3. Edge Device Controller (Reflex)

  • Why Reflex? Low latency, deterministic behavior, and minimal resource footprint.
  • Scaling: Compile the logic into a tiny binary (Cython or Rust) and flash it onto IoT devices.
  • Reliability: Include watchdog timers and fallback hard‑coded actions for safety‑critical environments.
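
On the natural‑language date parsing mentioned for the meeting scheduler, here is a minimal sketch using the third‑party dateparser package (pip install dateparser); the helper name is illustrative. It could replace the strict YYYY-MM-DD regex in GoalBasedAgent.perception.

import dateparser  # third-party: pip install dateparser

def extract_datetime(text):
    """Parse free-form expressions such as 'tomorrow at 3pm' into a datetime.

    dateparser.parse returns None when it finds nothing date-like,
    so the planner can keep asking follow-up questions.
    """
    return dateparser.parse(text)

# Examples (results depend on the current date):
#     extract_datetime("tomorrow at 3pm")
#     extract_datetime("2025-12-20 14:30")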

Across all cases, monitor three key metrics: latency (time from perception to action), success rate (how often the goal is achieved), and cost (especially for LLM calls). Instrument your agent with logging and telemetry to keep these numbers in check.
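
As a minimal sketch of that kind of instrumentation (the logger name and log format are illustrative), latency can be captured with a small decorator around any stage of the loop:

import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent.metrics")

def timed(step_name):
    """Log how long one stage of the agent loop takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            latency_ms = (time.perf_counter() - start) * 1000
            logger.info("%s latency: %.1f ms", step_name, latency_ms)
            return result
        return wrapper
    return decorator

# Usage:
#     @timed("decision_engine")
#     def decision_engine(self, tokens): ...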

Pro Tips for Building Robust AI Agents

Modularize Early. Keep sensor, perception, decision, and actuator code in separate classes or modules. This makes swapping a rule engine for an LLM a single import change.

Guard Against Hallucinations. When using LLMs, validate any factual claim (e.g., dates, URLs) against a trusted source before acting.
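
A minimal sketch of such checks for the two examples mentioned (the helper names are illustrative):

from datetime import datetime
from urllib.parse import urlparse

def is_valid_date(text, fmt="%Y-%m-%d"):
    """Accept a date only if it parses under the expected format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

def looks_like_url(text):
    """Cheap structural check before trusting a URL the model produced."""
    parsed = urlparse(text)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)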

Version Your Prompts. Store prompt strings in version‑controlled files. Small wording changes can dramatically affect LLM behavior, and you’ll want to roll back if a regression appears.
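
A minimal sketch of that idea, assuming a prompts/ directory tracked in version control with one file per prompt version:

from pathlib import Path

def load_prompt(name, version="v1", prompt_dir="prompts"):
    """Read a prompt from a version-controlled file, e.g. prompts/system_v1.txt."""
    return Path(prompt_dir, f"{name}_{version}.txt").read_text(encoding="utf-8")

# system_prompt = load_prompt("system", version="v2")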

Graceful Degradation. Design a fallback path (e.g., a rule‑based responder) for when the LLM service is unavailable or exceeds quota.
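
A minimal sketch of such a fallback path; the function and argument names are illustrative, and the broad exception handling is intentional for the example:

def respond_with_fallback(primary, fallback, user_input):
    """Try the primary (LLM) responder; degrade to a rule-based one on any failure."""
    try:
        return primary(user_input)
    except Exception as exc:  # e.g., network error, rate limit, quota exceeded
        print(f"[fallback] primary responder failed ({exc}); using backup.")
        return fallback(user_input)

# Usage with the earlier agents (illustrative):
#     reply = respond_with_fallback(
#         llm_agent.decision_engine,
#         lambda text: reflex_agent.decision_engine(reflex_agent.perception(text)),
#         user_input,
#     )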

Conclusion

Building AI agents from scratch is less about mastering a single technology and more about mastering a pattern: sense, think, act. Starting with a reflex agent gives you a sandbox for the loop, moving to goal‑based agents introduces state and planning, and finally integrating LLMs unlocks sophisticated language understanding.

Whether you’re automating internal workflows, creating a friendly chatbot, or programming a robot, the modular architecture we covered will keep your code clean, testable, and ready for future upgrades. Keep iterating, instrument your loops, and let the agents evolve alongside your product.
