Building a Smarter AI Agent: A Step-by-Step Guide with Planning, Tools, and Self-Critique

Akram Chauhan

Let's be honest. Most of us have played with AI chatbots that feel… well, a bit one-dimensional. You ask a question, it gives an answer. It’s cool, but it’s not really doing anything. It’s a conversation partner, not a work partner.

But what if we could build an AI that’s more than just a talker? An AI that can think, plan, use tools, and even check its own work before handing it over. An AI that can take a complex goal, break it down, and actually produce a deliverable, like writing a file or extracting specific data.

That’s exactly what we’re going to do today. We're moving beyond the simple question-and-answer model and into the world of "agentic AI." Think of it as giving your AI a brain, a toolbox, and a sense of quality control.

The Game Plan: An AI "Assembly Line"

The secret sauce here isn't just one giant, monolithic prompt. I’ve found that the best approach is to break the task down, just like a real-world team would.

We’re going to build our agent as a small pipeline of specialized "roles":

  1. The Planner: This is the strategist. It looks at your goal and creates a step-by-step plan.
  2. The Executor: This is the worker bee. It takes the plan and executes it, using tools whenever it needs to.
  3. The Critic: This is our quality assurance. It reviews the Executor’s work, points out flaws, and polishes it into a final, high-quality answer.

By separating these roles, each prompt stays short and focused, a bad result can be traced back to the stage that produced it, and any single role can be tuned or swapped without touching the others. Ready to get started? Let's dive in.

Step 1: Setting Up Our Workshop

First things first, we need to get our environment ready. We’ll be using the OpenAI Python library. If you don't have it, a quick pip install will do the trick.

We'll import just the libraries we need to keep things clean. And to keep our API key safe, we'll use getpass(), which creates a hidden prompt. This way, your key never accidentally shows up in your code or notebook output.

# Let's get the essentials installed and imported.
# The leading "!" is notebook syntax (Colab/Jupyter); in a terminal, run: pip install -U openai
!pip -q install -U openai

import os
import json
import re
import math
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict, List
from getpass import getpass
from openai import OpenAI

# Prompt for the API key if it's not already set
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY (hidden): ").strip()

assert os.environ["OPENAI_API_KEY"], "You'll need an OpenAI API key to continue!"

# Set up our client and define the model we'll use
client = OpenAI()
MODEL = "gpt-4o" # Or whichever model you prefer

Simple enough, right? We've got our OpenAI client ready and a MODEL variable we can reuse everywhere to ensure consistency.

Step 2: Giving Our Agent Some Tools

An agent is only as good as its tools. We need to give our AI the ability to do things beyond just generating text. For this project, we'll give it four useful tools:

  • A safe calculator.
  • A mini knowledge-base search.
  • A tool to extract structured JSON from text.
  • A tool to write files to disk.

First, let's create a tiny "knowledge base." In a real-world scenario, this could be your team's documentation, project guidelines, or standard operating procedures. For us, it’s just a simple list of dictionaries.

# Our internal "playbook" for the agent
KB = [
    {
        "title": "Agent Protocol: Execution",
        "text": "Use tools only when necessary. Prefer short intermediate notes. Always verify numeric results."
    },
    {
        "title": "Policy: Output Quality",
        "text": "Final answers must include steps, checks, and deliverables. Emails must include subject and next steps."
    },
    {
        "title": "Playbook: Meeting Follow-up",
        "text": "Summarize decisions. List action items with owner and due date. Draft concise follow-up."
    },
]

Now, let's build the Python functions for our tools. Notice how each function returns a dictionary. This is key—it keeps the tool outputs structured and predictable, which is exactly what a machine needs.

# A restricted calculator - never use raw eval() on untrusted input!
def _safe_calc(expr: str):
    # Only digits, arithmetic operators, parentheses, and e/E (scientific notation) get through
    allowed = set("0123456789+-*/().% eE")
    if any(ch not in allowed for ch in expr):
        return {"ok": False, "error": "Invalid characters"}
    # Block exponentiation: an input like 9**9**9 could hang the process
    if "**" in expr:
        return {"ok": False, "error": "Exponentiation not allowed"}
    try:
        # No builtins and no names are exposed to the expression
        val = eval(expr, {"__builtins__": {}}, {})
        return {"ok": True, "expression": expr, "value": val}
    except Exception as e:
        return {"ok": False, "error": str(e)}

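# A simple keyword search for our knowledge base
def _kb_search(query: str, k: int = 3):
    # Tokenize the query once instead of once per KB entry
    tokens = set(re.findall(r"\w+", query.lower()))
    scored = []
    for item in KB:
        hay = (item["title"] + " " + item["text"]).lower()
        # Substring match: "email" also counts as a hit for "emails"
        score = sum(1 for tok in tokens if tok in hay)
        scored.append((score, item))
    # Best matches first; entries with a score of 0 are still returned if k allows
    scored.sort(key=lambda x: x[0], reverse=True)
    return {"ok": True, "results": [it for _, it in scored[:k]]}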

# A tool to pull JSON out of messy text
def _extract_json(text: str):
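    # Greedy match: spans from the first "{" to the last "}" in the text.
    # That handles nested objects, but not top-level arrays or text with several separate objects.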
    m = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if not m:
        return {"ok": False, "error": "No JSON found"}
    try:
        return {"ok": True, "json": json.loads(m.group(0))}
    except Exception as e:
        return {"ok": False, "error": str(e), "raw": m.group(0)[:1500]}

# A tool to save our final work
def _write_file(path: str, content: str):
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    sha = hashlib.sha256(content.encode()).hexdigest()[:16]
    return {"ok": True, "path": path, "sha16": sha, "bytes": len(content.encode("utf-8"))}

We've got a calculator that's locked down for safety, a search function for our KB, a JSON extractor, and a file writer. These are the building blocks of an agent that can interact with the world.
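If you'd rather not call eval() at all, one common alternative is to parse the expression with Python's ast module and walk the tree, allowing only numbers and a whitelist of operators. The sketch below is an assumption of mine, not part of the original build: ast_calc and _eval_node are hypothetical names, and the function simply mirrors the return shape of _safe_calc so it could be swapped in.

```python
import ast
import operator

# Whitelisted operators; anything not listed here is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Recursively evaluate a parsed expression made of numbers and whitelisted operators."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    # type(...) check (not isinstance) so that True/False are rejected
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def ast_calc(expr: str):
    """Hypothetical alternative to _safe_calc with the same return shape."""
    try:
        tree = ast.parse(expr.strip(), mode="eval")
        return {"ok": True, "expression": expr, "value": _eval_node(tree)}
    except Exception as e:
        return {"ok": False, "error": str(e)}


print(ast_calc("2*(3+4)"))
print(ast_calc("__import__('os')"))
```

Because names, function calls, and the power operator never appear in the whitelist, they fail with an error dictionary instead of being executed.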

Step 3: Teaching the AI How to Use Its Tools

Okay, so we have Python functions. How does the AI know they exist, what they do, or how to call them?

This is where "tool schemas" come in. We need to describe our tools in a specific JSON format that the OpenAI API understands. Think of it like writing a little user manual for the AI for each tool.

We also need a way to manage the agent's state—its goal, its memory of what it has done, and a log of its actions. A simple dataclass is perfect for this.

# First, let's map tool names to our functions
TOOLS = {
    "calc": lambda expression: _safe_calc(expression),
    "kb_search": lambda query, k=3: _kb_search(query, int(k)),
    "extract_json": lambda text: _extract_json(text),
    "write_file": lambda path, content: _write_file(path, content),
}

# Now, the "user manuals" for the AI
TOOL_SCHEMAS = [
    {"type": "function", "function": {"name": "calc", "description": "Safely compute a numeric expression.", "parameters": {"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]}}},
    {"type": "function", "function": {"name": "kb_search", "description": "Search internal mini knowledge base.", "parameters": {"type": "object", "properties": {"query": {"type": "string"}, "k": {"type": "integer", "default": 3}}, "required": ["query"]}}},
    {"type": "function", "function": {"name": "extract_json", "description": "Extract and parse first JSON object from text.", "parameters": {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]}}},
    {"type": "function", "function": {"name": "write_file", "description": "Write content to a file path.", "parameters": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}},
]

# The agent's "notepad" to keep track of everything
@dataclass
class AgentState:
    goal: str
    memory: List[str] = field(default_factory=list)
    trace: List[Dict[str, Any]] = field(default_factory=list)

# A little helper function to make API calls cleaner
def chat(messages, tools=None, tool_choice="auto", temperature=0.2):
    kwargs = dict(model=MODEL, messages=messages, temperature=temperature)
    if tools is not None:
        kwargs["tools"] = tools
        kwargs["tool_choice"] = tool_choice
    return client.chat.completions.create(**kwargs)

# And a function to safely run our tools by name
def run_tool(name, args):
    fn = TOOLS.get(name)
    if not fn:
        return {"ok": False, "error": f"Unknown tool: {name}"}
    try:
        return fn(**args)
    except Exception as e:
        return {"ok": False, "error": str(e), "args": args}

This part is super important. We've registered our tools, described them to the AI, and created a state object to track the agent’s progress. The trace list in our AgentState is especially handy—it’s a complete log of every tool call, which is a lifesaver for debugging when things go wrong.
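One thing that can quietly go wrong here is the schemas drifting out of sync with the functions, for example a renamed parameter or a missing "required" entry. As a rough sketch (check_registry is a hypothetical helper, and the demo registry below is a trimmed stand-in rather than the real TOOLS and TOOL_SCHEMAS), you could compare the two with inspect.signature before making any API calls:

```python
import inspect


def check_registry(tools, schemas):
    """Return a list of mismatches between tool functions and their schemas."""
    problems = []
    schema_names = set()
    plain_kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    for schema in schemas:
        spec = schema["function"]
        name = spec["name"]
        schema_names.add(name)
        fn = tools.get(name)
        if fn is None:
            problems.append(f"{name}: schema has no matching function")
            continue
        params = inspect.signature(fn).parameters
        declared = set(spec["parameters"].get("properties", {}))
        required = set(spec["parameters"].get("required", []))
        # Every property the model may send must be accepted by the function.
        for extra in sorted(declared - set(params)):
            problems.append(f"{name}: schema property '{extra}' is not a function parameter")
        # Parameters without a default must be marked as required.
        for pname, p in params.items():
            if p.kind in plain_kinds and p.default is inspect.Parameter.empty and pname not in required:
                problems.append(f"{name}: parameter '{pname}' has no default but is not required")
    for name in sorted(set(tools) - schema_names):
        problems.append(f"{name}: function has no schema")
    return problems


# Stand-in registry for illustration; "query" is deliberately left out of "required".
demo_tools = {
    "calc": lambda expression: {"ok": True},
    "kb_search": lambda query, k=3: {"ok": True},
}
demo_schemas = [
    {"type": "function", "function": {"name": "calc", "parameters": {
        "type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]}}},
    {"type": "function", "function": {"name": "kb_search", "parameters": {
        "type": "object", "properties": {"query": {"type": "string"}, "k": {"type": "integer"}}, "required": []}}},
]

problems = check_registry(demo_tools, demo_schemas)
print(problems)
```

In the notebook you would call check_registry(TOOLS, TOOL_SCHEMAS) once after defining both and expect an empty list.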

Step 4: Creating the "Mind" of the Agent

Now for the fun part: defining the personalities of our three specialists. We do this with system prompts. Each one gives a clear directive to the model, telling it what its job is.

PLANNER_SYS = """You are a senior planner. Your job is to create a clear, step-by-step plan to achieve the user's goal. Return your plan as a STRICT JSON object with three keys: 'objective' (a brief restatement of the goal), 'steps' (an array of strings for each action), and 'tool_checkpoints' (an array of strings describing when a tool might be needed)."""

EXECUTOR_SYS = """You are a tool-using executor. Your job is to follow the plan and achieve the goal. Use your tools whenever necessary. Keep your intermediate notes and thoughts short and to the point. When you believe you are finished, present your work by providing: 1) The DRAFT of the final output, and 2) A verification checklist to confirm you met all requirements."""

CRITIC_SYS = """You are a critic. Your job is to improve a draft. You will be given a goal and a draft. Review the draft against the goal. Provide your feedback as a list of issues and a list of suggested fixes. Then, provide the final, improved answer. Your output should be clean and directly address the user's goal."""

With these roles defined, let's build the functions that bring them to life.

The Planner: The Strategist

The plan function sends the user's goal to the Planner model, which then returns a structured JSON plan.

def plan(state: AgentState):
    r = chat(
        [{"role": "system", "content": PLANNER_SYS}, {"role": "user", "content": state.goal}],
        tools=None,
        temperature=0.1,
    )
    txt = r.choices[0].message.content or ""
    
    # We use our own JSON extractor tool here to parse the plan
    parsed = _extract_json(txt)
    if not parsed.get("ok"):
        # If the planner messes up the JSON, we just create a simple fallback plan
        return {"objective": state.goal, "steps": ["Proceed directly (planner JSON parse failed)."], "tool_checkpoints": []}
    
    return parsed["json"]
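The fallback plan kicks in whenever parsing fails, and the greedy pattern in _extract_json can fail on replies that contain more than one brace-delimited block, or that hold a top-level array. If that turns out to be a problem, one option is to scan for the first position where the standard library's json.JSONDecoder.raw_decode succeeds. This is a sketch under my own assumptions (extract_first_json is a hypothetical name), not the method used above:

```python
import json


def extract_first_json(text: str):
    """Return the first parseable JSON object or array found in text."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at index i and ignores what follows
            value, _end = decoder.raw_decode(text, i)
        except json.JSONDecodeError:
            continue  # not valid JSON from here; try the next candidate
        return {"ok": True, "json": value}
    return {"ok": False, "error": "No JSON found"}


print(extract_first_json('Plan: {"a": 1} and later {"b": 2}'))
print(extract_first_json('Items: [{"owner": "Priya"}]'))
```

The trade-off is a worst-case quadratic scan on long text with many stray brackets, which is usually acceptable for model replies of a few thousand characters.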

The Executor: The Doer

This is the heart of our agent. The execute function runs in a loop. In each step, it talks to the model. If the model wants to use a tool, we run the tool in Python and feed the result back to the model. This back-and-forth continues until the model decides it's done and gives us a final draft.

def execute(state: AgentState, plan_obj: Dict[str, Any]):
    # We'll give the model the goal, the plan, and its recent memory
    msgs = [
        {"role": "system", "content": EXECUTOR_SYS},
        {"role": "user", "content": f"GOAL:\n{state.goal}\n\nPLAN:\n{json.dumps(plan_obj, indent=2)}\n\nMEMORY:\n" + "\n".join(f"- {m}" for m in state.memory[-10:])}
    ]

    # Let's limit the loop to prevent it from running forever
    for _ in range(12):
        r = chat(msgs, tools=TOOL_SCHEMAS, tool_choice="auto", temperature=0.2)
        msg = r.choices[0].message
        
        tool_calls = getattr(msg, "tool_calls", None)

        if tool_calls:
            # The model wants to use a tool!
            msgs.append({"role": "assistant", "content": msg.content or "", "tool_calls": tool_calls})
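            for tc in tool_calls:
                name = tc.function.name
                try:
                    args = json.loads(tc.function.arguments or "{}")
                except json.JSONDecodeError as e:
                    # Malformed arguments: report the problem to the model instead of crashing
                    args, out = {}, {"ok": False, "error": f"Invalid tool arguments: {e}"}
                else:
                    # Run the tool
                    out = run_tool(name, args)
                
                # Log the call to our trace
                state.trace.append({"tool": name, "args": args, "out": out})
                
                # Send the tool's output back to the model
                msgs.append({"role": "tool", "tool_call_id": tc.id, "content": json.dumps(out)})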
            continue # Go back to the model for the next step
        
        # If there are no tool calls, the model is done. This is our draft.
        return msg.content or ""

    return "Executor stopped (iteration limit reached)."

This loop is what makes it a true agent. It's not just generating text; the model and our Python code take turns. The model requests a tool, we run it and report the result back, and the cycle repeats until the model has what it needs to write the draft.
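To see the loop's control flow without spending API credits, it can help to swap the model for a scripted stand-in. The sketch below is an illustration only: run_tool_loop and scripted_model are hypothetical names, the "add" tool is made up, and the message dictionaries are deliberately simplified rather than the exact shapes the OpenAI API uses.

```python
import json


def run_tool_loop(next_message, tools, max_steps=12):
    """Generic tool loop. next_message(msgs) returns a dict that may contain 'tool_calls'."""
    msgs, trace = [], []
    for _ in range(max_steps):
        msg = next_message(msgs)
        calls = msg.get("tool_calls") or []
        if not calls:
            # No tool requests: treat the content as the final draft
            return msg.get("content", ""), trace
        msgs.append(msg)
        for call in calls:
            fn = tools.get(call["name"])
            out = fn(**call["args"]) if fn else {"ok": False, "error": "Unknown tool"}
            trace.append({"tool": call["name"], "args": call["args"], "out": out})
            msgs.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(out)})
    return "Stopped (iteration limit reached).", trace


def scripted_model(msgs):
    """Stand-in for the LLM: requests one tool call, then answers using its result."""
    tool_msgs = [m for m in msgs if m.get("role") == "tool"]
    if not tool_msgs:
        return {"role": "assistant", "content": "",
                "tool_calls": [{"id": "call_1", "name": "add", "args": {"a": 2, "b": 3}}]}
    result = json.loads(tool_msgs[-1]["content"])
    return {"role": "assistant", "content": f"The sum is {result['value']}."}


tools = {"add": lambda a, b: {"ok": True, "value": a + b}}
draft, trace = run_tool_loop(scripted_model, tools)
print(draft)       # The sum is 5.
print(len(trace))  # 1
```

The same shape applies to the real executor: the iteration cap is the only thing standing between a model that keeps requesting tools and an endless loop.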

The Critic: The Quality Control

Finally, the critique function takes the Executor's draft and has one last look. It's a simple but powerful step to catch errors, improve formatting, and ensure the final output is top-notch.

def critique(state: AgentState, draft: str):
    r = chat(
        [
            {"role": "system", "content": CRITIC_SYS},
            {"role": "user", "content": f"GOAL:\n{state.goal}\n\nDRAFT:\n{draft}\n\nTRACE:\n{json.dumps(state.trace, indent=2)[:9000]}"}
        ],
        tools=None,
        temperature=0.2,
    )
    return r.choices[0].message.content or draft
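Because the Critic prompt asks for issues, fixes, and the improved answer in a single reply, the returned text mixes feedback with the deliverable. If you want them apart, one possible approach is to ask the Critic to print a delimiter before the final answer and split on it. Everything here is an assumption on my part: the marker string, the split_critique helper, and the idea of adding a line such as "Write === FINAL === on its own line before the final answer" to CRITIC_SYS.

```python
FINAL_MARKER = "=== FINAL ==="  # hypothetical delimiter the Critic would be asked to emit


def split_critique(text: str, draft: str):
    """Split Critic output into (feedback, final_answer)."""
    feedback, sep, final = text.partition(FINAL_MARKER)
    if not sep:
        # Marker missing: treat the whole reply as the final answer
        return "", text.strip()
    if not final.strip():
        # Marker present but nothing after it: fall back to the Executor's draft
        return feedback.strip(), draft
    return feedback.strip(), final.strip()


reply = "Issues:\n- Email has no subject\n=== FINAL ===\nSubject: v2 dashboard launch"
feedback, final = split_critique(reply, draft="(executor draft)")
print(feedback)
print(final)
```

Models don't always follow formatting instructions, which is why both fallbacks are there.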

Putting It All Together: The Full Agentic Loop

Now we just need one main function to orchestrate the whole process: plan, execute, and critique.

Let's give it a realistic task. We'll provide a messy meeting transcript and ask the agent to summarize it, extract action items into a clean JSON format, draft a follow-up email, and save the whole thing to a file. This is a task a human would do, and it requires multiple steps and tools.

def run_agent(goal: str):
    state = AgentState(goal=goal)
    # Give the agent a little hint to start
    state.memory.append("Use kb_search if you need internal guidance or formatting playbooks.")
    
    # 1. Plan
    plan_obj = plan(state)
    
    # 2. Execute
    draft = execute(state, plan_obj)
    
    # 3. Critique
    final = critique(state, draft)
    
    return {"plan": plan_obj, "draft": draft, "final": final, "trace": state.trace}

# Here's our demo task
demo_goal = """
From this transcript, produce:
A) a concise meeting summary
B) action items as a JSON array with fields: owner, action, due_date (or null)
C) a follow-up email (with subject and body)
D) Save the complete output to a file named /content/meeting_followup.md using the write_file tool.

Transcript:
- Decision: We're going to ship the v2 dashboard on March 15. That's final.
- Risk: Priya mentioned that data latency might spike. She will run some load tests to be sure.
- Amir needs to update the KPI definitions doc and then share it with the finance team.
- Next check-in is scheduled for this coming Tuesday. Nikhil is the owner for that meeting.
"""

# Let's run it!
result = run_agent(demo_goal)

# And print the final, polished output
print(result["final"])

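When you run this, only the Critic's reply is printed: its list of issues and fixes, followed by the improved answer. The plan, the draft, and the tool log are available in result["plan"], result["draft"], and result["trace"]. The model decides which tools to call, so the trace varies between runs, but for this task it will typically include kb_search for guidance and write_file to save the output. Two caveats: write_file runs during the Execute step, so the saved file holds the Executor's version rather than the Critic's polished one, and the /content/ path assumes a Colab-style environment, so adjust it if you're running elsewhere.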
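Since the trace is just a list of dictionaries with "tool", "args", and "out" keys, a few lines are enough to get a per-tool overview after a run. This is a small sketch: summarize_trace is a hypothetical helper, and example_trace is hand-written for illustration, not output from a real run.

```python
from collections import Counter


def summarize_trace(trace):
    """Count calls and failed calls per tool from a list of trace entries."""
    calls = Counter(entry["tool"] for entry in trace)
    failures = Counter(entry["tool"] for entry in trace if not entry["out"].get("ok"))
    return {name: {"calls": n, "failed": failures.get(name, 0)} for name, n in calls.items()}


# Hand-written entries in the same shape as state.trace (illustrative only)
example_trace = [
    {"tool": "kb_search", "args": {"query": "meeting follow-up"}, "out": {"ok": True, "results": []}},
    {"tool": "calc", "args": {"expression": "2 +"}, "out": {"ok": False, "error": "invalid syntax"}},
    {"tool": "calc", "args": {"expression": "2 + 2"}, "out": {"ok": True, "value": 4}},
]

summary = summarize_trace(example_trace)
print(summary)
```

After a real run, summarize_trace(result["trace"]) would show at a glance which tools were used and whether any of them failed.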

So, What Did We Actually Build?

And there you have it. You've just walked through the blueprint for a much smarter, more capable AI system. We didn't just build a chatbot; we built an agent that can:

  • Strategize: It creates a plan before it starts working.
  • Act: It uses real tools to compute, search, and create files.
  • Reflect: It critiques its own work to improve the quality.
  • Be Accountable: We have a full trace of its actions for easy debugging.

This modular, role-based architecture is incredibly powerful. From here, you can make it even more advanced. You could add more tools, implement retry policies for when tools fail, or even have agents that delegate tasks to other sub-agents.
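As one example of where to go next, here is a rough sketch of a retry policy. It assumes tools follow the same convention as above (returning a dictionary with an "ok" key); with_retries and the flaky demo tool are hypothetical names introduced for illustration.

```python
import time


def with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Wrap a tool so exceptions or {'ok': False} results are retried with exponential backoff."""
    def wrapped(**kwargs):
        last = {"ok": False, "error": "not attempted"}
        for i in range(attempts):
            try:
                last = fn(**kwargs)
            except Exception as e:
                last = {"ok": False, "error": str(e)}
            if last.get("ok"):
                return last
            if i < attempts - 1:
                sleep(base_delay * 2 ** i)  # 0.5s, 1s, 2s, ...
        return last  # give up and return the last failure
    return wrapped


# Demo: a tool that fails twice before succeeding
calls = {"n": 0}

def flaky(x):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return {"ok": True, "value": x}

delays = []  # record the delays instead of actually sleeping
reliable = with_retries(flaky, sleep=delays.append)
result = reliable(x=1)
print(result, delays)
```

Retries suit transient failures such as a busy file system or a network hiccup. A deterministic failure, like invalid characters sent to the calculator, will fail the same way every time, so it is usually better to hand that error back to the model and let it fix its input.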

The next time you have a complex, multi-step task, think about whether a simple chatbot is the right tool, or if you need a true agent to get the job done. Now you have the foundation to build one.
