How to Build AI Agents That Don't Break: A Guide to Bulletproof Workflows with PydanticAI

Akram Chauhan

You’ve seen it, right? You build this amazing AI agent, and it works perfectly in your demo. It’s smart, it’s fast, it’s… well, it’s magic. Then you try to use it for a real-world task, and things get messy. It hallucinates a function call, spits out weirdly formatted JSON, or just completely misunderstands a critical instruction.

Suddenly, your magical AI feels more like a fragile toy.

This is the gap so many of us are struggling with: the massive difference between a cool chatbot prototype and a reliable, production-ready system. The "best-effort" generation of most Large Language Models (LLMs) is fantastic for creative tasks, but it's a nightmare when you need predictable, structured output.

So, how do we fix this? How do we build AI agents we can actually trust? Today, we’re going to do just that. We'll build a robust support ticket agent using a fantastic library called PydanticAI. The secret sauce isn't a better prompt; it's about forcing the AI to play by our rules, using strict data contracts that make failures the exception, not the norm.

The Big Idea: From Unpredictable Chat to Structured Contracts

Think of it like this. You can ask a new employee to just "give you a summary of the customer issue." You might get a great paragraph, or you might get a rambling story. It's a gamble.

Or, you could hand them a very specific form with fields like "Customer Name," "Priority Level," "Issue Category," and "Detailed Description." Now, you're guaranteed to get the information you need in the exact format you expect.

That form is what Pydantic schemas do for our AI. We’re going to stop asking the AI to just talk to us and instead require it to fill out a form—a strict, typed schema. This simple shift in thinking is what turns a brittle agent into a bulletproof one.
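To make the form analogy concrete, here is a minimal sketch of that employee form written as a Pydantic model. The IssueForm class and its field names are hypothetical and exist only for illustration; the real schemas for our agent come in Step 2. Printing the JSON Schema shows the kind of machine-readable "form" that a structured-output request can be built from.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


class IssueForm(BaseModel):
    """Hypothetical form mirroring the fields from the analogy above."""

    customer_name: str = Field(..., min_length=2)
    priority_level: Literal["low", "medium", "high"]
    issue_category: Literal["billing", "bug", "other"]
    detailed_description: str = Field(..., min_length=20)


# The schema lists every field, its type, and its allowed values.
schema = IssueForm.model_json_schema()
print(json.dumps(schema, indent=2))
```

Every field is required and the two Literal fields become closed lists of allowed values, so there is no room for a rambling paragraph.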

Let’s get our hands dirty and build something real.

Step 1: Setting the Stage

First things first, we need to get our environment set up. We're building a support ticket agent that can create, update, and query tickets in a database. This requires a few libraries, most importantly PydanticAI, which we install below as the slimmer pydantic-ai-slim package with its openai extra.

We’ll also handle the OpenAI API key. The code below uses the OPENAI_API_KEY environment variable if it is already set, otherwise tries to read it from Google Colab's secrets, and finally just asks you for it.

# Let's get our tools in order
# ("!pip" is notebook cell syntax; in a terminal, run the command without the leading "!")
!pip -q install "pydantic-ai-slim[openai]" pydantic

import os, json, sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Literal, Optional, List
from pydantic import BaseModel, Field, field_validator
from pydantic_ai import Agent, RunContext, ModelRetry

# Securely grab the OpenAI API key
if not os.environ.get("OPENAI_API_KEY"):
    try:
        from google.colab import userdata
        os.environ["OPENAI_API_KEY"] = (userdata.get("OPENAI_API_KEY") or "").strip()
    except Exception:
        # Not running in Colab, or the secret isn't available; fall back to the prompt below
        pass

if not os.environ.get("OPENAI_API_KEY"):
    import getpass
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Paste your OPENAI_API_KEY: ").strip()

assert os.environ.get("OPENAI_API_KEY"), "You'll need an OpenAI API key for this to work."

Nothing too crazy here. We're just importing the necessary building blocks and making sure we can talk to OpenAI.

Step 2: Defining the Rules with Pydantic Schemas

This is where the magic really begins. We're going to define the "forms" our AI agent must fill out. We’ll create two main schemas: one for what a ticket looks like (TicketDraft) and one for what decision the agent makes (AgentDecision).

# Defining our strict "forms" or schemas for the AI
Priority = Literal["low", "medium", "high", "critical"]
ActionType = Literal["create_ticket", "update_ticket", "query_ticket", "list_open_tickets", "no_action"]
Confidence = Literal["low", "medium", "high"]

class TicketDraft(BaseModel):
    title: str = Field(..., min_length=8, max_length=120)
    customer: str = Field(..., min_length=2, max_length=60)
    priority: Priority
    category: Literal["billing", "bug", "feature_request", "security", "account", "other"]
    description: str = Field(..., min_length=20, max_length=1000)
    expected_outcome: str = Field(..., min_length=10, max_length=250)

class AgentDecision(BaseModel):
    action: ActionType
    reason: str = Field(..., min_length=20, max_length=400)
    confidence: Confidence
    ticket: Optional[TicketDraft] = None
    ticket_id: Optional[int] = None
    follow_up_questions: List[str] = Field(default_factory=list, max_length=5)

    @field_validator("follow_up_questions")
    @classmethod
    def short_questions(cls, v):
        for q in v:
            if len(q) > 140:
                raise ValueError("Each follow-up question must be <= 140 characters.")
        return v

Look at how clear this is!

  • We're using Literal to force fields like priority and action to be one of a few specific values. No more weird, made-up categories from the AI.
  • Field lets us add validation rules, like minimum and maximum lengths. If the AI generates a title that's too short, Pydantic will catch it immediately.
  • The AgentDecision is the master schema. Every time the agent runs, it must return an object that looks exactly like this. It has to decide on an action, state its reason and confidence, and provide the necessary data (like a TicketDraft or a ticket_id).
  • We even have a custom validator (short_questions) to enforce our own business rules.

By doing this, we've moved the responsibility of "getting it right" from the LLM's fuzzy world to Pydantic's world of cold, hard logic. If the AI's output doesn't fit this mold, it's an error we can catch and handle, not a silent failure that corrupts our data.
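Here is a small sketch of what "an error we can catch and handle" can look like in practice. MiniDecision is a trimmed-down, hypothetical stand-in for AgentDecision, and find_problems is a helper invented for this example; neither is part of PydanticAI. The idea is simply to validate raw model output and turn every broken rule into a readable message.

```python
from typing import Dict, List, Literal

from pydantic import BaseModel, Field, ValidationError, field_validator


class MiniDecision(BaseModel):
    """Trimmed-down, hypothetical stand-in for AgentDecision."""

    action: Literal["create_ticket", "no_action"]
    reason: str = Field(..., min_length=20, max_length=400)
    follow_up_questions: List[str] = Field(default_factory=list, max_length=5)

    @field_validator("follow_up_questions")
    @classmethod
    def short_questions(cls, v):
        for q in v:
            if len(q) > 140:
                raise ValueError("Each follow-up question must be <= 140 characters.")
        return v


def find_problems(raw_json: str) -> Dict[str, str]:
    """Return {field: message} for every rule the raw output breaks."""
    try:
        MiniDecision.model_validate_json(raw_json)
        return {}
    except ValidationError as exc:
        problems = {}
        for err in exc.errors():
            # "loc" is a tuple path to the offending field; empty for malformed JSON.
            field = ".".join(str(part) for part in err["loc"]) or "<root>"
            problems[field] = err["msg"]
        return problems


bad = '{"action": "escalate", "reason": "too short"}'
good = '{"action": "no_action", "reason": "The user only said hello, nothing to file."}'

print(find_problems(bad))   # one entry per broken field
print(find_problems(good))  # empty dict: the output fits the schema
```

The made-up action and the too-short reason are both reported by name, which is exactly the kind of feedback that can be sent back to the model or logged.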

Step 3: Setting Up Our "Real World" Dependencies

Our agent needs to interact with things, like a database. But we don't want to hardcode a database connection inside our agent logic. That would be a testing and maintenance nightmare.

Instead, we'll use a simple approach called dependency injection. We’ll create a small container for all the "live" resources our agent needs and pass it in when we run the agent.

# Setting up the things our agent will interact with
@dataclass
class SupportDeps:
    db: sqlite3.Connection
    tenant: str
    policy: dict

def utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()

def init_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("""
    CREATE TABLE tickets (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        tenant TEXT NOT NULL, title TEXT NOT NULL, customer TEXT NOT NULL,
        priority TEXT NOT NULL, category TEXT NOT NULL, description TEXT NOT NULL,
        expected_outcome TEXT NOT NULL, status TEXT NOT NULL,
        created_at TEXT NOT NULL, updated_at TEXT NOT NULL
    );
    """)
    conn.commit()
    return conn

def seed_ticket(db: sqlite3.Connection, tenant: str, ticket: TicketDraft, status: str = "open") -> int:
    now = utc_now_iso()
    cur = db.execute(
        """
        INSERT INTO tickets (tenant, title, customer, priority, category, description, expected_outcome, status, created_at, updated_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """,
        (
            tenant, ticket.title, ticket.customer, ticket.priority, ticket.category,
            ticket.description, ticket.expected_outcome, status, now, now,
        ),
    )
    db.commit()
    return int(cur.lastrowid)

Here, we're creating a simple in-memory SQLite database to act as our external system. The SupportDeps class holds the database connection, the current tenant (so we can handle multi-tenancy), and a policy dictionary for business rules. This keeps our agent's logic clean and separated from the environment it runs in.
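One payoff of passing dependencies in is that the same logic can run against any database and any tenant you hand it. The sketch below uses only the standard library; MiniDeps, make_db, and count_open are hypothetical names for this example, with a table that is much smaller than the one above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class MiniDeps:
    """Hypothetical, trimmed-down version of SupportDeps."""

    db: sqlite3.Connection
    tenant: str


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with tickets for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, tenant TEXT NOT NULL, status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO tickets (tenant, status) VALUES (?, ?)",
        [("acme_corp", "open"), ("acme_corp", "closed"), ("globex", "open")],
    )
    conn.commit()
    return conn


def count_open(deps: MiniDeps) -> int:
    """The logic only uses what it is handed; it never opens a connection itself."""
    row = deps.db.execute(
        "SELECT COUNT(*) FROM tickets WHERE tenant=? AND status='open'",
        (deps.tenant,),
    ).fetchone()
    return row[0]


shared_db = make_db()
acme_open = count_open(MiniDeps(db=shared_db, tenant="acme_corp"))
globex_open = count_open(MiniDeps(db=shared_db, tenant="globex"))
unknown_open = count_open(MiniDeps(db=shared_db, tenant="initech"))
print(acme_open, globex_open, unknown_open)
```

Because the tenant travels with the dependencies, each call only ever sees its own tenant's rows, and a test can swap in a fresh in-memory database without touching the logic.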

Step 4: Building the Agent and Its Tools

Now we assemble the agent itself. We’ll give it a main instruction prompt, tell it that its output must match our AgentDecision schema, and then give it a set of "tools" it can use to interact with the database.

These tools are just Python functions decorated with @agent.tool. PydanticAI handles all the complicated work of letting the LLM know these tools exist and how to call them.
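As a rough mental model of that "complicated work", the sketch below turns a plain function's signature and docstring into a JSON description that could be shown to an LLM. This is not PydanticAI's actual implementation, and describe_tool is a hypothetical helper; it only uses inspect and pydantic.create_model to illustrate the idea. It also leaves out the ctx argument that real tools receive.

```python
import inspect
from typing import Literal, get_type_hints

from pydantic import create_model


def update_ticket_status(
    ticket_id: int,
    status: Literal["open", "in_progress", "resolved", "closed"],
) -> dict:
    """Set the status of an existing ticket."""
    return {"ticket_id": ticket_id, "status": status}


def describe_tool(func) -> dict:
    """Hypothetical helper: derive a tool description from a typed function."""
    hints = get_type_hints(func)
    fields = {}
    for name, param in inspect.signature(func).parameters.items():
        # Parameters without a default become required fields (marked with ...).
        default = ... if param.default is inspect.Parameter.empty else param.default
        fields[name] = (hints[name], default)
    args_model = create_model(f"{func.__name__}_args", **fields)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": args_model.model_json_schema(),
    }


spec = describe_tool(update_ticket_status)
print(spec["name"], "-", spec["description"])
print(spec["parameters"]["properties"]["status"]["enum"])
```

The takeaway is that type hints and docstrings are doing real work here: they become the contract the model sees, so it pays to keep them precise.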

def build_agent(model_name: str) -> Agent[SupportDeps, AgentDecision]:
    agent = Agent(
        f"openai:{model_name}",
        deps_type=SupportDeps,  # lets type checkers verify ctx.deps inside tools
        output_type=AgentDecision,
        output_retries=2,
        instructions=(
            "You are a production support triage agent.\n"
            "Return an output that matches the AgentDecision schema.\n"
            "Use tools when you need DB state.\n"
            "Never invent ticket IDs.\n"
            "If the user intent is unclear, ask concise follow-up questions.\n"
        ),
    )

    @agent.tool
    def create_ticket(ctx: RunContext[SupportDeps], ticket: TicketDraft) -> int:
        deps = ctx.deps
        # Example of a business rule check inside a tool
        if ticket.priority in ("critical", "high") and deps.policy.get("require_security_phrase_for_critical", False):
            if ticket.category == "security" and "incident" not in ticket.description.lower():
                raise ModelRetry("For security high/critical, include the word 'incident' in description and retry.")
        return seed_ticket(deps.db, deps.tenant, ticket, status="open")

    @agent.tool
    def update_ticket_status(ctx: RunContext[SupportDeps], ticket_id: int, status: Literal["open", "in_progress", "resolved", "closed"]) -> dict:
        """Set the status of an existing ticket that belongs to the current tenant."""
        deps = ctx.deps
        now = utc_now_iso()
        cur = deps.db.execute("SELECT id FROM tickets WHERE tenant=? AND id=?", (deps.tenant, ticket_id))
        if not cur.fetchone():
            raise ModelRetry(f"Ticket {ticket_id} not found for this tenant. Ask for the correct ticket_id.")
        deps.db.execute(
            "UPDATE tickets SET status=?, updated_at=? WHERE tenant=? AND id=?",
            (status, now, deps.tenant, ticket_id),
        )
        deps.db.commit()
        return {"ticket_id": ticket_id, "status": status, "updated_at": now}

    @agent.tool
    def query_ticket(ctx: RunContext[SupportDeps], ticket_id: int) -> dict:
        """Fetch a single ticket for the current tenant by its ID."""
        deps = ctx.deps
        cur = deps.db.execute(
            "SELECT id, title, customer, priority, category, status, created_at, updated_at FROM tickets WHERE tenant=? AND id=?",
            (deps.tenant, ticket_id),
        )
        row = cur.fetchone()
        if not row:
            raise ModelRetry(f"Ticket {ticket_id} not found. Ask the user for a valid ticket_id.")
        keys = ["id", "title", "customer", "priority", "category", "status", "created_at", "updated_at"]
        return dict(zip(keys, row))

    @agent.tool
    def list_open_tickets(ctx: RunContext[SupportDeps], limit: int = 5) -> list:
        """List the most recently updated open or in-progress tickets (at most 20)."""
        deps = ctx.deps
        limit = max(1, min(int(limit), 20)) # Safety first
        cur = deps.db.execute(
            "SELECT id, title, priority, category, status, updated_at FROM tickets WHERE tenant=? AND status IN ('open','in_progress') ORDER BY updated_at DESC LIMIT ?",
            (deps.tenant, limit),
        )
        rows = cur.fetchall()
        return [{"id": r[0], "title": r[1], "priority": r[2], "category": r[3], "status": r[4], "updated_at": r[5]} for r in rows]

    @agent.output_validator
    def validate_decision(ctx: RunContext[SupportDeps], out: AgentDecision) -> AgentDecision:
        deps = ctx.deps
        if out.action == "create_ticket" and out.ticket is None:
            raise ModelRetry("You chose create_ticket but did not provide ticket. Provide ticket fields and retry.")
        if out.action in ("update_ticket", "query_ticket") and out.ticket_id is None:
            raise ModelRetry("You chose update/query but did not provide ticket_id. Ask for ticket_id and retry.")
        if out.ticket and out.ticket.priority == "critical" and not deps.policy.get("allow_critical", True):
            raise ModelRetry("This tenant does not allow 'critical'. Downgrade to 'high' and retry.")
        return out

    return agent

This is the brain of our operation.

  • The Agent is initialized with the model we want to use (e.g., "openai:gpt-4o-mini") and, crucially, our AgentDecision as the output_type.
  • Each tool is a simple Python function. Notice how they take ctx: RunContext[SupportDeps] as an argument. This is how they get access to our dependencies (like ctx.deps.db).
  • The coolest part is raise ModelRetry(...). If the agent makes a mistake (like trying to query a ticket that doesn't exist), the tool can raise this special exception. This tells the agent, "Hey, that didn't work. Here's why. Try again." The agent can then self-correct, up to the configured retry limit.
  • Finally, the @agent.output_validator is a last line of defense. It runs after the LLM has made its final decision, allowing us to enforce high-level business logic (like checking if a tenant is allowed to create "critical" tickets).
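To see why feeding errors back to the model helps, here is a conceptual sketch of a validate-and-retry loop. It is not how PydanticAI is implemented: RetryRequest is a hypothetical stand-in for ModelRetry, fake_model is a scripted function instead of a real LLM, and MiniDecision is a cut-down schema made up for this example.

```python
import json
from typing import Callable, List, Optional

from pydantic import BaseModel, Field, ValidationError


class MiniDecision(BaseModel):
    """Cut-down, hypothetical decision schema."""

    action: str
    ticket_id: Optional[int] = None
    reason: str = Field(..., min_length=20)


class RetryRequest(Exception):
    """Hypothetical stand-in for ModelRetry: carries feedback for the model."""


def business_rules(decision: MiniDecision) -> MiniDecision:
    """Checks that go beyond what the schema alone can express."""
    if decision.action == "query_ticket" and decision.ticket_id is None:
        raise RetryRequest("You chose query_ticket but gave no ticket_id.")
    return decision


def run_with_retries(
    model: Callable[[List[str]], str], prompt: str, retries: int = 2
) -> MiniDecision:
    messages = [prompt]
    for _ in range(retries + 1):  # one initial attempt plus the retries
        raw = model(messages)
        try:
            return business_rules(MiniDecision.model_validate_json(raw))
        except (ValidationError, RetryRequest) as exc:
            # Feed the failure back so the next attempt can correct it.
            messages.append(f"Your last answer was rejected: {exc} Fix it and retry.")
    raise RuntimeError("No valid decision within the retry budget.")


def fake_model(messages: List[str]) -> str:
    """Scripted fake: forgets ticket_id first, then fixes it after feedback."""
    answer = {"action": "query_ticket", "reason": "User asked about the status of a ticket."}
    if len(messages) > 1:
        answer["ticket_id"] = 1
    return json.dumps(answer)


decision = run_with_retries(fake_model, "What is the status of ticket 1?")
print(decision.model_dump())
```

The important design choice is the bounded loop: the model gets specific feedback and a second chance, but a model that never produces valid output ends in an explicit error rather than bad data.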

Step 5: Let's See It in Action!

Okay, theory is great, but let's see this thing run. We'll initialize our database, seed it with a couple of tickets, and then throw a few real-world prompts at our agent.

# Let's fire it up!
db = init_db()
deps = SupportDeps(
    db=db,
    tenant="acme_corp",
    policy={"allow_critical": True, "require_security_phrase_for_critical": True},
)

# Add some dummy data to our database
seed_ticket(
    db, deps.tenant,
    TicketDraft(
        title="Double-charged on invoice 8831", customer="Riya", priority="high", category="billing",
        description="Customer reports they were billed twice for invoice 8831 and wants a refund and confirmation email.",
        expected_outcome="Issue a refund and confirm resolution to customer.",
    ),
)
seed_ticket(
    db, deps.tenant,
    TicketDraft(
        title="App crashes on login after update", customer="Sam", priority="high", category="bug",
        description="After latest update, the app crashes immediately on login. Reproducible on two devices; needs investigation.",
        expected_outcome="Provide a fix or workaround and restore successful logins.",
    ),
)

agent = build_agent("gpt-4o-mini")

async def run_case(prompt: str):
    print(f"\n--- Running Case: {prompt} ---\n")
    res = await agent.run(prompt, deps=deps)
    out = res.output
    print(json.dumps(out.model_dump(), indent=2))
    return out

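# Top-level await works in notebooks such as Colab; in a plain script, wrap these calls in asyncio.run(...)
# Case A: Creating a critical security ticket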
await run_case(
    "We suspect account takeover: multiple password reset emails and unauthorized logins. "
    "Customer=Leila. Priority=critical. Open a security ticket."
)

# Case B: Listing and summarizing open tickets
await run_case("List our open tickets and summarize what to tackle first.")

# Case C: Querying and then updating a ticket
await run_case("What is the status of ticket 1? If it's open, move it to in_progress.")

# And look how easy it is to swap the model!
print("\n--- Swapping to a more powerful model (gpt-4o) ---")
agent_alt = build_agent("gpt-4o")
alt_res = await agent_alt.run(
    "Create a feature request ticket: customer=Noah wants 'export to CSV' in analytics dashboard; priority=medium.",
    deps=deps,
)
print(json.dumps(alt_res.output.model_dump(), indent=2))

When you run this, each case should print structured JSON that conforms to AgentDecision. The exact wording and the tool calls can vary between runs, but output that breaks the schema is retried or surfaced as an error rather than passed along silently. And notice how simple it is to swap gpt-4o-mini for gpt-4o. All our logic, tools, and schemas remain the same. That’s the power of building model-agnostic systems.
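Once the output is a validated object instead of free text, downstream code can branch on it with ordinary conditionals. The routing policy below is purely hypothetical (it is not part of the agent we built), and MiniDecision is again a reduced stand-in for AgentDecision, but it sketches how an application might decide what to do next.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class MiniDecision(BaseModel):
    """Reduced, hypothetical stand-in for AgentDecision."""

    action: Literal[
        "create_ticket", "update_ticket", "query_ticket", "list_open_tickets", "no_action"
    ]
    confidence: Literal["low", "medium", "high"]
    ticket_id: Optional[int] = None
    follow_up_questions: List[str] = Field(default_factory=list)


def next_step(decision: MiniDecision) -> str:
    """Hypothetical routing policy for a validated decision."""
    if decision.follow_up_questions:
        return "ask_user"  # the agent needs more information first
    if decision.confidence == "low":
        return "human_review"  # don't act automatically on shaky decisions
    if decision.action == "no_action":
        return "nothing_to_do"
    return "proceed"


confident = MiniDecision(action="query_ticket", confidence="high", ticket_id=1)
unsure = MiniDecision(action="create_ticket", confidence="low")
print(next_step(confident), next_step(unsure))
```

No string parsing or guesswork is needed here, because every field the code branches on has already been checked against a closed set of values.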

This is How We Build AI We Can Trust

What we’ve built here is more than just a clever demo. It's a blueprint for creating dependable AI systems. By shifting from prompting for natural language to requiring structured, validated outputs, we close the reliability gap.

We let the LLM do what it's good at—understanding messy human language—but we force its final output into a rigid structure that our application code can trust. The combination of strict schemas, injectable tools, and self-correction through retries gives us a powerful framework for building real, enterprise-grade agentic workflows.

So next time you're frustrated with an unpredictable AI, don't just tweak the prompt. Change the rules of the game. Make it fill out a form. You'll be amazed at how much more reliable it becomes.

© 2026 Aicosoft. All rights reserved.