Let’s be honest for a second. We’ve all been there. You spend hours crafting the perfect prompt, begging the LLM to return a clean, structured JSON object. You specify every field, every data type, every possible value. You hit "run," and what do you get back?
Maybe it’s a beautifully written paragraph explaining the JSON you wanted. Or maybe it's JSON that’s almost right, but it hallucinated a new field or used a string where you needed an integer. It's frustrating, and for any serious application, it's a total non-starter. You can't build a reliable system on a foundation of "pretty please, format this correctly."
This is the gap between cool AI demos and real, enterprise-grade AI that can make important decisions. If you're building a system to analyze risk, approve transactions, or ensure policy compliance, "mostly right" is the same as "wrong."
So, what if we stopped asking nicely and started demanding? What if, instead of suggesting an output format, we handed our AI a legally-binding contract it had to follow? That's the core idea behind a "contract-first" approach, and with a brilliant little library called PydanticAI, it’s surprisingly easy to do.
What Does "Contract-First" Actually Mean?
Think of it like hiring a developer. You wouldn't just say, "Hey, build me an app." You'd give them a detailed Statement of Work (SOW). It would define the features, the tech stack, the deliverables, the deadlines. It's a non-negotiable contract that sets the rules of engagement.
In our world, that "contract" is a Pydantic schema.
Instead of the output format being an optional guideline, the schema becomes a fixed rulebook. If the AI generates an output that violates the contract, it's automatically rejected. No exceptions. This simple shift in thinking changes everything. It moves us from building unpredictable text generators to engineering reliable, auditable decision-making systems.
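To make "automatically rejected" concrete, here is a minimal sketch, assuming Pydantic v2. The `Verdict` model, the `parse_verdict` helper, and the sample payloads are invented for this illustration; they are not part of the deployment example that follows.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class Verdict(BaseModel):
    # Hypothetical two-field contract, used only for this illustration.
    decision: Literal["approve", "reject"]
    score: int


def parse_verdict(raw_text: str) -> Optional[Verdict]:
    """Return a Verdict if the text honors the contract, otherwise None."""
    try:
        return Verdict.model_validate_json(raw_text)
    except ValidationError:
        # Covers both malformed JSON and well-formed JSON that breaks the schema.
        return None


good = parse_verdict('{"decision": "approve", "score": 7}')
bad_value = parse_verdict('{"decision": "maybe", "score": 7}')
prose = parse_verdict("Sure! Here is the JSON you asked for...")
```

Only `good` comes back as an object. The creative `"maybe"` and the friendly paragraph are both turned away at the door, which is the behavior we want from a contract.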
Step 1: Writing the AI's "Job Description" with Pydantic
First things first, we need to define what a valid decision even looks like. We're not just going to list a few fields; we're going to build a comprehensive model that encodes our business logic right into its structure.
Let's imagine we're building an agent to approve or reject a new software deployment. Here’s what our contract, defined in Pydantic, might look like.
```python
import time
from typing import List, Literal

from pydantic import BaseModel, Field, field_validator


class RiskItem(BaseModel):
    risk: str = Field(..., min_length=8)
    severity: Literal["low", "medium", "high"]
    mitigation: str = Field(..., min_length=12)


class DecisionOutput(BaseModel):
    decision: Literal["approve", "approve_with_conditions", "reject"]
    confidence: float = Field(..., ge=0.0, le=1.0)
    rationale: str = Field(..., min_length=80)
    identified_risks: List[RiskItem] = Field(..., min_length=2)
    compliance_passed: bool
    conditions: List[str] = Field(default_factory=list)
    next_steps: List[str] = Field(..., min_length=3)
    timestamp_unix: int = Field(default_factory=lambda: int(time.time()))
```
See what we're doing here? We’re being incredibly specific.
- `decision` can only be one of three strings. No creative alternatives allowed.
- `confidence` must be a float between 0.0 and 1.0.
- `rationale` can't be a lazy one-liner; it needs to be at least 80 characters.
- `identified_risks` isn't just a list; it's a list of `RiskItem` objects, and there must be at least two of them. Each risk itself has a strict structure.
This is our foundation. Any output that doesn't perfectly match this blueprint is invalid from the start. But we can go even deeper.
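One caveat before we go deeper: out of the box, Pydantic is more forgiving than "perfectly match" suggests. By default it drops unknown fields without complaint and coerces compatible values, such as the string "0.9", into a float. The sketch below, assuming Pydantic v2, shows one way to tighten that with `ConfigDict(extra="forbid", strict=True)`. The two model names and the payload are made up for the demonstration.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class LaxDecision(BaseModel):
    # Default behavior: extra fields are ignored, compatible types are coerced.
    decision: Literal["approve", "reject"]
    confidence: float = Field(..., ge=0.0, le=1.0)


class StrictDecision(BaseModel):
    # Same fields, but unknown keys and type coercion are both rejected.
    model_config = ConfigDict(extra="forbid", strict=True)

    decision: Literal["approve", "reject"]
    confidence: float = Field(..., ge=0.0, le=1.0)


# A "hallucinated" extra field and a number delivered as a string.
payload = {"decision": "approve", "confidence": "0.9", "mood": "optimistic"}

lax = LaxDecision.model_validate(payload)

try:
    StrictDecision.model_validate(payload)
    strict_error_fields = set()
except ValidationError as exc:
    # Collect the name of every field that caused an error.
    strict_error_fields = {err["loc"][0] for err in exc.errors()}
```

The lax model quietly accepts the payload, while the strict one flags both `confidence` and `mood`. Whether you want strict mode depends on how much cleanup you're willing to let the parser do on the model's behalf.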
Step 2: Baking Business Rules Directly into the Contract
This is where the real magic happens. Pydantic allows us to add custom "validators" that check the logic between fields, not just their individual structure. We’re essentially teaching the data model our company’s policies.
Let’s add a few rules to our DecisionOutput model.
```python
from pydantic import model_validator


class DecisionOutput(BaseModel):
    # ... (all the fields from before) ...

    # Cross-field rules live in model-level validators. They run after every
    # field has been validated, so declaration order doesn't matter. A field
    # validator only sees fields declared *before* it (via info.data), and it
    # is skipped when a default value is used.
    @model_validator(mode="after")
    def confidence_vs_risk(self) -> "DecisionOutput":
        has_high_risk = any(r.severity == "high" for r in self.identified_risks)
        if has_high_risk and self.confidence > 0.70:
            raise ValueError("confidence too high given high-severity risks")
        return self

    @model_validator(mode="after")
    def reject_if_non_compliant(self) -> "DecisionOutput":
        if not self.compliance_passed and self.decision != "reject":
            raise ValueError("non-compliant decisions must be reject")
        return self

    @model_validator(mode="after")
    def conditions_required_for_conditional_approval(self) -> "DecisionOutput":
        if self.decision == "approve_with_conditions" and len(self.conditions) < 2:
            raise ValueError("approve_with_conditions requires at least 2 conditions")
        if self.decision == "approve" and self.conditions:
            raise ValueError("approve must not include conditions")
        return self
```
Look at what we just did. We’ve encoded some serious business intelligence:
- Confidence vs. Risk: If the AI identifies any "high" severity risks, it's not allowed to be overly confident (greater than 70%). This prevents a dangerously optimistic output.
- Compliance is King: If the `compliance_passed` flag is `False`, the `decision` must be `reject`. It's a hard and fast rule. The AI literally cannot produce an "approve" decision for a non-compliant request.
- Conditions for Approval: If the decision is `approve_with_conditions`, then there must be at least two conditions listed. And if it's a straight `approve`, there can't be any conditions.
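Rules like these are easy to check by hand before any model is involved. Here is a minimal sketch, assuming Pydantic v2 and a cut-down, hypothetical `MiniDecision` model that keeps only the fields two of the rules touch. It expresses the rules as a model-level validator, which runs once all fields are populated; a field-level validator only sees fields declared before it.

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError, model_validator


class MiniDecision(BaseModel):
    # Reduced, hypothetical stand-in for DecisionOutput.
    decision: Literal["approve", "approve_with_conditions", "reject"]
    compliance_passed: bool
    conditions: List[str] = Field(default_factory=list)

    @model_validator(mode="after")
    def check_cross_field_rules(self) -> "MiniDecision":
        # Runs after every field is validated, so all values are available.
        if not self.compliance_passed and self.decision != "reject":
            raise ValueError("non-compliant decisions must be reject")
        if self.decision == "approve_with_conditions" and len(self.conditions) < 2:
            raise ValueError("approve_with_conditions requires at least 2 conditions")
        if self.decision == "approve" and self.conditions:
            raise ValueError("approve must not include conditions")
        return self


def first_error(payload: dict) -> str:
    """Return the first validation message, or 'ok' if the payload passes."""
    try:
        MiniDecision.model_validate(payload)
        return "ok"
    except ValidationError as exc:
        return exc.errors()[0]["msg"]


non_compliant_approval = first_error({"decision": "approve", "compliance_passed": False})
thin_conditions = first_error(
    {"decision": "approve_with_conditions", "compliance_passed": True, "conditions": ["one"]}
)
clean_reject = first_error({"decision": "reject", "compliance_passed": False})
```

The first two payloads come back with the rule they broke, and the third passes. Writing a handful of checks like this is a cheap way to confirm the contract actually bites before you trust it with an LLM.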
These aren't suggestions in a prompt. They are hard rules built into the structure of the output. If the LLM generates something that violates them, Pydantic raises a validation error, and PydanticAI sends that error back to the model and asks it to try again, up to the agent's configured retry limit. If the model still can't comply, the run fails with an exception instead of handing you an invalid decision.
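If you're curious what "try again with feedback" amounts to, here is a conceptual sketch of a validate-and-retry loop. To be clear, this is not PydanticAI's implementation, just the general pattern under simple assumptions: `Answer`, `run_with_retries`, and the scripted `fake_model` are all hypothetical names, and the fake model stands in for a real LLM call.

```python
from typing import Callable, List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class Answer(BaseModel):
    # Hypothetical contract for this sketch.
    decision: Literal["approve", "reject"]
    confidence: float = Field(..., ge=0.0, le=1.0)


def run_with_retries(
    ask_model: Callable[[str], str], prompt: str, max_retries: int = 2
) -> Answer:
    """Call the model, validate, and feed errors back until valid or out of retries."""
    message = prompt
    last_error: Optional[ValidationError] = None
    for _ in range(max_retries + 1):
        raw = ask_model(message)
        try:
            return Answer.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            # Show the model exactly what it got wrong on the next attempt.
            message = f"{prompt}\n\nYour previous answer was rejected:\n{exc}\nPlease fix it."
    raise RuntimeError(f"no valid output after {max_retries} retries") from last_error


# Scripted stand-in for an LLM: the first reply breaks the contract, the second obeys it.
replies = iter(
    [
        '{"decision": "maybe", "confidence": 1.4}',
        '{"decision": "reject", "confidence": 0.6}',
    ]
)
seen_prompts: List[str] = []


def fake_model(message: str) -> str:
    seen_prompts.append(message)
    return next(replies)


answer = run_with_retries(fake_model, "Should we deploy?")
```

The important design choice is the bound on retries. A contract that loops forever on a stubborn model isn't safe, so the loop gives up loudly once the budget is spent.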
Step 3: Setting Up Our Policy-Aware Agent
Now that we have our contract, let's hire our agent and give it its instructions. We'll use PydanticAI to connect our DecisionOutput contract to an OpenAI model.
```python
import os

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

# Assumes OPENAI_API_KEY is exported in the environment
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

model = OpenAIChatModel(
    "gpt-4-turbo",  # Or your model of choice
    provider=OpenAIProvider(api_key=OPENAI_API_KEY),
)

agent = Agent(
    model=model,
    output_type=DecisionOutput,
    system_prompt="""
    You are a corporate decision analysis agent.
    You must evaluate risk, compliance, and uncertainty.
    All outputs must strictly satisfy the DecisionOutput schema.
    """,
)
```
This is pretty straightforward. We're creating an Agent and telling it two crucial things:
- Your only job is to produce outputs that match the `DecisionOutput` type.
- Your persona is a corporate decision analyst.
The agent now knows its role and, more importantly, the exact format and rules its final work must adhere to.
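It helps to see what "the exact format" looks like from the model's side. As far as we understand it, PydanticAI describes the output type to the model as a JSON Schema, and Pydantic's own `model_json_schema()` gives a close view of that description. The sketch below assumes Pydantic v2 and uses a hypothetical, trimmed-down `SchemaDemo` model.

```python
import json
from typing import List, Literal

from pydantic import BaseModel, Field, model_validator


class SchemaDemo(BaseModel):
    # Hypothetical subset of DecisionOutput, used to inspect the generated schema.
    decision: Literal["approve", "reject"]
    confidence: float = Field(..., ge=0.0, le=1.0)
    next_steps: List[str] = Field(..., min_length=3)
    compliance_passed: bool

    @model_validator(mode="after")
    def reject_if_non_compliant(self) -> "SchemaDemo":
        if not self.compliance_passed and self.decision != "reject":
            raise ValueError("non-compliant decisions must be reject")
        return self


schema = SchemaDemo.model_json_schema()
props = schema["properties"]

allowed_decisions = props["decision"]["enum"]
confidence_bounds = (props["confidence"]["minimum"], props["confidence"]["maximum"])
min_next_steps = props["next_steps"]["minItems"]

# Custom validator logic is Python code, so it never appears in the schema.
rule_visible_in_schema = "non-compliant" in json.dumps(schema)
```

Field-level constraints such as the allowed strings, the numeric bounds, and the minimum list length all show up in the schema. The cross-field rule does not. The model only learns about it from the system prompt or from the error message after a failed attempt, so it can be worth stating the most important policies in the prompt as well.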
Step 4: Adding a Final "Governance" Check
Sometimes, there are rules that are too complex or context-dependent to put directly into the Pydantic model. For these, PydanticAI gives us another layer of defense: output validators. These are functions that run after the AI has generated a valid Pydantic object but before it's returned to us.
Think of it as a final quality assurance check.
```python
from pydantic_ai import ModelRetry


@agent.output_validator
def ensure_risk_quality(result: DecisionOutput) -> DecisionOutput:
    if not any(r.severity in ("medium", "high") for r in result.identified_risks):
        # ModelRetry is the exception PydanticAI turns into a retry request;
        # other exceptions raised here propagate to the caller instead.
        raise ModelRetry("at least one medium or high risk required")
    return result


@agent.output_validator
def enforce_policy_controls(result: DecisionOutput) -> DecisionOutput:
    # Let's pretend we pass the company policy into the agent's context
    # For this example, we'll check for keywords
    text = " ".join([result.rationale, *result.next_steps, *result.conditions]).lower()
    if result.compliance_passed:
        if not any(k in text for k in ["encryption", "audit", "logging", "access control"]):
            raise ModelRetry("missing concrete security controls in rationale/steps")
    return result
```
Here, we're adding two more governance layers:
- No Fluffy Risks: The agent can't just list a bunch of "low" severity risks to meet the minimum count. It must identify at least one "medium" or "high" risk, forcing it to think critically.
- Show Your Work: If the agent claims the proposal is compliant, it needs to back that up by mentioning specific security controls (like "encryption" or "logging") in its reasoning.
If either of these checks fails, PydanticAI will, again, send the output back to the LLM with the error message and ask it to self-correct.
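Because these governance checks are plain Python, one option is to pull the logic into a small function you can unit-test without calling a model, and have the output validator delegate to it. A minimal sketch using only the standard library; `missing_control_evidence` and `REQUIRED_CONTROL_KEYWORDS` are hypothetical names, and the keyword list mirrors the one in the validator.

```python
from typing import Iterable

# Same keywords the policy validator looks for.
REQUIRED_CONTROL_KEYWORDS = ("encryption", "audit", "logging", "access control")


def missing_control_evidence(
    compliance_passed: bool,
    rationale: str,
    next_steps: Iterable[str],
    conditions: Iterable[str],
) -> bool:
    """True when the output claims compliance but names no concrete security control."""
    if not compliance_passed:
        # The rule only applies to outputs that claim compliance.
        return False
    text = " ".join([rationale, *next_steps, *conditions]).lower()
    return not any(keyword in text for keyword in REQUIRED_CONTROL_KEYWORDS)


vague_claim = missing_control_evidence(True, "Looks fine to me.", ["Ship it"], [])
backed_claim = missing_control_evidence(True, "Encryption at rest is enabled.", [], [])
not_claimed = missing_control_evidence(False, "Nothing to see here.", [], [])
negated_mention = missing_control_evidence(True, "Audit logging is not implemented.", [], [])
```

The last case is worth a look: a rationale that says audit logging is *not* implemented still contains the word "audit", so the keyword check is satisfied. Substring matching is a cheap smoke test, not proof that a control exists, so treat it as a floor and not as the whole governance story.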
Okay, Let's Put Our AI to the Test
We've built our contract and hired our agent. Now, let's give it a real-world scenario and see how it performs.
Here’s our prompt:
```python
company_policy = (
    "No deployment of systems handling personal data or transaction metadata "
    "without encryption, audit logging, and least-privilege access control."
)

prompt = """
Decision request: Deploy an AI-powered customer analytics dashboard using a
third-party cloud vendor. The system processes user behavior and transaction
metadata. Audit logging is not implemented and customer-managed keys are uncertain.
"""

# We would run this in an async function.
# Note: deps only reach code that reads them from the run context (tools,
# dynamic system prompts, validators). For the model itself to see the policy,
# it also has to appear in the prompt or the system prompt.
# result = await agent.run(prompt, deps={"company_policy": company_policy})
# decision = result.output
```
Based on this prompt, a standard LLM might give a wishy-washy answer. But our contract-first agent has to commit to a concrete decision that passes validation. Given the lack of audit logging, a model that has been shown the policy should set compliance_passed to False. And because of our validator, once compliance_passed is False, the only decision that validates is reject.
The final output won't be a paragraph of text. It will be a perfectly structured DecisionOutput object that you can trust, log, and feed directly into other automated systems. It might look something like this (in dictionary form):
```json
{
  "decision": "reject",
  "confidence": 0.65,
  "rationale": "The proposed deployment cannot be approved as it explicitly violates company policy. The system handles sensitive transaction metadata but lacks implementation of audit logging, which is a non-negotiable security requirement. The uncertainty around customer-managed keys further elevates the data governance risk.",
  "identified_risks": [
    {
      "risk": "Non-compliance with mandatory company data security policy.",
      "severity": "high",
      "mitigation": "Implement a comprehensive audit logging solution before resubmitting the proposal."
    },
    {
      "risk": "Potential for unauthorized data access due to unclear key management.",
      "severity": "medium",
      "mitigation": "Clarify and confirm that customer-managed encryption keys will be used for all data at rest."
    }
  ],
  "compliance_passed": false,
  "conditions": [],
  "next_steps": [
    "Halt the deployment project immediately.",
    "Engage the security team to define requirements for audit logging.",
    "Re-evaluate the third-party vendor's data encryption and key management capabilities."
  ],
  "timestamp_unix": 1678886400
}
```
This output is beautiful. It's not just structured; it's logical, compliant, and safe. It followed every single rule we laid out in our contract. It's an asset you can build a business process on top of.
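What "building a business process on top" might look like in practice: a minimal sketch, assuming Pydantic v2, of logging and routing a validated decision. The `DecisionRecord` model is a hypothetical subset of `DecisionOutput`, and the queue names returned by `route` are placeholders, not real systems.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


class DecisionRecord(BaseModel):
    # Hypothetical subset of DecisionOutput, enough to route and log.
    decision: Literal["approve", "approve_with_conditions", "reject"]
    confidence: float = Field(..., ge=0.0, le=1.0)
    compliance_passed: bool


def route(record: DecisionRecord) -> str:
    """Map a validated decision to a (made-up) downstream queue name."""
    if record.decision == "reject":
        return "notify-requester"
    if record.decision == "approve_with_conditions":
        return "manual-review"
    return "deploy-pipeline"


record = DecisionRecord.model_validate(
    {"decision": "reject", "confidence": 0.65, "compliance_passed": False}
)

# One JSON string per decision, suitable for an append-only audit log.
audit_line = record.model_dump_json()
queue = route(record)

# The log entry can be re-validated later against the same contract.
replayed = DecisionRecord.model_validate_json(audit_line)
```

Because `decision` can only hold one of three known strings, the routing function doesn't need a defensive "unknown value" branch, and every audit line can be replayed through the same schema when someone asks why a deployment was blocked.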
The Future is Reliable AI, Not Just Creative AI
This is what it looks like to move AI out of the playground and into production. By treating our schemas as non-negotiable contracts, we force the AI to reason within the constraints of our business. It has to think about risk, policy, and logic, not just stringing words together.
This approach gives us systems that fail safely, self-correct when they make a mistake, and produce auditable, trustworthy outputs. So next time you find yourself begging an LLM for the right format, take a step back. Maybe it's time to stop asking and start requiring. It's time to give your AI a contract.




