Let’s be honest. If you’ve worked with Large Language Models (LLMs) for more than ten minutes, you’ve felt the pain. You ask for a simple JSON object, and you get back… something else. A missing bracket, a trailing comma, a string where you needed an integer. It’s a mess.
You end up writing brittle parsers and a mountain of try-except blocks, praying the model behaves this time. It feels less like engineering and more like haggling with a brilliant but frustratingly chaotic intern.
What if you could just… stop all that? What if you could force the LLM to give you exactly what you want, every single time? Not just well-formed JSON, but JSON that perfectly matches a strict schema you define.
Well, you can. Today, I’m going to walk you through how to do just that using two incredible Python libraries: Outlines and Pydantic. Think of Outlines as the guardrails that keep the LLM on the right path, and Pydantic as the blueprint for the final destination. Together, they let us build rock-solid, production-ready AI workflows.
First Things First: Getting Our Tools Ready
Before we start building, we need to set up our workshop. We'll install a few libraries and get a model running. I’m using a smaller model here (SmolLM2-135M-Instruct) because it’s fast and easy to run, even without a monster GPU. The principles we're covering work just the same on bigger models like Llama or Mixtral.
Here’s the initial setup code. It handles installing the packages, loading the model and tokenizer, and setting up the device (it’ll smartly use your GPU if you have one).
import os, sys, subprocess, json, textwrap, re
# Install our dependencies quietly
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "outlines", "transformers", "accelerate", "sentencepiece", "pydantic"])
import torch
import outlines
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import Literal, List, Union, Annotated
from pydantic import BaseModel, Field
from enum import Enum
# Check our setup
print("Torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Outlines:", getattr(outlines, "__version__", "unknown"))
# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)
# Load the model and tokenizer
MODEL_NAME = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto" if device == "cuda" else None,
)
if device == "cpu":
    hf_model = hf_model.to(device)
# This is where Outlines works its magic, wrapping our Hugging Face model
model = outlines.from_transformers(hf_model, tokenizer)
I've also got a couple of little helper functions. build_chat just formats our prompts correctly for the model, and banner prints nice-looking section titles to keep our output clean.
def build_chat(user_text: str, system_text: str = "You are a precise assistant. Follow instructions exactly.") -> str:
    """A simple helper to format prompts for the model."""
    try:
        msgs = [{"role": "system", "content": system_text}, {"role": "user", "content": user_text}]
        return tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    except Exception:
        # Fallback for models without a chat template
        return f"{system_text}\n\nUser: {user_text}\nAssistant:"

def banner(title: str):
    """Prints a clean banner to the console."""
    print("\n" + "=" * 90)
    print(title)
    print("=" * 90)
Alright, setup is done. Now for the fun part.
Starting Simple: Forcing the LLM to Follow Basic Rules
Before we build complex structures, let's teach the LLM some basic discipline. We're going to ask it for a specific type of answer and see how Outlines forces it to comply. No more "The answer is Positive" when all you wanted was the word "Positive".
This is where Outlines starts to shine. Instead of just passing a text prompt to the model, we pass the prompt and the type of output we expect.
banner("2) Typed outputs (Literal / int / bool)")
# Example 1: Constraining to a list of choices
sentiment = model(
    build_chat("Analyze the sentiment: 'This product completely changed my life!'. Return one label only."),
    Literal["Positive", "Negative", "Neutral"],  # We only allow one of these three words
    max_new_tokens=8,
)
print("Sentiment:", sentiment)
# Example 2: Forcing an integer output
bp = model(build_chat("What's the boiling point of water in Celsius? Return integer only."), int, max_new_tokens=8)
print("Boiling point (int):", bp)
# Example 3: Forcing a boolean output
prime = model(build_chat("Is 29 a prime number? Return true or false only."), bool, max_new_tokens=6)
print("Is prime (bool):", prime)
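Look at that! The sentiment output is always one of "Positive", "Negative", or "Neutral"; bp is always a valid integer literal, and prime is always a boolean literal. Outlines steers the model's token generation so it cannot emit anything that fails to match the requested type. One thing to check: depending on your Outlines version, the call may hand these values back as text (for example "100" rather than 100), so convert with int(bp) if you need a native Python value.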
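If you're wondering how a library can make a wrong answer impossible rather than just unlikely, here is a toy sketch of the general idea: at each step, tokens that would break the constraint are masked out before the next token is picked. The vocabulary, the fake_logits scorer, and the helper names are all made up for illustration; this is not Outlines' actual implementation, which works on real tokenizers and compiled patterns.

```python
import math

# Toy vocabulary and allowed outputs. Everything here is hypothetical;
# a real LLM has tens of thousands of tokens.
VOCAB = ["The", " answer", " is", "Pos", "Neg", "Neu", "itive", "ative", "tral", "<eos>"]
ALLOWED = ["Positive", "Negative", "Neutral"]


def fake_logits(prefix: str) -> list[float]:
    """Pretend model: ignores the prefix and prefers chatty tokens like 'The'."""
    return [5.0, 4.0, 4.0, 2.0, 1.0, 0.5, 3.0, 2.5, 2.0, 0.1]


def allowed_next(prefix: str) -> set[str]:
    """Tokens that keep `prefix` on track toward one of the allowed strings."""
    ok = set()
    for tok in VOCAB:
        if tok == "<eos>":
            # Stopping is only valid once a full label has been produced.
            if prefix in ALLOWED:
                ok.add(tok)
        elif any(choice.startswith(prefix + tok) for choice in ALLOWED):
            ok.add(tok)
    return ok


def constrained_greedy(max_steps: int = 8) -> str:
    out = ""
    for _ in range(max_steps):
        logits = fake_logits(out)
        valid = allowed_next(out)
        # Mask: invalid tokens get -inf so they can never be selected.
        masked = [score if tok in valid else -math.inf for tok, score in zip(VOCAB, logits)]
        tok = VOCAB[masked.index(max(masked))]
        if tok == "<eos>":
            break
        out += tok
    return out


scores = fake_logits("")
unconstrained = VOCAB[scores.index(max(scores))]
result = constrained_greedy()
print("Unconstrained first token:", unconstrained)
print("Constrained output:", result)
```

Left alone, the pretend model opens with "The". With the mask in place it can only walk a path that spells one of the three labels.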
This is a huge step up from just hoping for the best.
Stop Repeating Yourself: The Magic of Prompt Templates
As you build more complex workflows, you'll find yourself writing similar prompts over and over. That's where prompt templating comes in. It helps keep your code clean and your prompts consistent.
Outlines has a simple but powerful templating system built on Jinja2, so the syntax will look familiar if you've used Jinja before. You define a template with placeholders and then fill them in dynamically.
banner("3) Prompt templating (outlines.Template)")
# Define a reusable prompt template with a placeholder for {{ text }}
tmpl = outlines.Template.from_string(textwrap.dedent("""
<|system|>
You are a strict classifier. Return ONLY one label.
<|user|>
Classify sentiment of this text: {{ text }}
Labels: Positive, Negative, Neutral
<|assistant|>
""").strip())
# Use the template with new text
templated = model(
    tmpl(text="The food was cold but the staff were kind."),
    Literal["Positive", "Negative", "Neutral"],  # We can still apply our type constraints!
    max_new_tokens=8,
)
print("Template sentiment:", templated)
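See how clean that is? We separate the prompt's structure from the data we're feeding it. This makes your code way more maintainable, especially when your prompts get long and complicated. One caveat: the role markers in this template are typed by hand, so check that they match the chat format your model was trained on (build_chat gets that from the tokenizer automatically).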
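The payoff shows up when you run the same prompt over many inputs. Here is a minimal sketch of that separation using only the standard library's string.Template as a stand-in. It is not the Outlines API, and `$text` plays the role that `{{ text }}` plays above; the review strings are made up.

```python
from string import Template

# Stand-in for a reusable prompt template.
PROMPT = Template(
    "You are a strict classifier. Return ONLY one label.\n"
    "Classify sentiment of this text: $text\n"
    "Labels: Positive, Negative, Neutral"
)

# Hypothetical batch of inputs.
reviews = [
    "The food was cold but the staff were kind.",
    "Best purchase I have made this year.",
]

# One template, many prompts: the wording lives in exactly one place.
prompts = [PROMPT.substitute(text=review) for review in reviews]
for prompt in prompts:
    print(prompt, end="\n\n")
```

Change the instructions once and every prompt in the batch picks up the new wording.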
The Main Event: Forcing Perfect, Complex JSON with Pydantic
Okay, this is where things get really powerful. We’ve handled simple types, but what about the complex, nested JSON objects we need for real-world applications?
This is where we bring in Pydantic. If you haven't used it, Pydantic is a library for data validation and settings management using Python type hints. You define the "shape" of your data as a class, and Pydantic handles the parsing, validation, and error handling.
When you combine Outlines with a Pydantic model, you're telling the LLM: "Generate a JSON object that looks exactly like this class. No exceptions."
Let's imagine we're building a system to automatically create support tickets from customer emails. We can define a very specific ServiceTicket schema.
banner("4) Pydantic structured output (advanced constraints)")
class TicketPriority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    urgent = "urgent"

# A Pydantic model defining the exact structure we want
class ServiceTicket(BaseModel):
    priority: TicketPriority
    category: Literal["billing", "login", "bug", "feature_request", "other"]
    requires_manager: bool
    summary: str = Field(min_length=10, max_length=220)
    action_items: List[str] = Field(min_length=1, max_length=6)
# Here's a sample customer email
email = """
Subject: URGENT - Cannot access my account after payment
I paid for the premium plan 3 hours ago and still can't access any features.
I have a client presentation in an hour and need the analytics dashboard.
Please fix this immediately or refund my payment.
""".strip()
# Now, we ask the model to generate JSON that matches our ServiceTicket schema
ticket_text = model(
    build_chat(
        "Extract a ServiceTicket from this message.\n"
        "Return JSON ONLY matching the ServiceTicket schema.\n"
        "Action items must be distinct.\n\nMESSAGE:\n" + email
    ),
    ServiceTicket,  # We pass the Pydantic class directly!
    max_new_tokens=240,
)
By passing the ServiceTicket class directly to the model, Outlines constrains generation so the JSON follows the schema derived from that Pydantic model: a priority from the enum, a category from the fixed list, a boolean flag, a length-bounded summary, and a list of action items. The constraint covers structure, not meaning. Nothing in the schema makes the action items distinct, for instance, so that request still rests on the prompt.
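One way to picture what "constrained by a schema" means: as far as I understand it, the schema gets turned into a pattern, and only tokens that keep the output matching that pattern are allowed. The sketch below hand-writes such a pattern for a tiny two-field object and checks a few candidate strings against it. The regex is my own simplification for illustration, not what Outlines generates for ServiceTicket.

```python
import json
import re

# Hand-written pattern for {"category": <enum>, "requires_manager": <bool>}.
# Illustration only; a pattern derived from a real schema is far larger.
CATEGORY = r'"(billing|login|bug|feature_request|other)"'
BOOLEAN = r"(true|false)"
TICKET_RE = re.compile(
    r'\{\s*"category"\s*:\s*' + CATEGORY
    + r'\s*,\s*"requires_manager"\s*:\s*' + BOOLEAN + r"\s*\}"
)

candidates = [
    '{"category": "billing", "requires_manager": true}',
    '{"category": "refund", "requires_manager": true}',   # unknown category
    '{"category": "login", "requires_manager": "yes"}',   # string, not boolean
    '{"category": "bug", "requires_manager": false,}',    # trailing comma
]

results = [bool(TICKET_RE.fullmatch(text)) for text in candidates]
for text, ok in zip(candidates, results):
    print("accept" if ok else "reject", text)

# Anything the pattern accepts is also valid JSON.
parsed = json.loads(candidates[0])
print(parsed["category"], parsed["requires_manager"])
```

Here the pattern is applied after the fact. The point of constrained generation is that the same check happens token by token, so the rejected strings never get written in the first place.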
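What can still go wrong? Constrained decoding rules out stray commas and missing quotes, but the output can be cut off if max_new_tokens runs out before the closing brace, and depending on the Outlines version you may get back a string rather than a parsed object. For extra safety, I've written a small safe_validate function that pulls out the first complete JSON object, validates it with Pydantic, and on failure trims anything after the last closing brace before trying once more. It cannot rescue badly truncated output (that raises a validation error), but failing loudly is what you want in a production system.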
def extract_json_object(s: str) -> str:
    s = s.strip()
    start = s.find("{")
    if start == -1:
        return s
    depth = 0
    in_str = False
    esc = False
    for i in range(start, len(s)):
        ch = s[i]
        if in_str:
            if esc:
                esc = False
            elif ch == "\\":
                esc = True
            elif ch == '"':
                in_str = False
        else:
            if ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return s[start:i + 1]
    return s[start:]

def json_repair_minimal(bad: str) -> str:
    bad = bad.strip()
    last = bad.rfind("}")
    if last != -1:
        return bad[:last + 1]
    return bad

def safe_validate(model_cls, raw_text: str):
    """A safety net to extract and validate the JSON."""
    raw = extract_json_object(raw_text)
    try:
        return model_cls.model_validate_json(raw)
    except Exception:
        # If it fails, try a minimal repair and validate again
        raw2 = json_repair_minimal(raw)
        return model_cls.model_validate_json(raw2)
# Now let's validate the output and print it
ticket = safe_validate(ServiceTicket, ticket_text) if isinstance(ticket_text, str) else ticket_text
print("ServiceTicket JSON:\n", ticket.model_dump_json(indent=2))
The result is a perfectly structured, validated Python object that you can immediately use in your application. No more manual parsing or guesswork.
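Why keep a validation step at all if generation is constrained? The case I'd watch for is a token budget that runs out mid-object. The standard-library sketch below simulates that with a made-up payload and an arbitrary cut-off point, and shows that trimming to the last closing brace has nothing to work with when the brace was never written.

```python
import json

# Made-up payload standing in for model output.
complete = '{"priority": "urgent", "summary": "Paid for premium but cannot log in."}'


def parses(text: str) -> bool:
    """Return True if `text` is a complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Simulate generation stopping early, partway through the summary string.
truncated = complete[:40]

print("complete parses: ", parses(complete))
print("truncated parses:", parses(truncated))
# No closing brace exists in the truncated text, so rfind reports "not found".
print("last brace index:", truncated.rfind("}"))
```

The practical takeaway is to size max_new_tokens generously for the schema you are asking for, and to treat a validation error as a signal to retry rather than something to patch over.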
Making the LLM Do Real Work: A Smarter Way to Call Functions
This final pattern is where we bridge the gap between getting information from an LLM and getting an LLM to do things. This is the core idea behind "function calling" or "tool use."
Instead of asking the LLM to perform a task itself (like adding two numbers), we ask it to generate the arguments for a function that we've written. This is safer, more reliable, and lets you connect LLMs to any API or codebase.
Here’s a simple example. We have a Python function add(a, b). We'll use the LLM to generate valid a and b arguments.
banner("5) Function-calling style (schema -> args -> call)")
# 1. Define the schema for the function's arguments using Pydantic
class AddArgs(BaseModel):
    a: int = Field(ge=-1000, le=1000)
    b: int = Field(ge=-1000, le=1000)

# 2. This is our actual Python function
def add(a: int, b: int) -> int:
    return a + b

# 3. Ask the LLM to generate the arguments based on a prompt
args_text = model(
    build_chat("Return JSON ONLY with two integers a and b. Make a odd and b even."),
    AddArgs,  # Constrain the output to our argument schema
    max_new_tokens=80,
)
# 4. Validate the generated arguments
args = safe_validate(AddArgs, args_text) if isinstance(args_text, str) else args_text
print("Args:", args.model_dump())
# 5. Safely execute our function with the validated arguments
print("add(a,b) =", add(args.a, args.b))
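This is a really powerful pattern. We're using the LLM for what it's good at (understanding natural language and extracting intent) and using our own solid, reliable code for the actual computation. The Pydantic schema acts as the contract between the LLM and our code: the shape is constrained during generation, and Pydantic validation re-checks every field, including the range limits, before anything runs. Rules the schema doesn't express, like "a is odd and b is even", still depend on the model following the prompt, so check those in code if they matter.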
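With more than one function, the same idea grows into a small dispatch table: the model produces a tool name plus arguments, and your code looks up the tool, checks the arguments, and makes the call. Here is a minimal standard-library sketch under that assumption. The TOOLS registry, the multiply function, the payload shape, and the dispatch helper are all hypothetical, and the JSON strings stand in for schema-constrained model output.

```python
import json
from typing import Callable


def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


# Hypothetical registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., int]] = {"add": add, "multiply": multiply}


def dispatch(call_json: str) -> int:
    """Parse a {"tool": ..., "args": {...}} payload, check it, then call the tool."""
    call = json.loads(call_json)
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    args = call.get("args", {})
    for name in ("a", "b"):
        value = args.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"argument {name!r} must be an integer")
    return tool(args["a"], args["b"])


# Stand-ins for schema-constrained model output.
print(dispatch('{"tool": "add", "args": {"a": 7, "b": 4}}'))
print(dispatch('{"tool": "multiply", "args": {"a": 7, "b": 4}}'))
```

The model never executes anything itself. It only picks a name from a list you control, and an unknown name or a bad argument raises before any tool runs.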
We've gone from wrestling with messy strings to building structured, reliable, and function-driven AI systems. By forcing the model's output to conform to a schema at generation time, we eliminate an entire class of errors and build applications that are more predictable and robust.
This isn't just a neat trick; it's a shift in how we should think about building with LLMs. It's about turning the format of the output from something you hope for into something you can rely on, even though the content is still the model's best guess. So next time you find yourself writing yet another JSON parser, give Outlines and Pydantic a try. It might just change the way you build.




