Building Scalable AI Agents: A Guide to Stateless, Secure Protocols

Akram Chauhan

So, you're building AI agents. Awesome. It’s an exciting space, but if you’ve tried to get multiple agents or services to talk to each other, you’ve probably hit a wall. It gets messy, fast. How do you make sure they understand each other? How do you keep the conversation secure? And what happens when one agent needs to go off and think for a while (run a long task) without holding up the entire system?

It’s tempting to build complex systems with persistent connections and session states, but that often leads to a tangled web that’s brittle and hard to scale. Every time a client connects, the server has to remember who they are and what they were doing. It's like a waiter trying to remember every single customer's order for their entire multi-hour stay without writing anything down. It just doesn't work once the restaurant gets busy.

There’s a better way. We can build a communication protocol that’s clean, simple, and incredibly powerful. The whole idea is based on three key principles:

  1. Stateless Communication: Every message contains all the information needed to process it. No session history, no server memory. The server has the memory of a goldfish, and that’s a feature, not a bug.
  2. Ironclad Security: Every message is signed with an HMAC. The receiver can verify that it came from someone holding the shared secret and that it hasn't been altered since it was signed.
  3. Asynchronous Operations: For tasks that take time, the server can say, "Got it, I'm on it. Here’s a ticket number. Check back with me later." This frees up everyone to do other things.

Let's walk through how to actually build a protocol like this. It’s surprisingly straightforward, and it will completely change how you think about agent architecture.

Getting Our Tools Ready: The Foundation

Before we build the house, we need to lay the foundation. In our case, this means setting up a few small, essential helper functions that we'll use everywhere.

Think of this as prepping your kitchen before you start cooking. We need a way to get the current time, generate unique IDs, and, most importantly, create a secure way to sign our messages.

import asyncio, time, json, uuid, hmac, hashlib
from dataclasses import dataclass
from typing import Any, Dict, Optional, Literal, List
from pydantic import BaseModel, Field, ValidationError, ConfigDict

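def _now_ms() -> int:
    # Current Unix time in milliseconds
    return int(time.time() * 1000)

def _uuid() -> str:
    # Random (version 4) UUID as a string
    return str(uuid.uuid4())

def _canonical_json(obj: Any) -> bytes:
    # Sorted keys and compact separators: equal dicts always serialize to the same bytes
    return json.dumps(obj, separators=(",", ":"), sort_keys=True).encode()

def _hmac_hex(secret: bytes, payload: Any) -> str:
    # HMAC-SHA256 over the canonical JSON of the payload, returned as a hex string
    return hmac.new(secret, _canonical_json(payload), hashlib.sha256).hexdigest()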

Here’s what’s going on:

  • _now_ms and _uuid are simple helpers for timestamps and unique IDs. Standard stuff.
  • _canonical_json is the secret sauce. It takes a Python dictionary and turns it into a standardized JSON byte string. By sorting the keys and removing whitespace, we guarantee that the same dictionary will always produce the exact same bytes. This is crucial for our next step.
  • _hmac_hex is our security guard. It takes a secret key and a payload, serializes the payload with _canonical_json, and computes a verifiable signature (an HMAC-SHA256) over the result. If even one character in the payload changes, the signature will be completely different. This is how we ensure messages haven't been forged or tampered with in transit.
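
To make that concrete, here's a small standalone sketch of the two properties we care about: key order doesn't change the signature, and changing a value does. The helper names drop the leading underscore only so the snippet stands alone, and the secret and payloads are made up for illustration.

```python
import hashlib
import hmac
import json

def canonical_json(obj):
    # Sorted keys and compact separators give one byte string per logical dict.
    return json.dumps(obj, separators=(",", ":"), sort_keys=True).encode()

def hmac_hex(secret, payload):
    # HMAC-SHA256 over the canonical bytes, as a 64-character hex string.
    return hmac.new(secret, canonical_json(payload), hashlib.sha256).hexdigest()

secret = b"demo-secret"  # hypothetical key, for illustration only

a = {"tool": "batch_sum", "args": {"numbers": [1, 2]}}
b = {"args": {"numbers": [1, 2]}, "tool": "batch_sum"}  # same data, different key order
c = {"tool": "batch_sum", "args": {"numbers": [1, 3]}}  # one value changed

sig_a, sig_b, sig_c = (hmac_hex(secret, p) for p in (a, b, c))

order_independent = sig_a == sig_b   # key order does not matter
tamper_detected = sig_a != sig_c     # a changed value changes the signature
print(canonical_json(a))
print(order_independent, tamper_detected)
```

Without the canonical step, two sides that serialize the same dictionary with different key order or spacing would compute different signatures for the same message.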

With these utilities in our back pocket, we can start building the core communication structure.

The Universal Language: Envelopes and Responses

For two systems to talk, they need a shared language. We're going to define a very strict format for every single message. Think of it like a standardized postal envelope. It doesn't matter what's inside the letter; the envelope always has a spot for the recipient's address, the sender's address, and a stamp.

We'll use Pydantic to enforce this structure. Pydantic is like a super-strict postmaster who will immediately reject any mail that isn't formatted perfectly. This is amazing because it prevents malformed or malicious data from ever reaching our application logic.

First, the request envelope:

class MCPEnvelope(BaseModel):
    model_config = ConfigDict(extra="forbid")
    v: Literal["mcp/0.1"] = "mcp/0.1"
    request_id: str = Field(default_factory=_uuid)
    ts_ms: int = Field(default_factory=_now_ms)
    client_id: str
    server_id: str
    tool: str
    args: Dict[str, Any] = Field(default_factory=dict)
    nonce: str = Field(default_factory=_uuid)
    signature: str

And here’s the response format:

class MCPResponse(BaseModel):
    model_config = ConfigDict(extra="forbid")
    v: Literal["mcp/0.1"] = "mcp/0.1"
    request_id: str
    ts_ms: int = Field(default_factory=_now_ms)
    ok: bool
    server_id: str
    status: Literal["ok", "accepted", "running", "done", "error"]
    result: Optional[Dict[str, Any]] = None
    error: Optional[str] = None
    signature: str

See how clear that is? Every request (MCPEnvelope) must specify who it's from (client_id), who it's for (server_id), what it wants to do (tool), and the arguments for that tool (args). And, of course, it has that all-important signature. The model_config = ConfigDict(extra="forbid") line is our Pydantic magic—it tells the model to reject any requests that include extra, unexpected fields.

The MCPResponse is just as clear. It links back to the original request_id, gives a clear status (like "ok" for instant success, "accepted" for a long-running job, or "error"), and contains either a result or an error message.

This strict structure is the bedrock of our reliable system. No more guessing what the other side is trying to say.
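
Here's a quick sketch of that strictness in action, assuming Pydantic v2. TinyEnvelope is a hypothetical, cut-down stand-in for MCPEnvelope (just enough fields to show the behavior), and try_parse is a helper invented for this example.

```python
from typing import Any, Dict

from pydantic import BaseModel, ConfigDict, Field, ValidationError

class TinyEnvelope(BaseModel):  # hypothetical, trimmed-down envelope
    model_config = ConfigDict(extra="forbid")
    client_id: str
    tool: str
    args: Dict[str, Any] = Field(default_factory=dict)

def try_parse(data):
    """Return (model, None) on success or (None, list_of_error_types) on failure."""
    try:
        return TinyEnvelope(**data), None
    except ValidationError as exc:
        return None, [err["type"] for err in exc.errors()]

# A well-formed message parses, and args falls back to its default.
ok, _ = try_parse({"client_id": "client-001", "tool": "batch_sum"})

# An unexpected field is rejected because of extra="forbid".
_, extra_errors = try_parse(
    {"client_id": "client-001", "tool": "batch_sum", "admin": True}
)

# A missing required field is rejected too.
_, missing_errors = try_parse({"client_id": "client-001"})

print(ok, extra_errors, missing_errors)
```

Collecting the error types rather than the full message keeps the rejection reason easy to log without echoing the caller's data back.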

Making Clear Promises: Defining Our Tools

Now that we have our envelope, what can we actually put inside it? We need to define the specific "tools" our server offers. Again, we'll use Pydantic to create crystal-clear contracts for each tool's inputs and outputs.

This is like creating a user manual for our API. It makes the behavior predictable and safe, which is especially important when an LLM-driven agent might be the one calling the tools.

Let’s define a few example tools:

class ServerIdentityOut(BaseModel):
    model_config = ConfigDict(extra="forbid")
    server_id: str
    fingerprint: str
    capabilities: Dict[str, Any]

class BatchSumIn(BaseModel):
    model_config = ConfigDict(extra="forbid")
    numbers: List[float] = Field(min_length=1)

class BatchSumOut(BaseModel):
    model_config = ConfigDict(extra="forbid")
    count: int
    total: float

class StartLongTaskIn(BaseModel):
    model_config = ConfigDict(extra="forbid")
    seconds: int = Field(ge=1, le=20)
    payload: Dict[str, Any] = Field(default_factory=dict)

class PollJobIn(BaseModel):
    model_config = ConfigDict(extra="forbid")
    job_id: str

  • ServerIdentityOut: The response shape for a simple tool that asks the server "who are you?"
  • BatchSumIn/Out: A synchronous tool. You give it a list of numbers, and it immediately gives you the count and total. Notice the Field(min_length=1)—Pydantic won't even let an empty list through!
  • StartLongTaskIn: This is for our asynchronous job. It takes a number of seconds to run.
  • PollJobIn: This is the other half of the async puzzle. We use it to check on the status of a long-running job using its job_id.

By defining these models, we've made our server's capabilities completely transparent and safe.
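
As a rough illustration of how one of these contracts might be used, here's a sketch of a handler for the synchronous tool, again assuming Pydantic v2. run_batch_sum and the shape of the dictionary it returns are assumptions made for this example, not the server's actual batch_sum branch.

```python
from typing import List

from pydantic import BaseModel, ConfigDict, Field, ValidationError

class BatchSumIn(BaseModel):
    model_config = ConfigDict(extra="forbid")
    numbers: List[float] = Field(min_length=1)

class BatchSumOut(BaseModel):
    model_config = ConfigDict(extra="forbid")
    count: int
    total: float

def run_batch_sum(raw_args):
    """Hypothetical handler: validate the input, compute, validate the output."""
    try:
        args = BatchSumIn(**raw_args)
    except ValidationError as exc:
        # Report only the first error type; the tool logic never runs on bad input.
        return {"ok": False, "error": exc.errors()[0]["type"]}
    out = BatchSumOut(count=len(args.numbers), total=sum(args.numbers))
    return {"ok": True, "result": out.model_dump()}

good = run_batch_sum({"numbers": [10, 20, 30]})
empty = run_batch_sum({"numbers": []})
print(good)
print(empty)
```

Building the result through BatchSumOut means the output is checked against the contract as well, so a bug in the tool can't quietly return a malformed result.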

The Brains of the Operation: The Stateless Server

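Alright, let's build the server itself. This is where all our pieces come together. Our server will be stateless, meaning it won't hold onto any client-specific information between requests. The only "state" it will manage is the list of long-running jobs it's currently working on. In this example that job table lives in memory, so it disappears on restart and isn't shared between server instances; to scale out, it would need to move to a shared store.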

@dataclass
class JobState:
    job_id: str
    status: str
    result: Optional[Dict[str, Any]] = None
    error: Optional[str] = None

class MCPServer:
    def __init__(self, server_id, secret):
        self.server_id = server_id
        self.secret = secret
        self.jobs = {} # Stores the state of long-running jobs
        self.tasks = {} # Stores the running asyncio tasks

    def _fingerprint(self):
        return hashlib.sha256(self.secret).hexdigest()[:16]

    async def handle(self, env_dict, client_secret):
        # 1. Validate the envelope shape
        try:
            env = MCPEnvelope(**env_dict)
        except ValidationError as e:
            return {"error": "bad envelope", "details": str(e)}

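        # 2. Verify the signature (compare_digest avoids leaking timing information)
        payload = env.model_dump()
        sig = payload.pop("signature")
        expected = _hmac_hex(client_secret, payload)
        if not hmac.compare_digest(expected.encode(), sig.encode()):
            return {"error": "bad signature"}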

        # 3. Dispatch to the correct tool
        if env.tool == "server_identity":
            # ... (implementation for server_identity)
            ...
        elif env.tool == "batch_sum":
            # ... (implementation for batch_sum)
            ...
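        elif env.tool == "start_long_task":
            # Reject bad arguments the same way we reject a bad envelope
            try:
                args = StartLongTaskIn(**env.args)
            except ValidationError as e:
                return {"error": "bad args", "details": str(e)}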
            jid = _uuid()
            self.jobs[jid] = JobState(jid, "running")

            async def run():
                await asyncio.sleep(args.seconds)
                self.jobs[jid].status = "done"
                self.jobs[jid].result = args.payload
            
            self.tasks[jid] = asyncio.create_task(run())

            resp = MCPResponse(
                request_id=env.request_id,
                ok=True,
                server_id=self.server_id,
                status="accepted",
                result={"job_id": jid},
                signature="",
            )
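        elif env.tool == "poll_job":
            # ... (implementation for poll_job)
            ...
        else:
            # Without this branch, an unknown tool would leave resp unassigned below
            return {"error": f"unknown tool: {env.tool}"}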
        
        # 4. Sign and return the response
        payload = resp.model_dump()
        # We need to remove the signature field before signing
        payload.pop("signature", None)
        resp.signature = _hmac_hex(self.secret, payload)
        return resp.model_dump()

The handle method is the heart of the server. Look at the flow:

  1. Validate: It first tries to load the incoming request into our MCPEnvelope model. If it fails, Pydantic raises a ValidationError, and we reject it immediately.
  2. Verify: It then checks the signature using the client_secret. If it doesn't match, the request is a forgery, and we reject it.
  3. Dispatch: Only after the request is validated and verified do we look at the env.tool and run the correct logic.
  4. Sign & Return: Finally, we build the MCPResponse, sign it with our server's secret, and send it back.
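
One gap worth flagging: the envelope carries ts_ms and nonce, but the handle method above never looks at them, so a captured message could be sent again and still pass the signature check. Here's a minimal sketch of how a server might use those two fields right after the Verify step. ReplayGuard and MAX_SKEW_MS are made-up names, the 30-second window is an arbitrary assumption, and there's a tradeoff: remembering nonces means the server keeps a little short-lived state.

```python
import time

MAX_SKEW_MS = 30_000  # assumed tolerance; pick a value that suits your deployment

class ReplayGuard:
    """Hypothetical helper: rejects stale timestamps and repeated nonces."""

    def __init__(self, max_skew_ms=MAX_SKEW_MS):
        self.max_skew_ms = max_skew_ms
        self.seen = {}  # nonce -> ts_ms of the message that used it

    def check(self, ts_ms, nonce, now_ms=None):
        """Return None if the message is fresh, else a short reason string."""
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        # Drop nonces whose messages would now be rejected as stale anyway.
        self.seen = {
            n: t for n, t in self.seen.items() if now_ms - t <= self.max_skew_ms
        }
        if abs(now_ms - ts_ms) > self.max_skew_ms:
            return "stale timestamp"
        if nonce in self.seen:
            return "replayed nonce"
        self.seen[nonce] = ts_ms
        return None

guard = ReplayGuard(max_skew_ms=1000)
first = guard.check(10_000, "n1", now_ms=10_200)    # fresh message
replay = guard.check(10_000, "n1", now_ms=10_300)   # same nonce again
stale = guard.check(5_000, "n2", now_ms=10_300)     # timestamp too old
late = guard.check(10_000, "n1", now_ms=20_000)     # old nonce, long after the window
print(first, replay, stale, late)
```

Because the timestamp window bounds how long a nonce has to be remembered, the cache stays small even under heavy traffic.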

The start_long_task logic is particularly cool. It kicks off the task in the background (asyncio.create_task), stores its state in the self.jobs dictionary, and immediately returns a response with a job_id. The client isn't blocked waiting for the task to finish.
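
Stripped of the envelope and signing, the start-then-poll pattern looks roughly like this. JobStore is a hypothetical stand-in for the server's self.jobs and self.tasks pair, and the poll method is only a guess at what the elided poll_job branch might return.

```python
import asyncio
import uuid

class JobStore:
    """Hypothetical stand-in for the server's job bookkeeping."""

    def __init__(self):
        self.jobs = {}   # job_id -> status record
        self.tasks = {}  # job_id -> asyncio.Task (keeps a reference to the task)

    def start(self, seconds, payload):
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"status": "running", "result": None, "error": None}

        async def run():
            try:
                await asyncio.sleep(seconds)  # placeholder for real work
                self.jobs[job_id].update(status="done", result=payload)
            except Exception as exc:
                # Record failures so pollers see "error" instead of "running" forever.
                self.jobs[job_id].update(status="error", error=str(exc))

        self.tasks[job_id] = asyncio.create_task(run())
        return job_id  # returned immediately; the work continues in the background

    def poll(self, job_id):
        job = self.jobs.get(job_id)
        if job is None:
            return {"status": "error", "result": None, "error": "unknown job"}
        return dict(job)

async def main():
    store = JobStore()
    job_id = store.start(0.05, {"data": "complete"})
    before = store.poll(job_id)        # polled right away
    await asyncio.sleep(0.2)           # give the background task time to finish
    after = store.poll(job_id)
    missing = store.poll("no-such-job")
    return before, after, missing

before, after, missing = asyncio.run(main())
print(before["status"], after["status"], missing["error"])
```

Keeping the task object in self.tasks matters: the event loop holds only a weak reference to tasks, so a task nobody references can be garbage-collected before it finishes.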

Putting It All Together: The Client and a Demo

A server is no good without a client. Let's build a simple client to interact with our server. The client's job is to correctly format the MCPEnvelope, sign it, and send it off.

class MCPClient:
    def __init__(self, client_id, secret, server):
        self.client_id = client_id
        self.secret = secret
        self.server = server

    async def call(self, tool, args=None):
        env_data = {
            "client_id": self.client_id,
            "server_id": self.server.server_id,
            "tool": tool,
            "args": args or {},
        }
        
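        # Build a temporary envelope so the defaults (request_id, ts_ms, nonce) are generated once
        temp_env = MCPEnvelope(**env_data, signature="")
        payload = temp_env.model_dump()
        payload.pop("signature")

        # Sign exactly the fields the server will see, then attach the signature to them.
        # Rebuilding from env_data would regenerate request_id, ts_ms, and nonce,
        # and the server's signature check would then fail.
        signature = _hmac_hex(self.secret, payload)
        final_env = MCPEnvelope(**payload, signature=signature)

        # Demo shortcut: the client hands over its own secret. A real deployment
        # would more likely have the server look the key up by client_id.
        return await self.server.handle(final_env.model_dump(), self.secret)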

Now for the fun part. Let's see it in action!

async def demo():
    server_secret = b"server_secret_key_123"
    client_secret = b"client_secret_key_456"

    server = MCPServer("mcp-server-001", server_secret)
    client = MCPClient("client-001", client_secret, server)

    # 1. Simple synchronous call
    print("Checking server identity...")
    identity = await client.call("server_identity")
    print(identity)

    # 2. Another synchronous call with args
    print("\nCalculating a sum...")
    sum_result = await client.call("batch_sum", {"numbers": [10, 20, 30]})
    print(sum_result)
    
    # 3. Start a long-running task
    print("\nStarting a 2-second task...")
    start_resp = await client.call("start_long_task", {"seconds": 2, "payload": {"data": "complete"}})
    job_id = start_resp["result"]["job_id"]
    print(f"Task started with job_id: {job_id}")

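    # 4. Poll for the result (stop on errors too, so a failed job can't loop forever)
    while True:
        print("Polling job status...")
        poll_resp = await client.call("poll_job", {"job_id": job_id})
        status = poll_resp.get("status")
        if status in ("done", "error") or poll_resp.get("error"):
            print(f"\nTask finished with status: {status}")
            print(poll_resp)
            break
        await asyncio.sleep(0.5)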

# To run this in a Python file:
# asyncio.run(demo())

This demo perfectly illustrates the power of our protocol. We make a couple of quick, synchronous calls. Then we fire off a long-running task. We immediately get a job_id back and can enter a polling loop, checking every half-second for the result without blocking anything.
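
The server signs every response, but the demo client never checks that signature. Here's a sketch of what that check could look like. It assumes the client has been given the key the server signs with (with HMAC, that means sharing the secret), and sign_response and verify_response are names invented for this example.

```python
import hashlib
import hmac
import json

def canonical_json(obj):
    return json.dumps(obj, separators=(",", ":"), sort_keys=True).encode()

def hmac_hex(secret, payload):
    return hmac.new(secret, canonical_json(payload), hashlib.sha256).hexdigest()

def sign_response(secret, body):
    """Hypothetical helper: sign every field except the signature itself."""
    unsigned = {k: v for k, v in body.items() if k != "signature"}
    return {**unsigned, "signature": hmac_hex(secret, unsigned)}

def verify_response(secret, resp):
    """Hypothetical helper: recompute the HMAC and compare in constant time."""
    unsigned = {k: v for k, v in resp.items() if k != "signature"}
    expected = hmac_hex(secret, unsigned)
    received = str(resp.get("signature", ""))
    return hmac.compare_digest(expected.encode(), received.encode())

server_secret = b"server_secret_key_123"  # same placeholder value as the demo
resp = sign_response(server_secret, {
    "request_id": "req-1",
    "ok": True,
    "status": "accepted",
    "result": {"job_id": "job-1"},
})

tampered = {**resp, "result": {"job_id": "job-999"}}  # body changed after signing

genuine_ok = verify_response(server_secret, resp)
tampered_ok = verify_response(server_secret, tampered)
wrong_key_ok = verify_response(b"some-other-key", resp)
print(genuine_ok, tampered_ok, wrong_key_ok)
```

If sharing the server's key with every client is a problem, an asymmetric signature scheme would avoid it, but that is a different design from the HMAC approach used in this article.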

This is exactly how you build scalable, non-blocking workflows for AI agents. One agent can ask another to perform a complex, multi-minute analysis. It doesn't have to sit there and wait. It gets a job ID, goes off and does a dozen other things, and then checks back in on the result later.
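
In practice you'd probably want the "check back later" step to give up eventually instead of looping forever. Here's one way to sketch that. poll_until_done is a hypothetical helper, and it assumes only that poll is an async callable returning a dict with a "status" key, like the responses above.

```python
import asyncio

async def poll_until_done(poll, job_id, interval=0.5, timeout=30.0):
    """Poll until the job reports "done" or "error", or raise TimeoutError."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        resp = await poll(job_id)
        if resp.get("status") in ("done", "error"):
            return resp
        # Stop if the next poll would land after the deadline.
        if loop.time() + interval > deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        await asyncio.sleep(interval)

async def main():
    calls = {"n": 0}

    async def fake_poll(job_id):
        # Stand-in for client.call("poll_job", ...): reports "done" on the third poll.
        calls["n"] += 1
        status = "done" if calls["n"] >= 3 else "running"
        return {"status": status, "result": {"job_id": job_id}}

    async def never_done(job_id):
        return {"status": "running"}

    finished = await poll_until_done(fake_poll, "job-1", interval=0.01, timeout=1.0)
    try:
        await poll_until_done(never_done, "job-2", interval=0.01, timeout=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True
    return finished, calls["n"], timed_out

finished, poll_count, timed_out = asyncio.run(main())
print(finished["status"], poll_count, timed_out)
```

Passing the poll function in as a parameter keeps the helper independent of the client class, so the calling agent can do other work between polls or wrap several jobs at once.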

By focusing on stateless, signed messages and providing a mechanism for asynchronous tasks, we’ve created a protocol that is simple, transparent, secure, and ready for the real world. This isn't just a theoretical exercise; it's a practical pattern you can use to build robust, enterprise-grade agent systems that are easy to reason about and a joy to extend.

