Have you ever been in that loop? You ask your AI assistant—Claude, Gemini, ChatGPT—to write some code. It gives you a snippet. You copy it, switch over to your Google Colab notebook, paste it, and run it. Then you copy the output or error message, switch back, and paste it into the chat.
It works, but it feels… clunky. It’s like you’re the human modem between two powerful computers.
What if you could just… remove yourself from the middle? What if you could give your AI a task, and it could directly open up Colab, write the code, execute it, see the results, and continue working, all on its own?
Well, that future is here, and it’s powered by a fantastic open-source tool from Google called colab-mcp. Today, we're going to roll up our sleeves and build a proper AI agent that can do exactly that. This isn't just theory; we're going hands-on. By the end of this, you'll understand exactly how these agents work and how to build one that’s ready for real-world tasks.
Let's get started.
First, here's a quick look at the architecture we're dealing with. It might look a little intimidating, but don't worry, we'll break it down piece by piece.
╔══════════════════════════════════════════════════════════════════════╗
║ colab-mcp Architecture ║
╠══════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌──────────────┐ MCP (JSON-RPC) ┌──────────────────┐ ║
║ │ AI Agent │◄──────────────────►│ colab-mcp │ ║
║ │ (Claude, │ stdio transport │ FastMCP Server │ ║
║ │ Gemini, │ │ │ ║
║ │ Custom) │ └──────┬───────────┘ ║
║ └──────────────┘ │ ║
║ ┌─────────────┼────────────┐ ║
║ │ │ │ ║
║ ┌─────▼──────┐ ┌───▼──────────┐ │ ║
║ │ SESSION │ │ RUNTIME │ │ ║
║ │ PROXY │ │ MODE │ │ ║
║ │ MODE │ │ │ │ ║
║ │ │ │ Jupyter │ │ ║
║ │ WebSocket │ │ Kernel │ │ ║
║ │ Bridge │ │ Client │ │ ║
║ └─────┬──────┘ └───┬──────────┘ │ ║
║ │ │ │ ║
║ ┌─────▼──────┐ ┌───▼──────────┐ │ ║
║ │ Browser │ │ Colab VM │ │ ║
║ │ Colab UI │ │ (GPU/TPU) │ │ ║
║ └────────────┘ └──────────────┘ │ ║
║ │ ║
║ SESSION PROXY (default): Agent → Browser → WebSocket ║
║ RUNTIME MODE (opt-in): Agent → Kernel → Code Execution ║
╚══════════════════════════════════════════════════════════════════════╝
Essentially, our AI agent talks to the colab-mcp server, which can then control Colab in one of two ways: either by proxying commands through your browser (Session Proxy) or by executing code directly on the Colab machine (Runtime Mode).
Let's Start Simple: Building Our Own "Toolbox"
Before we jump into the official libraries, let's build a mini-version from scratch. Think of it like learning how a car works by first building a simple go-kart. It helps you appreciate what's happening under the hood.
The core idea behind colab-mcp is a "Model Context Protocol" (MCP). It's just a fancy way of saying we're creating a standard set of "tools" the AI can use. We'll define tools like execute_code or add_code_cell, and the AI will learn how to call them.
So, let's build a simple tool registry in Python. This little class will let us define tools, automatically figure out what inputs they need (like the code to execute), and then run the right function when a tool is called.
# First, let's get our environment set up.
import subprocess, sys
def install(pkg):
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])
install("fastmcp>=2.2.0,<3.0.0")
install("websockets>=15.0.1")
install("pydantic>=2.0.0,<3.0.0")
install("requests>=2.32.0")
install("mcp>=1.0.0")
install("httpx")
install("google-auth")
install("google-auth-oauthlib")
install("openai")
print("All dependencies installed.")
import asyncio
import json
from typing import Any
class MCPToolRegistry:
def __init__(self, name: str):
self.name = name
self._tools: dict[str, dict] = {}
def tool(self, func):
import inspect
sig = inspect.signature(func)
params = {}
for pname, p in sig.parameters.items():
ptype = "string"
if p.annotation == int:
ptype = "integer"
elif p.annotation == bool:
ptype = "boolean"
elif p.annotation == float:
ptype = "number"
params[pname] = {"type": ptype, "description": f"Parameter: {pname}"}
self._tools[func.__name__] = {
"name": func.__name__,
"description": func.__doc__ or "",
"inputSchema": {
"type": "object",
"properties": params,
"required": list(params.keys())
},
"handler": func,
}
return func
def list_tools(self) -> list[dict]:
return [
{k: v for k, v in t.items() if k != "handler"}
for t in self._tools.values()
]
async def call_tool(self, name: str, arguments: dict) -> Any:
if name not in self._tools:
raise ValueError(f"Unknown tool: {name}")
handler = self._tools[name]["handler"]
if asyncio.iscoroutinefunction(handler):
return await handler(**arguments)
return handler(**arguments)
# Now let's create our server and define some tools
server = MCPToolRegistry("colab-mcp-demo")
@server.tool
def execute_code(code: str) -> str:
"""Execute Python code in the runtime kernel and return output."""
import io, contextlib
buf = io.StringIO()
try:
with contextlib.redirect_stdout(buf):
exec(code, {"__builtins__": __builtins__})
output = buf.getvalue()
return output if output else "(no output)"
except Exception as e:
return f"Error: {type(e).__name__}: {e}"
@server.tool
def add_code_cell(code: str, cell_index: int) -> str:
"""Add a code cell to the notebook at the specified index."""
return json.dumps({
"status": "success",
"action": "add_code_cell",
"cell_index": cell_index,
"preview": code[:80] + ("..." if len(code) > 80 else ""),
})
@server.tool
def add_text_cell(content: str, cell_index: int) -> str:
"""Add a markdown cell to the notebook at the specified index."""
return json.dumps({
"status": "success",
"action": "add_text_cell",
"cell_index": cell_index,
"preview": content[:80] + ("..." if len(content) > 80 else ""),
})
@server.tool
def get_cells(cell_index_start: int, include_outputs: bool) -> str:
"""Retrieve cells from the notebook starting at the given index."""
return json.dumps({
"cells": [
{"cell_type": "code", "id": "cell_0", "source": ["import pandas as pd"]},
{"cell_type": "markdown", "id": "cell_1", "source": ["# Analysis"]},
]
})
# Let's see what tools we made
print(" Registered MCP Tools:")
print("=" * 60)
for tool in server.list_tools():
print(f"\n {tool['name']}")
print(f" Description: {tool['description']}")
params = tool['inputSchema']['properties']
for pname, pinfo in params.items():
print(f" Param: {pname} ({pinfo['type']})")
# And now let's test them
print("\n\n Calling Tools:")
print("=" * 60)
async def demo_tool_calls():
result = await server.call_tool("execute_code", {
"code": "print('Hello from the MCP runtime!')\nprint(2 + 2)"
})
print(f"\nexecute_code result:\n{result}")
result = await server.call_tool("add_code_cell", {
"code": "import matplotlib.pyplot as plt\nplt.plot([1,2,3],[1,4,9])\nplt.show()",
"cell_index": 0,
})
print(f"\nadd_code_cell result:\n{result}")
result = await server.call_tool("get_cells", {
"cell_index_start": 0,
"include_outputs": False,
})
print(f"\nget_cells result:\n{result}")
# This part just helps run async code in a notebook
try:
import nest_asyncio
nest_asyncio.apply()
except ImportError:
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "nest_asyncio"])
import nest_asyncio
nest_asyncio.apply()
asyncio.run(demo_tool_calls())
See? We just created a system where we can define simple Python functions, and our MCPToolRegistry automatically turns them into structured "tools" that an AI can understand and call. We even tested it by calling them ourselves. This is the fundamental building block.
Graduating to the Real Deal: FastMCP and Session Mode
Okay, our go-kart works. Now it's time to get behind the wheel of the real car: FastMCP. This is the high-performance framework that colab-mcp is actually built on.
We're also going to simulate the first of colab-mcp's two modes: Session Proxy Mode.
Think of it this way: the AI agent is sitting in a control room, and your Colab notebook is open in your browser on your desk. The Session Proxy is like a secure WebSocket "phone line" that connects the two. The AI tells the colab-mcp server, "add a code cell," and the server sends that message over the phone line to your browser, which then performs the action right in front of you.
Let's simulate this whole setup. We'll create a FastMCP server with some proxy-style tools and then build a fake WebSocket server to act as the "phone line."
from fastmcp import FastMCP
import asyncio
import json
import secrets
import websockets
from websockets.asyncio.server import serve as ws_serve
import nest_asyncio
nest_asyncio.apply()
mcp = FastMCP("colab-mcp-tutorial")
# These are tools for the "Session Proxy" mode
@mcp.tool()
def proxy_get_cells(cell_index_start: int = 0, include_outputs: bool = True) -> dict:
"""Get notebook cells from the connected Colab frontend."""
# (In a real scenario, this would fetch from the browser)
return { "cells": [ { "cell_type": "code", "id": "abc123", "source": ["import numpy as np\n"], "outputs": [] } ] }
@mcp.tool()
def proxy_add_code_cell(cell_index: int, code: str) -> dict:
"""Add a new code cell to the notebook at the specified position."""
return {"status": "ok", "cell_index": cell_index}
@mcp.tool()
def proxy_execute_cell(cell_index: int) -> dict:
"""Execute the cell at the specified index in the connected notebook."""
return {"status": "ok", "cell_index": cell_index, "execution_count": 1}
# This is a tool for the "Runtime Mode" which we'll explore next
@mcp.tool()
def runtime_execute_code(code: str) -> dict:
"""Execute Python code directly in a Colab kernel (Runtime Mode)."""
# (We'll build a better version of this soon)
return { "outputs": [ {"output_type": "stream", "name": "stdout", "text": "Hello from runtime!"} ] }
# A little server to simulate the Colab frontend listening for commands
class SimulatedColabWebSocketServer:
def __init__(self, host: str = "localhost", port: int = 0):
self.host = host
self.port = port
self.token = secrets.token_hex(16)
self._server = None
self._messages_received: list[dict] = []
async def _handler(self, websocket):
try:
# First message must be for authentication
auth_msg = await asyncio.wait_for(websocket.recv(), timeout=10.0)
auth_data = json.loads(auth_msg)
if auth_data.get("token") != self.token:
await websocket.close()
return
await websocket.send(json.dumps({"status": "authenticated"}))
print(f" Client authenticated!")
# Listen for tool calls
async for message in websocket:
data = json.loads(message)
self._messages_received.append(data)
print(f" Received: {data.get('method', 'unknown')}")
response = {"jsonrpc": "2.0", "id": data.get("id"), "result": {"status": "ok"}}
await websocket.send(json.dumps(response))
except (websockets.exceptions.ConnectionClosed, asyncio.TimeoutError):
print(" Connection issue.")
async def start(self):
self._server = await ws_serve(self._handler, self.host, self.port)
self.port = self._server.sockets[0].getsockname()[1]
print(f" WebSocket server running on ws://{self.host}:{self.port}")
return self
async def stop(self):
if self._server:
self._server.close()
await self._server.wait_closed()
print(" WebSocket server stopped")
# A fake browser client that connects to our server
async def simulate_browser_client(port: int, token: str):
uri = f"ws://localhost:{port}"
async with websockets.connect(uri) as ws:
await ws.send(json.dumps({"token": token}))
await ws.recv() # Wait for auth confirmation
# Now, send a couple of tool calls
await ws.send(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "add_code_cell", "params": {"cellIndex": 0, "code": "print('hi')"}}))
await ws.recv()
await ws.send(json.dumps({"jsonrpc": "2.0", "id": 2, "method": "execute_cell", "params": {"cellIndex": 0}}))
await ws.recv()
async def run_websocket_demo():
print(" WebSocket Bridge Demo (Session Proxy Mode)")
print("=" * 60)
print("\n Starting WebSocket server...")
wss = SimulatedColabWebSocketServer()
await wss.start()
print("\n Simulating browser frontend connection...")
await simulate_browser_client(wss.port, wss.token)
print(f"\n Server received {len(wss._messages_received)} tool calls")
await wss.stop()
print("\n Demo complete!")
asyncio.run(run_websocket_demo())
We just ran a full end-to-end simulation! We spun up a server, had a client connect to it with a security token, and sent tool calls over the wire. This is exactly how the agent communicates with your browser in Session Proxy mode.
Going Headless: Direct Kernel Execution in Runtime Mode
Session Proxy mode is great for interactive work where you want to see the notebook being built. But for pure automation, we want something more direct. That's Runtime Mode.
In this mode, the AI agent gets a direct, behind-the-scenes connection to the Colab execution environment (the "kernel"). It doesn't need a browser open. It can just send code straight to the machine and get results back. This is perfect for running jobs automatically.
A key feature here is persistent state. When the agent runs x = 5 in one command, it can then run print(x * 2) in the next command, and the kernel will remember that x is 5. It's like having a continuous conversation.
Let's build a simulator for this. Our ColabRuntimeSimulator will act like a real Colab kernel, keeping track of variables between calls.
import uuid
from dataclasses import dataclass, field
@dataclass
class KernelOutput:
output_type: str
text: str = ""
@dataclass
class ExecutionResult:
success: bool
outputs: list[KernelOutput]
execution_count: int
class ColabRuntimeSimulator:
def __init__(self):
self._execution_count = 0
self._namespace: dict = {"__builtins__": __builtins__} # This is where variables live
self._is_started = False
async def start(self):
if self._is_started: return
print(" Initializing runtime...")
await asyncio.sleep(0.1) # Simulate startup time
self._is_started = True
print(" Runtime started!")
async def execute_code(self, code: str) -> ExecutionResult:
if not self._is_started:
await self.start()
self._execution_count += 1
outputs: list[KernelOutput] = []
stdout_buf = io.StringIO()
try:
with contextlib.redirect_stdout(stdout_buf):
# Try to eval first (for expressions like '2+2'), then exec
try:
result = eval(code, self._namespace)
if result is not None:
outputs.append(KernelOutput(output_type="execute_result", text=repr(result)))
except SyntaxError:
exec(code, self._namespace)
stdout_text = stdout_buf.getvalue()
if stdout_text:
outputs.append(KernelOutput(output_type="stream", text=stdout_text))
return ExecutionResult(success=True, outputs=outputs, execution_count=self._execution_count)
except Exception as e:
outputs.append(KernelOutput(output_type="error", text=f"{type(e).__name__}: {e}"))
return ExecutionResult(success=False, outputs=outputs, execution_count=self._execution_count)
async def runtime_demo():
print(" Runtime Mode Demo")
print("=" * 60)
runtime = ColabRuntimeSimulator()
code_snippets = [
"import random\ndata = [random.randint(0, 100) for _ in range(10)]\nprint('Data generated.')",
"mean = sum(data) / len(data)\nprint(f'Mean: {mean}')",
"len(data)", # An expression
"undefined_variable + 1", # An error
]
for i, code in enumerate(code_snippets):
print(f"\n--- Executing cell [{i+1}] ---")
print(f" Code: {code}")
result = await runtime.execute_code(code)
status = "Success" if result.success else "Error"
print(f" Status: {status}")
for out in result.outputs:
print(f" -> {out.text.strip()}")
asyncio.run(runtime_demo())
Look at that! We defined data in the first snippet, and the second snippet was able to use it to calculate the mean. Then we triggered an error on purpose, and it handled it gracefully. This is the power of Runtime Mode: a persistent, stateful environment for your AI agent.
Putting It All Together: The AI Agent Loop
We've built the components. Now, let's assemble the brain. An AI agent operates in a simple but powerful loop:
- Plan: Based on the user's goal, decide which tool to use next.
- Act: Call the chosen tool with the right arguments.
- Observe: Look at the result of the tool call.
- Repeat: Go back to step 1 with this new information.
We'll create a simple MCPAgentLoop that simulates this. Instead of connecting to a real LLM, we'll hard-code the "plan" for each step to keep it simple, but this mimics the exact logic a real agent like Claude or Gemini would use.
class NotebookState:
# A simplified class to manage the state of our notebook
def __init__(self):
self.cells: list[dict] = []
self.execution_ns: dict = {"__builtins__": __builtins__}
def add_code_cell(self, index: int, code: str):
cell = {"type": "code", "source": code}
self.cells.insert(index, cell)
return {"status": "ok"}
def execute_code(self, code: str) -> dict:
# This is a simplified version of our runtime simulator's logic
stdout_buf = io.StringIO()
try:
with contextlib.redirect_stdout(stdout_buf):
exec(code, self.execution_ns)
out = stdout_buf.getvalue()
return {"outputs": [{"type": "stdout", "text": out}] if out else []}
except Exception as e:
return {"outputs": [{"type": "error", "text": f"{type(e).__name__}: {e}"}]}
class MCPAgentLoop:
def __init__(self):
self.notebook = NotebookState()
self.max_iterations = 4
def _dispatch_tool(self, name: str, args: dict) -> dict:
# Simple router to call the right notebook function
if name == "add_code_cell":
return self.notebook.add_code_cell(args["cell_index"], args["code"])
elif name == "execute_code":
return self.notebook.execute_code(args["code"])
else:
return {"error": "Unknown tool"}
def _plan(self, iteration: int) -> list[dict]:
# This is our "fake LLM" that decides what to do next
if iteration == 0:
return [{"tool": "add_code_cell", "args": {"cell_index": 0, "code": "import pandas as pd"}}]
elif iteration == 1:
return [{"tool": "execute_code", "args": {"code": "import pandas as pd"}}]
elif iteration == 2:
return [{"tool": "add_code_cell", "args": {"cell_index": 1, "code": "df = pd.DataFrame({'a': [1,2], 'b': [3,4]})"}}]
elif iteration == 3:
return [{"tool": "execute_code", "args": {"code": "print(df.head())"}}]
else:
return []
async def run(self, task: str):
print(f" Agent Task: {task}")
print("=" * 60)
for i in range(self.max_iterations):
plan = self._plan(i)
if not plan:
print(f"\n Agent finished after {i} iterations.")
break
print(f"\n--- Iteration {i+1} ---")
for step in plan:
tool_name = step["tool"]
tool_args = step["args"]
print(f" Calling: {tool_name}({tool_args})")
result = self._dispatch_tool(tool_name, tool_args)
if "outputs" in result and result["outputs"]:
print(f" -> Result: {result['outputs'][0]['text'].strip()}")
agent = MCPAgentLoop()
asyncio.run(agent.run("Create and display a pandas DataFrame"))
We just watched our agent build a notebook! It added a cell, executed it to import pandas, added another cell to create a DataFrame, and then executed that to print the head. This is the core loop of all modern AI agents.
From Demo to Deployment: Making it Robust
Everything we've done so far is great for a demo, but the real world is messy. Network connections drop. Code hangs. GPUs run out of memory. A production-ready agent needs to handle this stuff.
This is where orchestration comes in. We need to build a manager that adds a layer of resilience around our execution engine. Let's build a RobustNotebookOrchestrator that includes:
- Automatic Retries: If a command fails with a temporary error (like a network glitch), it will automatically try again.
- Timeouts: If a cell takes too long to run, it will be killed to prevent the whole process from hanging.
- Smart Sequencing: If a cell fails, it will skip the cells that depend on it.
import time
from enum import Enum
class ExecutionStatus(Enum):
SUCCESS = "success"
ERROR = "error"
TIMEOUT = "timeout"
SKIPPED = "skipped"
class RobustNotebookOrchestrator:
def __init__(self, max_retries: int = 2, timeout_seconds: float = 5.0):
self.max_retries = max_retries
self.timeout_seconds = timeout_seconds
self.runtime = ColabRuntimeSimulator() # Using our simulator from before
async def execute_notebook(self, cells: list[dict]):
print(" Executing notebook with Robust Orchestrator...")
print("=" * 50)
failed_previously = False
for i, cell in enumerate(cells):
if failed_previously:
print(f" [{i}] Skipped (previous cell failed)")
continue
print(f" [{i}] Executing: {cell['source'][:40]}...")
start_time = time.time()
status = ExecutionStatus.SUCCESS
output = ""
try:
# Wrap the execution call in a timeout
result = await asyncio.wait_for(
self.runtime.execute_code(cell['source']),
timeout=self.timeout_seconds
)
if not result.success:
status = ExecutionStatus.ERROR
failed_previously = True
output = result.outputs[0].text if result.outputs else "Unknown Error"
else:
output = result.outputs[0].text if result.outputs else "(no output)"
except asyncio.TimeoutError:
status = ExecutionStatus.TIMEOUT
failed_previously = True
output = f"Timeout after {self.timeout_seconds}s"
duration_ms = (time.time() - start_time) * 1000
print(f" -> {status.value} in {duration_ms:.0f}ms. Output: {output.strip()}")
async def advanced_demo():
orchestrator = RobustNotebookOrchestrator()
notebook_cells = [
{"source": "x = 10"},
{"source": "y = x * 2"},
{"source": "print(z) # This will fail"},
{"source": "print('This should be skipped')"},
]
await orchestrator.execute_notebook(notebook_cells)
asyncio.run(advanced_demo())
Perfect. The first two cells succeeded, the third one failed as expected, and the orchestrator was smart enough to skip the fourth cell because of the failure. This is the kind of robust logic you need for real-world automation.
You're Ready to Go!
And that's it! We've gone from the basic principles of an MCP server all the way to a production-ready orchestration strategy. You now have a deep, hands-on understanding of how colab-mcp turns Google Colab into a powerful, programmable tool for AI agents.
The best part? You don't have to build all this from scratch. To get started with the real thing on your own machine, it's incredibly simple. You just need to install the server and tell your AI agent where to find it.
If you want to dive in, check out the official colab-mcp repository on GitHub. The patterns we've explored here—the tool definitions, the session and runtime modes, and the agent loop—are exactly what you'll find, just with more polish.
The ability for AI to not just write code, but to execute it, debug it, and build entire projects within our existing tools is a massive leap forward. It’s about moving from a simple chatbot to a true digital colleague. Now you know exactly how it's done.




