How to Build an AI Agent That Can Control Your Google Colab Notebooks

Akram Chauhan

Have you ever been in that loop? You ask your AI assistant—Claude, Gemini, ChatGPT—to write some code. It gives you a snippet. You copy it, switch over to your Google Colab notebook, paste it, and run it. Then you copy the output or error message, switch back, and paste it into the chat.

It works, but it feels… clunky. It’s like you’re the human modem between two powerful computers.

What if you could just… remove yourself from the middle? What if you could give your AI a task, and it could directly open up Colab, write the code, execute it, see the results, and continue working, all on its own?

Well, that future is here, and it’s powered by a fantastic open-source tool from Google called colab-mcp. Today, we're going to roll up our sleeves and build a proper AI agent that can do exactly that. This isn't just theory; we're going hands-on. By the end of this, you'll understand exactly how these agents work and how to build one that’s ready for real-world tasks.

Let's get started.

First, here's a quick look at the architecture we're dealing with. It might look a little intimidating, but don't worry, we'll break it down piece by piece.

╔══════════════════════════════════════════════════════════════════════╗
║ colab-mcp Architecture                                               ║
╠══════════════════════════════════════════════════════════════════════╣
║                                                                      ║
║ ┌──────────────┐   MCP (JSON-RPC)   ┌──────────────────┐             ║
║ │   AI Agent   │◄──────────────────►│    colab-mcp     │             ║
║ │ (Claude,     │  stdio transport   │  FastMCP Server  │             ║
║ │  Gemini,     │                    │                  │             ║
║ │  Custom)     │                    └────────┬─────────┘             ║
║ └──────────────┘                             │                       ║
║                            ┌─────────────────┴──────┐                ║
║                            │                        │                ║
║                      ┌─────▼──────┐             ┌───▼──────────┐     ║
║                      │  SESSION   │             │   RUNTIME    │     ║
║                      │   PROXY    │             │     MODE     │     ║
║                      │    MODE    │             │              │     ║
║                      │            │             │   Jupyter    │     ║
║                      │ WebSocket  │             │   Kernel     │     ║
║                      │   Bridge   │             │   Client     │     ║
║                      └─────┬──────┘             └───┬──────────┘     ║
║                            │                        │                ║
║                      ┌─────▼──────┐             ┌───▼──────────┐     ║
║                      │  Browser   │             │   Colab VM   │     ║
║                      │  Colab UI  │             │  (GPU/TPU)   │     ║
║                      └────────────┘             └──────────────┘     ║
║                                                                      ║
║ SESSION PROXY (default): Agent → Browser → WebSocket                 ║
║ RUNTIME MODE (opt-in): Agent → Kernel → Code Execution               ║
╚══════════════════════════════════════════════════════════════════════╝

Essentially, our AI agent talks to the colab-mcp server, which can then control Colab in one of two ways: either by proxying commands through your browser (Session Proxy) or by executing code directly on the Colab machine (Runtime Mode).
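To make the "MCP (JSON-RPC)" arrow in the diagram a little more concrete, here is a minimal sketch of the kind of message that travels over the stdio transport. The `tools/call` method and its `name`/`arguments` parameters come from the MCP specification; the tool name `execute_code` and the helper functions `make_tool_call` and `parse_reply` are illustrative assumptions, not colab-mcp's actual code.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched up

def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Build one JSON-RPC 2.0 request asking an MCP server to run a tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)  # stdio transport sends one JSON message per line

def parse_reply(line: str) -> dict:
    """Return the result of a reply, or raise if the server reported an error."""
    reply = json.loads(line)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "unknown error"))
    return reply["result"]

# Hypothetical tool name, used only to show the shape of the message
wire_message = make_tool_call("execute_code", {"code": "print(2 + 2)"})
print(wire_message)
```

Everything else in this tutorial, from the toy registry to the FastMCP server, is ultimately a way of producing and answering messages shaped like this one.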

Let's Start Simple: Building Our Own "Toolbox"

Before we jump into the official libraries, let's build a mini-version from scratch. Think of it like learning how a car works by first building a simple go-kart. It helps you appreciate what's happening under the hood.

The core idea behind colab-mcp is the Model Context Protocol (MCP), an open standard that gives AI models a uniform way to discover and call external "tools." We'll define tools like execute_code or add_code_cell, and the AI will learn how to call them.

So, let's build a simple tool registry in Python. This little class will let us define tools, automatically figure out what inputs they need (like the code to execute), and then run the right function when a tool is called.

# First, let's get our environment set up.
import subprocess, sys
def install(pkg):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])

install("fastmcp>=2.2.0,<3.0.0")
install("websockets>=15.0.1")
install("pydantic>=2.0.0,<3.0.0")
install("requests>=2.32.0")
install("mcp>=1.0.0")
install("httpx")
install("google-auth")
install("google-auth-oauthlib")
install("openai")
print("All dependencies installed.")

import asyncio
import json
from typing import Any

class MCPToolRegistry:
    def __init__(self, name: str):
        self.name = name
        self._tools: dict[str, dict] = {}

    def tool(self, func):
        import inspect
        sig = inspect.signature(func)
        params = {}
        for pname, p in sig.parameters.items():
            ptype = "string"
            if p.annotation == int:
                ptype = "integer"
            elif p.annotation == bool:
                ptype = "boolean"
            elif p.annotation == float:
                ptype = "number"
            params[pname] = {"type": ptype, "description": f"Parameter: {pname}"}

        self._tools[func.__name__] = {
            "name": func.__name__,
            "description": func.__doc__ or "",
            "inputSchema": {
                "type": "object",
                "properties": params,
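                "required": [
                    n for n, p in sig.parameters.items()
                    if p.default is inspect.Parameter.empty
                ],  # only parameters without a default are required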
            },
            "handler": func,
        }
        return func

    def list_tools(self) -> list[dict]:
        return [
            {k: v for k, v in t.items() if k != "handler"}
            for t in self._tools.values()
        ]

    async def call_tool(self, name: str, arguments: dict) -> Any:
        if name not in self._tools:
            raise ValueError(f"Unknown tool: {name}")
        handler = self._tools[name]["handler"]
        if asyncio.iscoroutinefunction(handler):
            return await handler(**arguments)
        return handler(**arguments)

# Now let's create our server and define some tools
server = MCPToolRegistry("colab-mcp-demo")

@server.tool
def execute_code(code: str) -> str:
    """Execute Python code in the runtime kernel and return output."""
    import io, contextlib
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {"__builtins__": __builtins__})
        output = buf.getvalue()
        return output if output else "(no output)"
    except Exception as e:
        return f"Error: {type(e).__name__}: {e}"

@server.tool
def add_code_cell(code: str, cell_index: int) -> str:
    """Add a code cell to the notebook at the specified index."""
    return json.dumps({
        "status": "success",
        "action": "add_code_cell",
        "cell_index": cell_index,
        "preview": code[:80] + ("..." if len(code) > 80 else ""),
    })

@server.tool
def add_text_cell(content: str, cell_index: int) -> str:
    """Add a markdown cell to the notebook at the specified index."""
    return json.dumps({
        "status": "success",
        "action": "add_text_cell",
        "cell_index": cell_index,
        "preview": content[:80] + ("..." if len(content) > 80 else ""),
    })

@server.tool
def get_cells(cell_index_start: int, include_outputs: bool) -> str:
    """Retrieve cells from the notebook starting at the given index."""
    return json.dumps({
        "cells": [
            {"cell_type": "code", "id": "cell_0", "source": ["import pandas as pd"]},
            {"cell_type": "markdown", "id": "cell_1", "source": ["# Analysis"]},
        ]
    })

# Let's see what tools we made
print(" Registered MCP Tools:")
print("=" * 60)
for tool in server.list_tools():
    print(f"\n {tool['name']}")
    print(f"  Description: {tool['description']}")
    params = tool['inputSchema']['properties']
    for pname, pinfo in params.items():
        print(f"  Param: {pname} ({pinfo['type']})")

# And now let's test them
print("\n\n Calling Tools:")
print("=" * 60)
async def demo_tool_calls():
    result = await server.call_tool("execute_code", {
        "code": "print('Hello from the MCP runtime!')\nprint(2 + 2)"
    })
    print(f"\nexecute_code result:\n{result}")

    result = await server.call_tool("add_code_cell", {
        "code": "import matplotlib.pyplot as plt\nplt.plot([1,2,3],[1,4,9])\nplt.show()",
        "cell_index": 0,
    })
    print(f"\nadd_code_cell result:\n{result}")

    result = await server.call_tool("get_cells", {
        "cell_index_start": 0,
        "include_outputs": False,
    })
    print(f"\nget_cells result:\n{result}")

# This part just helps run async code in a notebook
try:
    import nest_asyncio
    nest_asyncio.apply()
except ImportError:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "nest_asyncio"])
    import nest_asyncio
    nest_asyncio.apply()

asyncio.run(demo_tool_calls())

See? We just created a system where we can define simple Python functions, and our MCPToolRegistry automatically turns them into structured "tools" that an AI can understand and call. We even tested it by calling them ourselves. This is the fundamental building block.
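The descriptions our registry produces are already close to what chat APIs expect. As a rough sketch, assuming the OpenAI Chat Completions `tools` format, a small adapter can wrap each entry so a model can be offered the tools. The `to_openai_tools` helper and the hand-written `example_tools` entry are hypothetical; the entry simply mirrors the shape returned by `server.list_tools()`.

```python
def to_openai_tools(mcp_tools: list[dict]) -> list[dict]:
    """Wrap MCP-style tool descriptions in the function-calling envelope."""
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t["description"],
                "parameters": t["inputSchema"],  # already a JSON Schema object
            },
        }
        for t in mcp_tools
    ]

# A hand-written entry shaped like the output of server.list_tools()
example_tools = [{
    "name": "execute_code",
    "description": "Execute Python code in the runtime kernel and return output.",
    "inputSchema": {
        "type": "object",
        "properties": {"code": {"type": "string", "description": "Parameter: code"}},
        "required": ["code"],
    },
}]

openai_tools = to_openai_tools(example_tools)
print(openai_tools[0]["function"]["name"])
```

When the model replies with a tool call, its name and JSON arguments can be passed straight to `call_tool`, which closes the loop between the model and the registry.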

Graduating to the Real Deal: FastMCP and Session Mode

Okay, our go-kart works. Now it's time to get behind the wheel of the real car: FastMCP. This is the high-performance framework that colab-mcp is actually built on.

We're also going to simulate the first of colab-mcp's two modes: Session Proxy Mode.

Think of it this way: the AI agent is sitting in a control room, and your Colab notebook is open in your browser on your desk. The Session Proxy is like a secure WebSocket "phone line" that connects the two. The AI tells the colab-mcp server, "add a code cell," and the server sends that message over the phone line to your browser, which then performs the action right in front of you.

Let's simulate this whole setup. We'll create a FastMCP server with some proxy-style tools and then build a fake WebSocket server to act as the "phone line."

from fastmcp import FastMCP
import asyncio
import json
import secrets
import websockets
from websockets.asyncio.server import serve as ws_serve
import nest_asyncio
nest_asyncio.apply()

mcp = FastMCP("colab-mcp-tutorial")

# These are tools for the "Session Proxy" mode
@mcp.tool()
def proxy_get_cells(cell_index_start: int = 0, include_outputs: bool = True) -> dict:
    """Get notebook cells from the connected Colab frontend."""
    # (In a real scenario, this would fetch from the browser)
    return { "cells": [ { "cell_type": "code", "id": "abc123", "source": ["import numpy as np\n"], "outputs": [] } ] }

@mcp.tool()
def proxy_add_code_cell(cell_index: int, code: str) -> dict:
    """Add a new code cell to the notebook at the specified position."""
    return {"status": "ok", "cell_index": cell_index}

@mcp.tool()
def proxy_execute_cell(cell_index: int) -> dict:
    """Execute the cell at the specified index in the connected notebook."""
    return {"status": "ok", "cell_index": cell_index, "execution_count": 1}

# This is a tool for the "Runtime Mode" which we'll explore next
@mcp.tool()
def runtime_execute_code(code: str) -> dict:
    """Execute Python code directly in a Colab kernel (Runtime Mode)."""
    # (We'll build a better version of this soon)
    return { "outputs": [ {"output_type": "stream", "name": "stdout", "text": "Hello from runtime!"} ] }

# A little server to simulate the Colab frontend listening for commands
class SimulatedColabWebSocketServer:
    def __init__(self, host: str = "localhost", port: int = 0):
        self.host = host
        self.port = port
        self.token = secrets.token_hex(16)
        self._server = None
        self._messages_received: list[dict] = []

    async def _handler(self, websocket):
        try:
            # First message must be for authentication
            auth_msg = await asyncio.wait_for(websocket.recv(), timeout=10.0)
            auth_data = json.loads(auth_msg)
            if auth_data.get("token") != self.token:
                await websocket.close()
                return

            await websocket.send(json.dumps({"status": "authenticated"}))
            print(f"  Client authenticated!")

            # Listen for tool calls
            async for message in websocket:
                data = json.loads(message)
                self._messages_received.append(data)
                print(f"  Received: {data.get('method', 'unknown')}")
                response = {"jsonrpc": "2.0", "id": data.get("id"), "result": {"status": "ok"}}
                await websocket.send(json.dumps(response))
        except (websockets.exceptions.ConnectionClosed, asyncio.TimeoutError):
            print("  Connection issue.")

    async def start(self):
        self._server = await ws_serve(self._handler, self.host, self.port)
        self.port = self._server.sockets[0].getsockname()[1]
        print(f"  WebSocket server running on ws://{self.host}:{self.port}")
        return self

    async def stop(self):
        if self._server:
            self._server.close()
            await self._server.wait_closed()
            print("  WebSocket server stopped")

# A fake browser client that connects to our server
async def simulate_browser_client(port: int, token: str):
    uri = f"ws://localhost:{port}"
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"token": token}))
        await ws.recv() # Wait for auth confirmation

        # Now, send a couple of tool calls
        await ws.send(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "add_code_cell", "params": {"cellIndex": 0, "code": "print('hi')"}}))
        await ws.recv()
        await ws.send(json.dumps({"jsonrpc": "2.0", "id": 2, "method": "execute_cell", "params": {"cellIndex": 0}}))
        await ws.recv()

async def run_websocket_demo():
    print(" WebSocket Bridge Demo (Session Proxy Mode)")
    print("=" * 60)
    print("\n Starting WebSocket server...")
    wss = SimulatedColabWebSocketServer()
    await wss.start()

    print("\n Simulating browser frontend connection...")
    await simulate_browser_client(wss.port, wss.token)

    print(f"\n Server received {len(wss._messages_received)} tool calls")
    await wss.stop()
    print("\n Demo complete!")

asyncio.run(run_websocket_demo())

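We just ran a full end-to-end simulation! We spun up a server, had a client connect to it with a security token, and sent tool calls over the wire. Session Proxy mode relies on the same ingredients: a token handshake followed by JSON-RPC messages over a WebSocket. One simplification to keep in mind: in our demo the fake browser is the side sending the calls, whereas in the setup described above the commands flow from the colab-mcp server to the browser.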
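One detail the simulation glosses over is how a sender knows which reply belongs to which request once several calls are in flight. A common approach, sketched here with only the standard library, is to park a future under each JSON-RPC id and resolve it when a message with that id arrives. `PendingCalls` and `fake_browser_send` are hypothetical names for illustration; a real bridge would pass the WebSocket's own send coroutine instead.

```python
import asyncio
import json
from itertools import count

class PendingCalls:
    """Match JSON-RPC replies to the requests that caused them."""

    def __init__(self, send):
        self._send = send            # async callable that delivers one text frame
        self._ids = count(1)
        self._waiting: dict[int, asyncio.Future] = {}

    async def call(self, method: str, params: dict, timeout: float = 5.0) -> dict:
        call_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._waiting[call_id] = future
        await self._send(json.dumps(
            {"jsonrpc": "2.0", "id": call_id, "method": method, "params": params}
        ))
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            self._waiting.pop(call_id, None)  # clean up on success or timeout

    def on_message(self, raw: str) -> None:
        """Feed every incoming frame here; replies with unknown ids are ignored."""
        reply = json.loads(raw)
        future = self._waiting.get(reply.get("id"))
        if future is not None and not future.done():
            future.set_result(reply.get("result", {}))

async def correlation_demo() -> dict:
    async def fake_browser_send(frame: str) -> None:
        # Stand-in for websocket.send(): the "browser" answers right away.
        request = json.loads(frame)
        reply = {"jsonrpc": "2.0", "id": request["id"], "result": {"status": "ok"}}
        asyncio.get_running_loop().call_soon(calls.on_message, json.dumps(reply))

    calls = PendingCalls(fake_browser_send)
    return await calls.call("add_code_cell", {"cellIndex": 0, "code": "print('hi')"})

result = asyncio.run(correlation_demo())
print(result)
```

The timeout matters in practice: if the browser tab is closed, the call fails after a few seconds instead of waiting forever.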

Going Headless: Direct Kernel Execution in Runtime Mode

Session Proxy mode is great for interactive work where you want to see the notebook being built. But for pure automation, we want something more direct. That's Runtime Mode.

In this mode, the AI agent gets a direct, behind-the-scenes connection to the Colab execution environment (the "kernel"). It doesn't need a browser open. It can just send code straight to the machine and get results back. This is perfect for running jobs automatically.

A key feature here is persistent state. When the agent runs x = 5 in one command, it can then run print(x * 2) in the next command, and the kernel will remember that x is 5. It's like having a continuous conversation.
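The idea is easy to see in plain Python. In this minimal sketch a dictionary stands in for the kernel's global scope; a real Colab kernel is a separate process, so this only imitates the behavior.

```python
namespace: dict = {}          # plays the role of the kernel's global scope

exec("x = 5", namespace)      # first "cell": defines x
exec("y = x * 2", namespace)  # second "cell": still sees x

print(namespace["y"])         # → 10

fresh: dict = {}              # a brand-new dict is a brand-new "kernel"
try:
    exec("print(x * 2)", fresh)
except NameError as err:
    print(f"fresh namespace: {err}")
```

Reusing one namespace across calls is what produces the "continuous conversation" effect, and throwing it away is the equivalent of restarting the runtime.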

Let's build a simulator for this. Our ColabRuntimeSimulator will act like a real Colab kernel, keeping track of variables between calls.

import asyncio
import contextlib
import io
import uuid
from dataclasses import dataclass, field

@dataclass
class KernelOutput:
    output_type: str
    text: str = ""

@dataclass
class ExecutionResult:
    success: bool
    outputs: list[KernelOutput]
    execution_count: int

class ColabRuntimeSimulator:
    def __init__(self):
        self._execution_count = 0
        self._namespace: dict = {"__builtins__": __builtins__} # This is where variables live
        self._is_started = False

    async def start(self):
        if self._is_started: return
        print("  Initializing runtime...")
        await asyncio.sleep(0.1) # Simulate startup time
        self._is_started = True
        print("  Runtime started!")

    async def execute_code(self, code: str) -> ExecutionResult:
        if not self._is_started:
            await self.start()

        self._execution_count += 1
        outputs: list[KernelOutput] = []
        stdout_buf = io.StringIO()

        try:
            with contextlib.redirect_stdout(stdout_buf):
                # Try to eval first (for expressions like '2+2'), then exec
                try:
                    result = eval(code, self._namespace)
                    if result is not None:
                        outputs.append(KernelOutput(output_type="execute_result", text=repr(result)))
                except SyntaxError:
                    exec(code, self._namespace)

            stdout_text = stdout_buf.getvalue()
            if stdout_text:
                outputs.append(KernelOutput(output_type="stream", text=stdout_text))

            return ExecutionResult(success=True, outputs=outputs, execution_count=self._execution_count)
        except Exception as e:
            outputs.append(KernelOutput(output_type="error", text=f"{type(e).__name__}: {e}"))
            return ExecutionResult(success=False, outputs=outputs, execution_count=self._execution_count)

async def runtime_demo():
    print(" Runtime Mode Demo")
    print("=" * 60)
    runtime = ColabRuntimeSimulator()

    code_snippets = [
        "import random\ndata = [random.randint(0, 100) for _ in range(10)]\nprint('Data generated.')",
        "mean = sum(data) / len(data)\nprint(f'Mean: {mean}')",
        "len(data)", # An expression
        "undefined_variable + 1", # An error
    ]

    for i, code in enumerate(code_snippets):
        print(f"\n--- Executing cell [{i+1}] ---")
        print(f"  Code: {code}")
        result = await runtime.execute_code(code)
        status = "Success" if result.success else "Error"
        print(f"  Status: {status}")
        for out in result.outputs:
            print(f"    -> {out.text.strip()}")

asyncio.run(runtime_demo())

Look at that! We defined data in the first snippet, and the second snippet was able to use it to calculate the mean. Then we triggered an error on purpose, and it handled it gracefully. This is the power of Runtime Mode: a persistent, stateful environment for your AI agent.
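One limitation of the simulator's "try eval, then fall back to exec" approach is that a multi-line cell ending in an expression shows no value, because the whole cell fails to parse as a single expression. Notebook users expect the last expression to be displayed. A possible refinement, sketched here with the standard `ast` module, is to split off a trailing expression and evaluate it separately. `run_cell` is a hypothetical helper, not part of colab-mcp.

```python
import ast

def run_cell(source: str, namespace: dict):
    """Run a cell; return the value of a trailing expression, else None."""
    tree = ast.parse(source, mode="exec")
    last_value = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Detach the final expression so it can be evaluated on its own.
        last = ast.Expression(body=tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        last_value = eval(compile(last, "<cell>", "eval"), namespace)
    else:
        exec(compile(tree, "<cell>", "exec"), namespace)
    return last_value

ns: dict = {}
print(run_cell("data = [3, 1, 2]\nsorted(data)", ns))  # → [1, 2, 3]
print(run_cell("total = sum(data)", ns))               # → None
```

Parsing first also means a syntax error is reported before any part of the cell has run, which keeps the namespace clean.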

Putting It All Together: The AI Agent Loop

We've built the components. Now, let's assemble the brain. An AI agent operates in a simple but powerful loop:

  1. Plan: Based on the user's goal, decide which tool to use next.
  2. Act: Call the chosen tool with the right arguments.
  3. Observe: Look at the result of the tool call.
  4. Repeat: Go back to step 1 with this new information.

We'll create a simple MCPAgentLoop that simulates this. Instead of connecting to a real LLM, we'll hard-code the "plan" for each step to keep it simple. A real agent like Claude or Gemini would choose each step based on what it observed in the previous one, but the plan-act-observe skeleton is the same.

import contextlib
import io

class NotebookState:
    # A simplified class to manage the state of our notebook
    def __init__(self):
        self.cells: list[dict] = []
        self.execution_ns: dict = {"__builtins__": __builtins__}

    def add_code_cell(self, index: int, code: str):
        cell = {"type": "code", "source": code}
        self.cells.insert(index, cell)
        return {"status": "ok"}

    def execute_code(self, code: str) -> dict:
        # This is a simplified version of our runtime simulator's logic
        stdout_buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout_buf):
                exec(code, self.execution_ns)
            out = stdout_buf.getvalue()
            return {"outputs": [{"type": "stdout", "text": out}] if out else []}
        except Exception as e:
            return {"outputs": [{"type": "error", "text": f"{type(e).__name__}: {e}"}]}

class MCPAgentLoop:
    def __init__(self):
        self.notebook = NotebookState()
        self.max_iterations = 10  # safety cap; the scripted plan ends on its own after 4 steps

    def _dispatch_tool(self, name: str, args: dict) -> dict:
        # Simple router to call the right notebook function
        if name == "add_code_cell":
            return self.notebook.add_code_cell(args["cell_index"], args["code"])
        elif name == "execute_code":
            return self.notebook.execute_code(args["code"])
        else:
            return {"error": "Unknown tool"}

    def _plan(self, iteration: int) -> list[dict]:
        # This is our "fake LLM" that decides what to do next
        if iteration == 0:
            return [{"tool": "add_code_cell", "args": {"cell_index": 0, "code": "import pandas as pd"}}]
        elif iteration == 1:
            return [{"tool": "execute_code", "args": {"code": "import pandas as pd"}}]
        elif iteration == 2:
            return [{"tool": "add_code_cell", "args": {"cell_index": 1, "code": "df = pd.DataFrame({'a': [1,2], 'b': [3,4]})"}}]
        elif iteration == 3:
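            # Adding a cell does not run it, so create df here before printing it
            return [{"tool": "execute_code", "args": {"code": "df = pd.DataFrame({'a': [1,2], 'b': [3,4]})\nprint(df.head())"}}]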
        else:
            return []

    async def run(self, task: str):
        print(f" Agent Task: {task}")
        print("=" * 60)

        for i in range(self.max_iterations):
            plan = self._plan(i)
            if not plan:
                print(f"\n Agent finished after {i} iterations.")
                break

            print(f"\n--- Iteration {i+1} ---")
            for step in plan:
                tool_name = step["tool"]
                tool_args = step["args"]
                print(f"  Calling: {tool_name}({tool_args})")
                result = self._dispatch_tool(tool_name, tool_args)
                if "outputs" in result and result["outputs"]:
                    print(f"    -> Result: {result['outputs'][0]['text'].strip()}")

agent = MCPAgentLoop()
asyncio.run(agent.run("Create and display a pandas DataFrame"))

We just watched our agent build a notebook! It added a cell, ran the pandas import, added another cell that creates a DataFrame, and then ran code to print the head. This plan-act-observe cycle is the core loop behind most tool-using AI agents.
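The scripted plan never looks at what the tools returned, which is the one thing a real agent always does. Here is a small sketch of the same loop with the observation fed back into the planner. The rule-based `rule_based_plan` function is a hypothetical stand-in for an LLM call, and `run_agent` is an illustrative name rather than an API from any library.

```python
import contextlib
import io
from typing import Callable, Optional

Action = dict        # {"tool": str, "args": dict}
Observation = dict   # whatever the tool returned

def run_agent(
    plan: Callable[[list], Optional[Action]],
    tools: dict[str, Callable[..., Observation]],
    max_steps: int = 10,
) -> list[tuple[Action, Observation]]:
    """Plan, act, observe, and repeat until the planner returns None."""
    history: list[tuple[Action, Observation]] = []
    for _ in range(max_steps):
        action = plan(history)                                 # 1. Plan
        if action is None:
            break
        observation = tools[action["tool"]](**action["args"])  # 2. Act
        history.append((action, observation))                  # 3. Observe
    return history                                             # 4. Repeat is the loop itself

def rule_based_plan(history: list) -> Optional[Action]:
    """Stand-in for an LLM: decides the next step from the last observation."""
    if not history:
        return {"tool": "execute_code", "args": {"code": "print(total)"}}
    _, last_obs = history[-1]
    if "NameError" in last_obs.get("error", ""):
        # The previous step failed, so define the missing name and try again.
        return {"tool": "execute_code", "args": {"code": "total = 6\nprint(total)"}}
    return None  # the last step succeeded, so the task is done

def execute_code(code: str) -> Observation:
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return {"stdout": buf.getvalue()}
    except Exception as e:
        return {"error": f"{type(e).__name__}: {e}"}

history = run_agent(rule_based_plan, {"execute_code": execute_code})
for action, obs in history:
    print(action["args"]["code"].splitlines()[0], "->", obs)
```

The first step fails with a NameError, the planner sees that in the history, and the second step repairs it. Swapping the rule-based function for a model call would not change the loop.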

From Demo to Deployment: Making it Robust

Everything we've done so far is great for a demo, but the real world is messy. Network connections drop. Code hangs. GPUs run out of memory. A production-ready agent needs to handle this stuff.

This is where orchestration comes in. We need to build a manager that adds a layer of resilience around our execution engine. Let's build a RobustNotebookOrchestrator that includes:

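  • Automatic Retries: If a command fails with a temporary error (like a network glitch), the orchestrator should try again. The simplified class below accepts a max_retries setting but does not act on it yet.
  • Timeouts: If a cell takes too long to run, it is abandoned so the whole process doesn't hang. Note that asyncio.wait_for can only cancel code at an await point, so it would not interrupt a blocking exec() inside our simulator; it earns its keep when execution is a genuinely asynchronous call to a remote kernel.
  • Smart Sequencing: If a cell fails, every cell after it is skipped, on the assumption that later cells depend on earlier ones.

import time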
from enum import Enum

class ExecutionStatus(Enum):
    SUCCESS = "success"
    ERROR = "error"
    TIMEOUT = "timeout"
    SKIPPED = "skipped"

class RobustNotebookOrchestrator:
    def __init__(self, max_retries: int = 2, timeout_seconds: float = 5.0):
        self.max_retries = max_retries
        self.timeout_seconds = timeout_seconds
        self.runtime = ColabRuntimeSimulator() # Using our simulator from before

    async def execute_notebook(self, cells: list[dict]):
        print(" Executing notebook with Robust Orchestrator...")
        print("=" * 50)
        failed_previously = False

        for i, cell in enumerate(cells):
            if failed_previously:
                print(f" [{i}] Skipped (previous cell failed)")
                continue

            print(f" [{i}] Executing: {cell['source'][:40]}...")
            start_time = time.time()
            status = ExecutionStatus.SUCCESS
            output = ""

            try:
                # Wrap the execution call in a timeout
                result = await asyncio.wait_for(
                    self.runtime.execute_code(cell['source']),
                    timeout=self.timeout_seconds
                )
                if not result.success:
                    status = ExecutionStatus.ERROR
                    failed_previously = True
                    output = result.outputs[0].text if result.outputs else "Unknown Error"
                else:
                    output = result.outputs[0].text if result.outputs else "(no output)"

            except asyncio.TimeoutError:
                status = ExecutionStatus.TIMEOUT
                failed_previously = True
                output = f"Timeout after {self.timeout_seconds}s"

            duration_ms = (time.time() - start_time) * 1000
            print(f"   -> {status.value} in {duration_ms:.0f}ms. Output: {output.strip()}")

async def advanced_demo():
    orchestrator = RobustNotebookOrchestrator()
    notebook_cells = [
        {"source": "x = 10"},
        {"source": "y = x * 2"},
        {"source": "print(z) # This will fail"},
        {"source": "print('This should be skipped')"},
    ]
    await orchestrator.execute_notebook(notebook_cells)

asyncio.run(advanced_demo())

Perfect. The first two cells succeeded, the third one failed as expected, and the orchestrator was smart enough to skip the fourth cell because of the failure. This is the kind of robust logic you need for real-world automation.
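The retry behavior from the list above is not exercised by the demo, so here is a minimal sketch of one way it could work: retry only errors that are plausibly temporary, and wait a little longer before each new attempt. `TransientError`, `with_retries`, and `flaky_execute` are hypothetical names. A deterministic failure such as a NameError should not be retried, since it will fail the same way every time.

```python
import asyncio
import random

class TransientError(Exception):
    """Marker for failures worth retrying, such as a dropped connection."""

async def with_retries(make_call, max_retries: int = 2, base_delay: float = 0.05):
    """Await make_call(); on TransientError, back off exponentially and retry."""
    for attempt in range(max_retries + 1):
        try:
            return await make_call()
        except TransientError:
            if attempt == max_retries:
                raise  # out of attempts, so surface the error to the caller
            # Exponential backoff plus a little jitter to avoid retry storms
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)

async def retry_demo() -> tuple[str, int]:
    attempts = 0

    async def flaky_execute() -> str:
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise TransientError("connection reset")
        return "ok"

    result = await with_retries(flaky_execute, max_retries=2)
    return result, attempts

print(asyncio.run(retry_demo()))  # → ('ok', 3)
```

In the orchestrator, a wrapper like this would sit around the `execute_code` call, with `max_retries` taken from the constructor argument that is currently unused.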

You're Ready to Go!

And that's it! We've gone from the basic principles of an MCP server all the way to an orchestration pattern you can harden for production. You now have a hands-on understanding of how colab-mcp turns Google Colab into a programmable tool for AI agents.

The best part? You don't have to build all this from scratch. Getting started with the real thing on your own machine comes down to two steps: install the server and tell your AI agent where to find it.

If you want to dive in, check out the official colab-mcp repository on GitHub. The patterns we've explored here—the tool definitions, the session and runtime modes, and the agent loop—are exactly what you'll find, just with more polish.

The ability for AI to not just write code, but to execute it, debug it, and build entire projects within our existing tools is a massive leap forward. It’s about moving from a simple chatbot to a true digital colleague. Now you know exactly how it's done.


© 2026 Aicosoft. All rights reserved.