Building an AI That Learns: My Hands-On Test of OpenSpace's Self-Evolving Skills

Akram Chauhan

Have you ever felt like you're explaining the same thing to your AI assistant over and over again? It’s brilliant, for sure. But it has the memory of a goldfish. Every new task is Groundhog Day—it starts from scratch, with no memory of the almost-identical thing you asked it to do five minutes ago.

This isn't just annoying; it's expensive. Every time that AI reasons from first principles, it’s burning through tokens, and that costs you real money.

I’ve been in the tech and AI space for a while, and this has always felt like the big, unspoken problem with today's agents. They're incredibly smart but lack experience. So, when I heard about a project called OpenSpace from HKUDS that promised a "self-evolving skill engine," I was skeptical but intrigued. The idea? An AI that learns from every task, captures reusable patterns (or "skills"), and gets progressively cheaper and more effective over time.

It sounded too good to be true. So, I decided to roll up my sleeves and take it for a spin myself. I wanted to see if this was just a cool academic concept or something that could actually change how we work with AI.

Let's walk through what I found.

First Things First: Getting It All Set Up

Before we can see the magic, we have to plug everything in. Getting started was surprisingly straightforward. It's mostly about installing the OpenSpace package from GitHub and telling it about your OpenAI API key.

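I always appreciate it when a setup script lets you enter sensitive stuff like an API key securely. This one uses Python's built-in getpass module, so what you type is never echoed to the screen and the key doesn't end up sitting in a notebook cell. The only part that gets printed back is the last four characters, as a confirmation.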

Here’s the quick setup script I ran. It pulls down the necessary code, installs the OpenAI library, and then gets your keys.

import subprocess, sys, os
print(" Installing OpenSpace from GitHub (this may take 2-3 minutes)...")
subprocess.check_call([
    sys.executable, "-m", "pip", "install", "-q", "git+https://github.com/HKUDS/OpenSpace.git"
])
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "openai"])
print("\n Installation complete!")

try:
    from openspace import OpenSpace
    print(" OpenSpace imported successfully")
except ImportError as e:
    print(f" Import issue: {e}")
    print("Trying alternative import path...")
    import openspace
    print(f" openspace package found at: {openspace.__file__}")

import getpass

print("Enter your OpenAI API key (input is hidden):")
api_key = getpass.getpass("OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = api_key

print("\n[Optional] Enter your OpenSpace Cloud API key")
print("(Get one free at https://open-space.cloud — press Enter to skip):")
cloud_key = getpass.getpass("OpenSpace Cloud Key: ")
if cloud_key.strip():
    os.environ["OPENSPACE_API_KEY"] = cloud_key.strip()
    print(" Cloud API key set")
else:
    print(" Skipping cloud features (local mode only)")

MODEL_NAME = "openai/gpt-4o-mini"
os.environ["OPENSPACE_MODEL"] = MODEL_NAME

print(f"\n Configuration complete!")
print(f" Model: {MODEL_NAME}")
print(f" OpenAI Key: {'*' * 8}...{api_key[-4:]}")
print(f" Cloud: {'Enabled' if cloud_key.strip() else 'Disabled (local only)'}")

from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

try:
    test_resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say 'OpenSpace ready!' in 3 words or less."}],
        max_tokens=10
    )
    print(f" OpenAI API working: {test_resp.choices[0].message.content}")
except Exception as e:
    print(f" OpenAI API error: {e}")
    print("Please check your API key and try again.")

Once that’s done, we just need a clean little workspace for our experiment—a place for skills to be saved, outputs to be generated, and a database to track everything.

The First Test: A "Cold Start" With No Skills

Alright, with everything set up, it was time for the first real test. I wanted to give the AI a typical data-crunching task, but under one condition: it had to start with zero pre-existing skills. This is what the OpenSpace team calls a "cold start."

Think of it like hiring a new junior developer and giving them their first-ever assignment. They have raw intelligence (the LLM) but no experience or shortcuts.

The task was to write a Python script to analyze a sales CSV, calculate monthly revenue, and find the top-selling products. A pretty standard request.

import os
import json
import shutil
import sqlite3
import glob
import asyncio
import time
from pathlib import Path

WORKSPACE = Path("/content/openspace_tutorial")
SKILLS_DIR = WORKSPACE / "skills"
OUTPUT_DIR = WORKSPACE / "outputs"
DB_DIR = WORKSPACE / ".openspace"

if WORKSPACE.exists():
    shutil.rmtree(WORKSPACE)
WORKSPACE.mkdir(parents=True)
SKILLS_DIR.mkdir(parents=True)
OUTPUT_DIR.mkdir(parents=True)
DB_DIR.mkdir(parents=True)
os.environ["OPENSPACE_WORKSPACE"] = str(WORKSPACE)
os.environ["OPENSPACE_HOST_SKILL_DIRS"] = str(SKILLS_DIR)

print(f" Workspace: {WORKSPACE}")
print(f" Skills: {SKILLS_DIR}")
print(f" Outputs: {OUTPUT_DIR}")
print(f" Database: {DB_DIR}")

async def run_cold_start_task():
    print("="*60)
    print(" COLD START: No skills exist yet")
    print("="*60)
    task = (
        "Create a Python script that analyzes a CSV file containing "
        "sales data with columns: date, product, quantity, price. "
        "The script should compute monthly revenue, identify the top "
        "3 best-selling products, and generate a summary report as "
        "a formatted text file."
    )
    print(f"\n Task: {task[:100]}...\n")
    start_time = time.time()
    try:
        from openspace import OpenSpace
        async with OpenSpace() as cs:
            result = await cs.execute(task)
        elapsed = time.time() - start_time
        print(f"\n Execution time: {elapsed:.1f}s")
        print(f"\n Response (first 500 chars):")
        print("-" * 40)
        response_text = result.get("response", str(result))
        print(response_text[:500])
        evolved = result.get("evolved_skills", [])
        if evolved:
            print(f"\n Skills Evolved: {len(evolved)}")
            for skill in evolved:
                origin = skill.get('origin', 'unknown')
                name = skill.get('name', 'unnamed')
                print(f" • {name} (origin: {origin})")
        else:
            print("\n No skills evolved yet (may happen post-analysis)")
        return result
    except Exception as e:
        print(f"\n Execution error: {type(e).__name__}: {e}")
        return None

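# Top-level await works in a notebook (Colab/Jupyter).
# In a plain script, use asyncio.run(run_cold_start_task()) instead.
cold_start_result = await run_cold_start_task()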

The AI completed the task, which was expected. But the fascinating part happened after the execution. OpenSpace was watching. It analyzed the entire process—the prompts, the generated code, the final output—and identified patterns. It then automatically created and saved a new "skill" based on this task.

Peeking Under the Hood: What Did It Actually Learn?

This is where my inner geek got really excited. OpenSpace isn't a black box. You can actually go in and see the skills it creates. They're stored in a simple SQLite database and as plain text files (SKILL.md).

It's like looking at the AI's personal notes. You can see what it decided was important enough to remember for next time.

def inspect_skill_database():
    db_path = next(iter(glob.glob(str(DB_DIR / "*.db"))), None)
    if not db_path:
        print(" No skill database found yet.")
        return
    
    print(f" Found database: {db_path}")
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
        tables = [t[0] for t in cursor.fetchall()]
        print(f"\n Tables: {tables}")

        if 'skills' in tables:
            # No ORDER BY here, so these are simply the first five rows, not the newest ones.
            cursor.execute("SELECT name, origin, version FROM skills LIMIT 5")
            rows = cursor.fetchall()
            print("\n Skills in the database (first 5 rows):")
            for row in rows:
                print(f" → Name: {row[0]}, Origin: {row[1]}, Version: {row[2]}")
    finally:
        # Close the connection even if a query fails.
        conn.close()

inspect_skill_database()

def inspect_skill_files():
    skill_files = list(SKILLS_DIR.rglob("SKILL.md"))
    if not skill_files:
        print("\n No skill files found on disk yet.")
        return
    
    print(f"\n Found {len(skill_files)} skill files on disk:\n")
    for sf in skill_files[:5]:
        rel_path = os.path.relpath(sf, WORKSPACE)
        print(f" {rel_path}")
        # Read only the preview we print; be explicit about the encoding.
        with open(sf, 'r', encoding='utf-8') as fh:
            content = fh.read(150)
        print(f"   Preview: {content.replace(chr(10), ' ')}...\n")

inspect_skill_files()

Looking through the files, I saw it had captured the essence of "CSV analysis with Python." It noted the libraries used (pandas), common steps (grouping by date), and the overall structure of the solution. This wasn't just saved code; it was a generalized, reusable strategy.
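To make "grouping by date" concrete, here is a minimal sketch of the kind of analysis the task asks for. It is an illustration, not the agent's actual output or the contents of the evolved skill. The sample rows are made up, "best-selling" is assumed to mean units sold, and it uses only the standard library instead of pandas so it runs anywhere.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data with the columns named in the task: date, product, quantity, price.
SAMPLE_CSV = """date,product,quantity,price
2024-01-05,Widget,2,10.00
2024-01-20,Gadget,1,25.00
2024-02-03,Widget,3,10.00
2024-02-14,Gizmo,4,5.00
2024-02-28,Gadget,2,25.00
"""

def analyze_sales(csv_text, top_n=3):
    """Return (monthly revenue by 'YYYY-MM', top-N products by units sold)."""
    monthly_revenue = defaultdict(float)
    units_by_product = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        quantity = int(row["quantity"])
        month = row["date"][:7]  # "YYYY-MM" prefix of an ISO date
        monthly_revenue[month] += quantity * float(row["price"])
        units_by_product[row["product"]] += quantity
    # Sort by units sold (descending), then by name so ties are deterministic.
    top_products = sorted(units_by_product.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return dict(monthly_revenue), top_products

monthly, top = analyze_sales(SAMPLE_CSV)
print(monthly)
print(top)
```

The two steps in the loop (bucket by month, aggregate per product) are the reusable pattern. The column names are the only part tied to this particular file.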

Round Two: The "Warm Start" and the Big Payoff

Now for the moment of truth. I gave the agent a similar but not identical task. This time, it was about analyzing an inventory CSV file to calculate monthly expenses.

Because the agent now had the "CSV analysis" skill in its back pocket, this was a "warm start." It didn't have to figure everything out from scratch. It could just pull out its notes, adapt the pattern, and go.

The result? The task was completed noticeably faster. But the real win was in the token count. Because the AI could rely on the pre-existing skill, the prompt it needed to send to the LLM was much shorter and more direct. It was less "how do I analyze a CSV?" and more "apply skill XYZ to this new file."

This is the economic game-changer. Fewer tokens mean lower API bills. Over hundreds or thousands of tasks, the savings are massive.
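If you want to put a number on that for your own runs, the arithmetic is simple. The sketch below compares total tokens from a cold run and a warm run. The token counts and the price per million tokens are placeholders I made up for illustration, not measurements, and `token_savings` is a hypothetical helper, not part of OpenSpace. When calling OpenAI directly, the chat completion response exposes the count as `usage.total_tokens`; whether OpenSpace surfaces it in its result is something to check for yourself.

```python
def token_savings(cold_tokens, warm_tokens, usd_per_million_tokens):
    """Compare two runs by total tokens; return relative and absolute savings."""
    if cold_tokens <= 0:
        raise ValueError("cold_tokens must be positive")
    saved = cold_tokens - warm_tokens
    return {
        "tokens_saved": saved,
        "reduction_pct": 100 * saved / cold_tokens,
        "usd_saved_per_run": saved * usd_per_million_tokens / 1_000_000,
    }

# Placeholder inputs: swap in the token counts and pricing from your own runs.
report = token_savings(cold_tokens=12_000, warm_tokens=7_500, usd_per_million_tokens=0.50)
print(report)
```

A saving per run looks tiny on its own. Multiply it by the number of tasks you run per month to see whether it matters for your bill.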

You're the Teacher: Seeding the AI with Your Own Expertise

What I found really powerful is that you don't have to wait for the AI to learn everything on its own. You can be the teacher.

If you're an expert in a certain area, you can create your own custom skills to give the agent a head start. For instance, I created a skill for "data validation," which includes battle-tested code for handling common CSV problems like weird character encodings or missing values.

You just write a simple SKILL.md file with instructions, triggers, and even code snippets.

def create_custom_skill(skill_name, description, instructions, triggers):
    skill_dir = SKILLS_DIR / skill_name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_md = f"""---
name: {skill_name}
description: {description}
version: 1.0.0
origin: manual
triggers: {json.dumps(triggers)}
---
# {skill_name}
{description}

## Instructions
{instructions}
"""
    skill_path = skill_dir / "SKILL.md"
    skill_path.write_text(skill_md, encoding="utf-8")
    print(f" Created custom skill: {skill_name}")
    return skill_path

# Example of creating a robust, production-ready skill
create_custom_skill(
    skill_name="data-validation-csv",
    description="Validate CSV files for common issues before processing.",
    instructions="""When working with CSV data:
1. **Encoding Detection**: Try UTF-8 first, then fall back to latin-1.
2. **Delimiter Detection**: Use csv.Sniffer() to auto-detect.
3. **Missing Values**: Count NaN/null per column and report percentage.
4. **Duplicate Check**: Identify and report duplicate rows.
```python
import pandas as pd
# (Sample validation code here)
```""",
    triggers=["csv", "data validation", "data quality", "pandas"]
)

This hybrid approach feels right. The AI learns autonomously from its work, but you can also guide it with your own hard-won knowledge.
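To sanity-check a hand-written skill, it helps to read the header back and see which triggers a given task would hit. The sketch below parses the simple `key: value` front matter written by `create_custom_skill` and does a case-insensitive substring match against a task description. Both helper functions are hypothetical, and the substring match is only my assumption for illustration; I don't know how OpenSpace actually selects skills internally.

```python
import json

def parse_skill_front_matter(skill_md):
    """Parse the 'key: value' lines between the first two '---' markers."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    # triggers was written with json.dumps, so json.loads reads it back as a list.
    if "triggers" in meta:
        meta["triggers"] = json.loads(meta["triggers"])
    return meta

def matching_triggers(meta, task):
    """Return the triggers that appear in the task text (assumed matching rule)."""
    task_lower = task.lower()
    return [t for t in meta.get("triggers", []) if t.lower() in task_lower]

example_md = """---
name: data-validation-csv
description: Validate CSV files for common issues before processing.
version: 1.0.0
origin: manual
triggers: ["csv", "data validation", "data quality", "pandas"]
---
# data-validation-csv
"""

meta = parse_skill_front_matter(example_md)
hits = matching_triggers(meta, "Analyze an inventory CSV and compute monthly expenses")
print(meta["name"], hits)
```

If a task you care about hits no triggers at all, that is a hint to add a keyword to the skill before relying on it.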

The Power of Community: Sharing Brains in the Cloud

This is where OpenSpace goes from a cool personal tool to something with huge potential. It has a cloud component where agents can share the skills they've evolved.

Imagine your AI gets really good at generating legal documents. You can upload that skill to the cloud. Then, an agent working for a colleague across the country can download and use that skill instantly, without having to learn it from scratch.

It’s a framework for building collective intelligence. Each agent contributes to a shared "brain," making every other agent in the network smarter. This is how you scale expertise.

The Proof: Let's Talk Numbers

A 4.2x income improvement and a 46% token reduction.

Those are the headline numbers from the GDPVal benchmark, a test that ran OpenSpace across 50 real-world professional tasks. Compared to a baseline agent, the one using OpenSpace not only produced higher-quality work (measured as "income") but did so for about half the cost in tokens.

That’s staggering. It's concrete proof that this isn't just a theoretical benefit.

What's also fascinating is what the AI chose to learn. When they analyzed the 165 skills that were automatically evolved during the benchmark, the most common ones weren't about specific domains like "marketing" or "finance." They were about the messy realities of getting work done:

  • File Format I/O (44 skills): Dealing with different types of files.
  • Execution Recovery (29 skills): How to handle errors and retry when things fail.
  • Document Generation (26 skills): Creating reports, PDFs, etc.
  • Quality Assurance (23 skills): Checking its own work.

The AI was learning to be a resilient, practical problem-solver, not just a domain expert. It was learning how to work.
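A quick bit of arithmetic puts those counts in proportion. This sketch uses only the numbers quoted above (165 skills in total and the four listed categories) and works out each category's share, plus how many skills fall outside the list.

```python
TOTAL_SKILLS = 165
category_counts = {
    "File Format I/O": 44,
    "Execution Recovery": 29,
    "Document Generation": 26,
    "Quality Assurance": 23,
}

listed = sum(category_counts.values())  # skills covered by the four categories above
other = TOTAL_SKILLS - listed           # skills in categories not listed here
shares = {name: round(100 * n / TOTAL_SKILLS, 1) for name, n in category_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"Listed categories: {listed} of {TOTAL_SKILLS}; other categories: {other}")
```

So the four categories account for 122 of the 165 skills, and file handling alone is roughly a quarter of everything the agent chose to remember.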

It Just Gets Better Over Time

To really see this in action, I ran a final test: a pipeline of three different tasks in a row. A CSV analyzer, then a report generator, then a data quality checker.

With each step, I could see the system's intelligence accumulating. The first task evolved a few skills. The second task reused those skills and evolved new ones. By the third task, it was a well-oiled machine, pulling from a growing library of its own past experiences. The number of reused skills just kept climbing.

This is the self-improving loop in action. The more you use it, the better and cheaper it gets.
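Here is a minimal sketch of how such a pipeline can be tracked. `run_pipeline` is a hypothetical helper that takes any async `execute` callable and records how the skill library grows after each task. The result shape (a dict with an `evolved_skills` list of dicts that have a `name`) is taken from the cold-start code earlier in this article, not from verified documentation. `fake_execute` is a stand-in with canned results so the sketch runs without OpenSpace or an API key.

```python
import asyncio

async def run_pipeline(execute, tasks):
    """Run tasks in order and track the distinct skills evolved so far."""
    seen_skills = set()
    history = []
    for task in tasks:
        result = await execute(task)
        evolved = result.get("evolved_skills", []) if isinstance(result, dict) else []
        names = [skill.get("name", "unnamed") for skill in evolved]
        seen_skills.update(names)
        history.append({"task": task, "evolved": names, "library_size": len(seen_skills)})
    return history

# Stand-in for cs.execute; the canned skill names are invented for this example.
async def fake_execute(task):
    canned = {
        "csv analyzer": [{"name": "csv-analysis"}],
        "report generator": [{"name": "report-layout"}, {"name": "csv-analysis"}],
        "data quality checker": [],
    }
    return {"response": f"done: {task}", "evolved_skills": canned[task]}

tasks = ["csv analyzer", "report generator", "data quality checker"]
history = asyncio.run(run_pipeline(fake_execute, tasks))
for step in history:
    print(step["task"], step["evolved"], step["library_size"])
```

With the real library, you would pass `cs.execute` from inside an `async with OpenSpace() as cs:` block instead of `fake_execute`. In a notebook, call `await run_pipeline(...)` directly, since `asyncio.run` cannot be used while the notebook's event loop is already running.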

My takeaway from all of this is pretty clear. The idea of a stateless AI that starts fresh every time is on its way out. The future is agents that learn, remember, and share experience. OpenSpace feels like a huge step in that direction. It transforms an AI from a brilliant-but-forgetful tool into a true digital colleague—one that gets better at its job every single day.

© 2026 Aicosoft. All rights reserved.