Have you ever had a great conversation with an AI, only to realize five minutes later that it’s completely forgotten everything you just told it? It’s a common frustration. Most AI models are brilliant in the moment, but they suffer from a serious case of digital amnesia. They lack a real, persistent memory.
This is one of the biggest hurdles holding back truly autonomous AI agents. How can an agent be a reliable assistant if it can't remember your preferences, project details, or what you discussed yesterday?
Well, what if we could build an agent with a better brain? A brain that combines two different kinds of memory to recall information with incredible accuracy. That’s exactly what we’re going to do today. We're going to build a hybrid-memory autonomous agent from the ground up. Think of it as giving our AI a long-term memory that’s both smart and searchable.
Let's get started.
First, Let's Sketch Out the Blueprint
Before we start writing any code, good architecture is everything. A messy, tangled system is a nightmare to debug and impossible to upgrade. We want something clean, modular, and easy to understand.
Imagine building with LEGOs. You have different types of bricks (the "interfaces") that define how pieces connect, and then you have the specific pieces themselves (the "implementations"). This means you can easily swap a red 2x4 brick for a blue one without having to rebuild your entire castle.
That's our approach here. We’ll define a few simple "blueprints" or contracts:
MemoryBackend: This is the rulebook for any memory system we want to plug in. It just needs to know how tostoreinformation andsearchfor it.LLMProvider: This is our contract for the language model. Any LLM we use must have acompletemethod that can take a conversation and give us a response.Tool: This defines what a "tool" looks like. Every tool needs aname, adescriptionof what it does, and arunmethod to actually execute it.
By starting with these simple, abstract rules, we're building a flexible foundation. We can swap out our memory system, change our LLM provider, or add new tools later without breaking everything.
Building the Agent's Brain: A Hybrid Memory System
Okay, now for the cool part: the memory. This isn't just a simple database. We're building a hybrid memory system that gets the best of two worlds.
What does that mean? Think about how you find information. Sometimes, you’re looking for a general vibe or concept. Other times, you need to find an exact word or phrase. These are two different kinds of searching.
Our agent's brain will do both at the same time:
-
Semantic Search (The Vibe): This is for understanding meaning and context. We use something called "embeddings" to turn text into a list of numbers (a vector) that captures its meaning. When you search, it looks for memories with a similar "vibe" or vector. This is great for finding conceptually related ideas, even if the words are different. We use OpenAI's
text-embedding-3-smallmodel for this. -
Keyword Search (The Specifics): This is the old-school search you know and love. It’s perfect for finding exact matches, like a product ID, a specific name, or a keyword. We use a classic algorithm called BM25 for this. It’s fast, efficient, and fantastic at finding needles in a haystack.
So, how do we combine them? We use a clever technique called Reciprocal Rank Fusion (RRF). It sounds complicated, but the idea is simple. We perform both searches (semantic and keyword) and get two separate ranked lists of results. RRF then looks at both lists and creates a new, single score that prioritizes items appearing high up on either list. It's a "best of both worlds" approach that gives us incredibly relevant results.
Our HybridMemory class handles all of this under the hood. When you ask it to store something, it creates the vector embedding and updates its keyword index. When you search, it runs both queries, fuses the results with RRF, and gives you a single, beautifully ranked list of memories.
Giving Our Agent Senses and Hands: The LLM and Tools
An agent with a great memory is nice, but it's useless if it can't think or act. That's where our LLM and tools come in.
First, we need a way to talk to OpenAI. Our OpenAIProvider class is a simple wrapper around the OpenAI API. It takes our conversation history, sends it to a model like gpt-4o-mini, and neatly formats the response. This keeps the main agent code clean and unaware of the specific API details.
Next, we need to give our agent some skills. These are our "tools." For this project, we'll build a few essential ones:
MemoryStoreTool&MemorySearchTool: These are the agent's hands for its own brain. It can consciously decide to save a new piece of information to its long-term memory or search for something it might have learned before.CalculatorTool: A simple tool that can evaluate mathematical expressions. This way, the agent doesn't have to "guess" at math; it can get a precise answer.WebSnippetTool: A simulated web search tool. In a real-world application, you'd hook this up to a proper search API, but for our demo, it's a simple lookup.
Each tool has a schema that tells the LLM what it's called, what it does, and what inputs it needs. This is crucial because it allows the LLM to decide on its own which tool to use and how to use it.
Crafting a Personality
We don't want a generic, robotic assistant. We want an agent with a consistent personality. The AgentPersona class helps us define this.
Here, we can give our agent a name (we'll call her "Aria"), a role, a list of core traits (like "Methodical" and "Curious"), and even a list of forbidden phrases (like the classic "As an AI language model...").
This class dynamically builds a detailed system prompt for every conversation. It’s like giving an actor their character sheet before they go on stage. It ensures Aria behaves consistently, follows her goals, and maintains her personality no matter what you ask her.
The Heart of the Machine: The Autonomous Loop
Now we have all the pieces: a brain, a connection to an LLM, a set of tools, and a personality. The AutonomousAgent class is the conductor that brings this whole orchestra together.
Here’s how it works, in what's often called an "agentic loop":
- Listen: You give the agent a message, like "What's the deadline for the VelocityDB project?"
- Recall: Before thinking, the agent automatically does a quick search of its long-term memory for anything relevant to your query. It adds these memories as "Live Context" to its prompt.
- Think: The agent sends the whole conversation—including the system prompt, chat history, your new message, and the recalled memories—to the LLM. It also tells the LLM about the tools it has available.
- Act (or Respond): The LLM then makes a choice.
- If it needs more information, it will decide to use a tool. It might say, "I should use the
memory_searchtool to find information about 'VelocityDB'." The agent code sees this, runs the tool, gets the result, and feeds that result back into the loop. This can happen multiple times in a row. - If it has enough information to answer, it will generate a final, plain-text response.
- If it needs more information, it will decide to use a tool. It might say, "I should use the
- Reply: The agent gives you the final answer. The whole conversation (including the tool use) is saved, so it has context for your next question.
This loop of recalling, thinking, and acting is what makes the agent feel autonomous. It’s not just responding; it’s reasoning about how to get you the best possible answer.
Let's See Aria in Action!
Theory is great, but let's see how this all plays out in practice.
Demo 1: Seeding the Memory
First, we give Aria some initial knowledge. We use the memory.store function to feed it facts about a fictional user named Alice and her project, "VelocityDB." We store things like her favorite programming language, project deadlines, and meeting schedules.
Demo 2: Testing the Hybrid Search Next, we test the memory directly. We throw a few different types of queries at it:
- A specific question: "What consensus algorithm does VelocityDB use?" (This is good for semantic search).
- A keyword-heavy query: "order 4821" (Perfect for BM25).
- A fuzzy query: "Alice's language preference" (A mix of both).
In each case, you can see how the RRF score combines the cosine (semantic) and BM25 (keyword) scores to pull up the most relevant memory chunk. It just works.
Demo 3: A Real Conversation Now, the real test. We chat with Aria.
- We ask about Alice's project. Aria uses the
memory_searchtool to find the relevant facts and combines them into a perfect summary. - We ask a math question related to the project deadline. Aria recognizes this, calls the
calculatortool, and gives us the exact number of hours left. - We give Aria a new piece of information: "Alice just decided to switch the storage engine... Please remember this." Aria understands the intent and uses the
memory_storetool to save this new fact. When we ask about it later, she recalls it perfectly.
Demo 4: Upgrading the Agent on the Fly
This is where the modular design really shines. We create a new, UpgradedWebSnippetTool that knows about a new term ("lsm-tree"). We use the agent.register_tool() method to hot-swap the old tool with the new one, while the agent is running.
When we ask a question that requires this new knowledge, Aria seamlessly uses the upgraded tool without missing a beat. This is incredibly powerful for building systems that can evolve over time.
Why This Approach Is a Big Deal
We've just built a complete autonomous agent that is so much more than a simple chatbot. It can remember, reason, and use tools to accomplish tasks.
The modular design means we can easily swap out any part of it. Don't like OpenAI? Write a new LLMProvider for Anthropic's Claude or Google's Gemini. Need a better memory system? You can plug in a production-grade vector database without changing the agent's core logic.
This isn't just a fun academic exercise. It's a practical blueprint for building AI agents that can function as genuine partners—remembering context, learning over time, and taking action to help you get things done. The era of the goldfish-brained AI is coming to an end.




