Google's New AI Writes Its Own Code—And It's Already Better Than Human Experts

Akram Chauhan

Have you ever tried to perfect a recipe? You start with a base, maybe from a cookbook, then you tweak it. A little more salt here, a little less sugar there. You taste, you adjust, you repeat. It's a slow process of trial, error, and intuition.

For decades, that’s pretty much how experts have designed complex algorithms for things like game theory—the math behind strategic thinking in poker, negotiations, and more. It was a manual, painstaking process guided by human intuition.

Well, Google DeepMind just threw that cookbook out the window.

They've built a system that lets an AI do the tweaking itself. Not by adjusting a few numbers, but by rewriting the source code of the algorithms it's designing. And the new "recipes" it’s cooking up already match or beat the ones created by top human experts in most of the games tested.

It’s called AlphaEvolve, and it’s a fascinating glimpse into a future where AI doesn’t just help us find answers, but helps us discover entirely new ways of asking the questions.

So, How Do You Get an AI to Evolve Code?

Let me break down how AlphaEvolve works, because it’s both brilliant and surprisingly simple in concept. Think of it like Darwinian evolution, but for Python code.

  1. Start with a Population: The process begins with a "population" of existing algorithms. They took well-known, human-designed algorithms as the starting seed—the "Adam and Eve" of their digital ecosystem.
  2. Select the Fittest: They run these algorithms on a few small games (like simplified versions of poker) and score each one. The best performers are deemed the "fittest" and get to "reproduce."
  3. Mutate with an LLM: This is the magic step. When an algorithm is chosen to reproduce, its source code is fed to a large language model (in this case, Gemini 2.5 Pro), which is asked, in essence, to modify the code to make it better. The LLM plays the role of genetic mutation, rewriting parts of the code to create a new "offspring" algorithm. Unlike a random mutation, though, its changes are guided by what it knows about code.
  4. Test and Repeat: This new, mutated algorithm is then thrown into the arena and tested. If it's any good, it gets added to the population. This cycle of selection, mutation, and testing repeats, generation after generation.
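To make the four steps concrete, here is a minimal Python sketch of the select, mutate, evaluate loop. It is not AlphaEvolve's code: the "programs" here are just tuples of numbers, and the hypothetical toy_mutate function stands in for the LLM call by randomly nudging one value, where AlphaEvolve would instead ask Gemini to rewrite real source code. The toy_fitness target is likewise made up for illustration.

```python
import random
from typing import Callable, List, Tuple

Program = Tuple[float, ...]
Scored = Tuple[float, Program]


def evolve(
    seeds: List[Program],
    fitness: Callable[[Program], float],
    mutate: Callable[[Program, random.Random], Program],
    generations: int = 200,
    population_size: int = 20,
    seed: int = 0,
) -> Scored:
    """Minimal evolutionary loop: select, mutate, evaluate, keep the fittest."""
    rng = random.Random(seed)
    # 1. Start with a population: score the human-designed seed programs.
    population: List[Scored] = [(fitness(p), p) for p in seeds]
    for _ in range(generations):
        # 2. Select the fittest: a size-2 tournament picks the parent.
        a, b = rng.sample(population, 2)
        parent = max(a, b, key=lambda s: s[0])[1]
        # 3. Mutate: stand-in for "send this code to the LLM and ask for a better version".
        child = mutate(parent, rng)
        # 4. Test and repeat: score the child, add it, drop the weakest members.
        population.append((fitness(child), child))
        population.sort(key=lambda s: s[0], reverse=True)
        del population[population_size:]
    return population[0]


# Hypothetical toy problem: a "program" is three numbers, and fitness
# rewards being close to a hidden target.
TARGET = (1.0, -2.0, 0.5)


def toy_fitness(p: Program) -> float:
    return -sum((x - t) ** 2 for x, t in zip(p, TARGET))


def toy_mutate(p: Program, rng: random.Random) -> Program:
    # Perturb one randomly chosen "line" of the program.
    i = rng.randrange(len(p))
    q = list(p)
    q[i] += rng.gauss(0.0, 0.3)
    return tuple(q)


seeds = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
best_score, best_program = evolve(seeds, toy_fitness, toy_mutate)
print(best_score, best_program)
```

Because the population only ever keeps its best members, the top score can never get worse from one generation to the next. In a real system, nearly all the difficulty lives in the two functions this sketch fakes: the LLM-driven mutation and the game-based fitness evaluation.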

Over time, the algorithms evolve. Bad ideas die off, and clever, effective new strategies emerge and get refined. The AI is essentially running a high-speed, automated search for new ideas, trying far more variations than human researchers would ever have time to test by hand.

The team focused this evolutionary pressure on two big families of game theory algorithms: CFR (Counterfactual Regret Minimization) and PSRO (Policy-Space Response Oracles). You don't need to know the nitty-gritty, just that they are foundational tools for teaching AIs to play "imperfect-information" games like poker, where you can't see your opponent's cards.
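For a feel of what CFR actually does, here is a sketch of regret matching, the update rule at the heart of CFR, applied to rock-paper-scissors. In a one-shot game like this, CFR reduces to plain regret matching; poker-style games apply the same rule at every decision point in the game tree. The payoff table, the 10,000-iteration count, and the initial nudge toward rock are illustrative choices, not anything taken from the research.

```python
from typing import List

# Row player's payoff in rock-paper-scissors (order: rock, paper, scissors).
PAYOFF = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]
# Zero-sum: the column player's payoffs, indexed [its action][row action].
COL_PAYOFF = [[-PAYOFF[i][j] for i in range(3)] for j in range(3)]


def regret_matching(cum_regret: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [r / total for r in positive]
    return [1.0 / len(cum_regret)] * len(cum_regret)


def action_values(payoff: List[List[float]], opponent: List[float]) -> List[float]:
    """Expected payoff of each pure action against the opponent's mixed strategy."""
    return [sum(row[j] * opponent[j] for j in range(len(opponent))) for row in payoff]


def self_play(iterations: int = 10_000) -> List[List[float]]:
    # Hypothetical nudge: player 0 starts out favoring rock.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching(regrets[0]), regret_matching(regrets[1])]
        for p, payoff in enumerate((PAYOFF, COL_PAYOFF)):
            mine, theirs = strategies[p], strategies[1 - p]
            values = action_values(payoff, theirs)
            expected = sum(m * v for m, v in zip(mine, values))
            for a in range(3):
                # Regret: how much better action a would have done than the current mix.
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += mine[a]
    # The average strategy (not the latest one) is what converges to equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(avg: List[List[float]]) -> float:
    """Total gain available to best-responding opponents (0 at a Nash equilibrium)."""
    return max(action_values(PAYOFF, avg[1])) + max(action_values(COL_PAYOFF, avg[0]))


avg = self_play()
print(avg, exploitability(avg))
```

The average strategies drift toward one third each, and exploitability, how much a perfectly informed opponent could gain against them, shrinks as iterations accumulate. Exploitability is a standard yardstick for comparing CFR variants.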

And what AlphaEvolve discovered is pretty wild.

Meet VAD-CFR: The AI's First Weird, Brilliant Creation

The first major discovery came from evolving the CFR algorithm family. The AI-generated winner was named VAD-CFR, which stands for Volatility-Adaptive Discounted CFR. And it does things a human programmer probably never would have tried.

Human-designed algorithms in this family tend to use fixed rules for discounting, or partially "forgetting," the regrets they accumulated in earlier iterations. VAD-CFR, on the other hand, is dynamic.

  • It watches for chaos: The algorithm tracks how "volatile" the learning process is. If things are changing rapidly and seem unstable, it increases its "discounting," meaning it forgets its accumulated history more quickly. When things are stable, it remembers more of its history. It adapts on the fly.
  • It's an optimist: It gives a tiny boost (multiplying by 1.1, to be exact) to actions that look good right now. This makes it more reactive and quicker to pounce on a good opportunity.
  • It does something really strange: This is the part that made me do a double-take. For the first 500 iterations of training, it doesn't average its strategy at all. It just learns and accumulates regrets. Then, at iteration 501, it starts averaging. Why 500? There's no principled derivation behind it. The algorithms were evaluated after 1,000 iterations, and the search simply found that waiting until the halfway point before building the final strategy worked well.

This isn't the kind of elegant, simple rule a human would come up with. It’s a quirky, specific, and slightly bizarre solution that just… works.

How well does it work? VAD-CFR matched or beat the best human-designed algorithms in 10 out of 11 test games. It generalized its weird strategy to bigger, more complex games it had never seen during its evolution.
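The three VAD-CFR behaviors listed above can be grafted onto a basic regret-matching loop. This sketch is a loose illustration under stated assumptions: only the 1.1 boost and the 500-iteration warm-up come from the article. The volatility measure (a moving average of regret size), the mapping from volatility to a discount factor, the constants EMA_RATE, SENSITIVITY and MIN_KEEP, and the rock-paper-scissors test game are all hypothetical and do not reproduce the evolved algorithm's actual formulas.

```python
from typing import List, Tuple

# Rock-paper-scissors as a stand-in benchmark game.
PAYOFF = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
COL_PAYOFF = [[-PAYOFF[i][j] for i in range(3)] for j in range(3)]

OPTIMISM = 1.1     # from the article: boost for actions that look good right now
WARMUP = 500       # from the article: no strategy averaging before iteration 501
EMA_RATE = 0.1     # hypothetical: how quickly the volatility estimate reacts
SENSITIVITY = 0.5  # hypothetical: how strongly volatility increases forgetting
MIN_KEEP = 0.5     # hypothetical: floor on the fraction of regret history kept


def regret_matching(cum_regret: List[float]) -> List[float]:
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    return [r / total for r in positive] if total > 0 else [1.0 / 3] * 3


def vad_style_self_play(iterations: int = 1000) -> Tuple[List[List[float]], int]:
    if iterations <= WARMUP:
        raise ValueError("need more iterations than the warm-up period")
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    volatility = [0.0, 0.0]
    for t in range(1, iterations + 1):
        strategies = [regret_matching(regrets[0]), regret_matching(regrets[1])]
        for p, payoff in enumerate((PAYOFF, COL_PAYOFF)):
            mine, theirs = strategies[p], strategies[1 - p]
            values = [sum(row[j] * theirs[j] for j in range(3)) for row in payoff]
            expected = sum(m * v for m, v in zip(mine, values))
            instant = [v - expected for v in values]
            # 1. Watch for chaos: moving average of this iteration's regret size.
            size = sum(abs(r) for r in instant)
            volatility[p] = (1 - EMA_RATE) * volatility[p] + EMA_RATE * size
            # Higher volatility -> keep less of the history (discount harder).
            keep = max(MIN_KEEP, 1.0 - SENSITIVITY * volatility[p])
            for a in range(3):
                # 2. Optimism: amplify positive instantaneous regrets by 1.1.
                boost = OPTIMISM if instant[a] > 0 else 1.0
                regrets[p][a] = keep * regrets[p][a] + boost * instant[a]
            # 3. Delayed averaging: only iterations after the warm-up count.
            if t > WARMUP:
                for a in range(3):
                    strategy_sums[p][a] += mine[a]
    averaged = iterations - WARMUP
    return [[s / averaged for s in sums] for sums in strategy_sums], averaged


avg, averaged = vad_style_self_play()
print(averaged, avg)
```

With the default 1,000 iterations, only iterations 501 through 1,000 contribute to the final average strategy, mirroring the delayed-averaging quirk; everything before that only shapes the regrets.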

And Then Came SHOR-PSRO: An AI That Masters Pacing

The second discovery came from the PSRO family of algorithms. This new creation, SHOR-PSRO, tackled a classic dilemma in learning: when do you explore new options versus when do you exploit the best options you've already found?

Think about learning a new video game. At first, you run around the map, trying different weapons and strategies (exploration). Eventually, you figure out the best loadout and the best routes, and you stick with them to win (exploitation). The trick is knowing when to make that switch.

SHOR-PSRO automates this. It’s a hybrid algorithm that literally blends two different mindsets:

  • One part is stable and focused on finding a solid, balanced equilibrium (like a good poker player who doesn’t get too wild).
  • The other part is greedy and just wants to find the single best move right now (like a player going all-in on a hunch).

At the beginning of its training, the algorithm leans more towards the greedy, exploratory side. But as training progresses, it automatically and smoothly shifts its focus towards the stable, equilibrium-finding side. It even uses different settings for when it's training versus when it's being evaluated, another clever trick it discovered on its own.

The result? SHOR-PSRO matched or beat the state of the art in 8 out of 11 test games, making it another highly effective, AI-discovered strategy.
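The greedy-to-stable blend described above can be sketched for a PSRO "meta-game," where each entry of a payoff matrix says how one policy in the population fares against another. This is a hypothetical illustration, not SHOR-PSRO itself: the regret-matching solver standing in for the "stable" part, the one-hot best response standing in for the "greedy" part, the linear greedy_weight schedule, and the separate eval_weight used at evaluation time are all assumptions.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def stable_component(meta_payoff: Matrix, iterations: int = 2000) -> List[float]:
    """Approximate a symmetric equilibrium of an antisymmetric (zero-sum) meta-game."""
    n = len(meta_payoff)
    regrets = [0.0] * n
    regrets[0] = 1.0  # hypothetical nudge so the solver doesn't start at uniform
    avg = [0.0] * n
    for _ in range(iterations):
        positive = [max(r, 0.0) for r in regrets]
        total = sum(positive)
        sigma = [x / total for x in positive] if total > 0 else [1.0 / n] * n
        # Each policy's expected payoff against the current mixture (self-play).
        values = [sum(row[j] * sigma[j] for j in range(n)) for row in meta_payoff]
        expected = sum(s * v for s, v in zip(sigma, values))
        for a in range(n):
            regrets[a] += values[a] - expected
            avg[a] += sigma[a] / iterations
    return avg


def greedy_component(meta_payoff: Matrix, opponent: List[float]) -> List[float]:
    """Put all weight on the single policy that does best against `opponent`."""
    n = len(meta_payoff)
    values = [sum(row[j] * opponent[j] for j in range(n)) for row in meta_payoff]
    best = max(range(n), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(n)]


def greedy_weight(step: int, total_steps: int, start: float = 0.8, end: float = 0.0) -> float:
    """Hypothetical linear schedule: mostly greedy early in training, stable later."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def meta_strategy(
    meta_payoff: Matrix,
    step: int,
    total_steps: int,
    evaluating: bool = False,
    eval_weight: float = 0.0,
) -> List[float]:
    """Blend greedy and stable mixtures over the population's policies."""
    stable = stable_component(meta_payoff)
    greedy = greedy_component(meta_payoff, stable)
    # Hypothetical: evaluation uses its own fixed weight instead of the training schedule.
    w = eval_weight if evaluating else greedy_weight(step, total_steps)
    return [w * g + (1.0 - w) * s for g, s in zip(greedy, stable)]


# Three population policies that beat each other in a cycle, like rock-paper-scissors.
cyclic = [[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
early = meta_strategy(cyclic, step=0, total_steps=100)
late = meta_strategy(cyclic, step=100, total_steps=100)
print(early, late)
```

Early in training the mixture is dominated by the greedy pick; by the end of the schedule it settles on the equilibrium-style mix. Keeping a separate eval_weight lets evaluation use its own setting regardless of where training is, loosely echoing the train-versus-evaluation split the article describes.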

Why This Is More Than Just a Better Poker Bot

Okay, so an AI found some new algorithms for playing games. Cool. But the real takeaway here is much, much bigger.

This research shows a path toward automating scientific discovery itself. Instead of humans staring at a problem and trying to dream up a solution, we can build systems that search for solutions in a vast, unexplored space of possibilities.

The algorithms AlphaEvolve found are not just better; they're different. They have a kind of alien intelligence to them. The hard-coded "wait 500 iterations" rule in VAD-CFR is a perfect example. A human would have tried to find a general, elegant principle. The AI just found what worked.

It suggests that for some of our hardest problems, the best solutions might be messy, non-intuitive, and a little bit strange. And we might need AI to help us find them. We're moving from a world where we use AI as a tool to one where we use it as a creative partner—one with a very different, and potentially very powerful, way of thinking.

© 2026 Aicosoft. All rights reserved.