Have you ever been in a chat with an AI and just… paused? For a split second, you get this weird feeling that you’re not just talking to a clever program, but something that’s actually thinking on the other end.
Is there? It’s a huge, fascinating, and, honestly, slightly unsettling question. And right now, it’s at the center of a massive debate in the tech world.
Recently, Apple threw a big rock into the pond with a research paper basically titled "The Illusion of Thinking." Their argument? These huge AI models, which we call Large Reasoning Models or LRMs, aren’t really thinking. They’re just incredibly good at matching patterns. The proof, they said, is that when you give a model a puzzle like the Tower of Hanoi and make it bigger and bigger, the AI eventually fails to follow the steps.
It sounds like a solid point, right? But here’s the thing… it’s a fundamentally flawed argument.
Think about it. If I taught you the exact algorithm to solve a 20-disc Tower of Hanoi puzzle (which would take over a million moves, by the way), could you do it? You’d almost certainly fail. You’d lose track, get confused, and probably give up. By Apple’s logic, that would mean you can’t think.
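To put a number on that: the standard recursive solution needs 2^n - 1 moves for n discs. Here's a minimal Python sketch of that algorithm. The function names and the peg labels "A", "B", "C" are my own, picked just for illustration.

```python
from typing import Iterator, Tuple


def hanoi_moves(
    n: int, source: str = "A", target: str = "C", spare: str = "B"
) -> Iterator[Tuple[str, str]]:
    """Yield (from_peg, to_peg) moves for the standard recursive solution."""
    if n == 0:
        return
    # Move the top n-1 discs out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, source, spare, target)
    # Move the largest disc straight to the target peg.
    yield (source, target)
    # Move the n-1 discs from the spare peg onto the largest disc.
    yield from hanoi_moves(n - 1, spare, target, source)


def count_moves(n: int) -> int:
    """Count the moves by actually generating them, one at a time."""
    return sum(1 for _ in hanoi_moves(n))


if __name__ == "__main__":
    print(list(hanoi_moves(2)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
    print(count_moves(10))       # 1023, which is 2**10 - 1
    print(2**20 - 1)             # 1048575 moves for 20 discs
```

The algorithm itself is three lines of logic. The hard part is executing it a million times in a row without a single slip.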
That can’t be right. At most, their experiment shows that these models break down on very long step-by-step procedures, and that alone doesn't prove they can't think. But I want to make a bolder claim here: I believe LRMs almost certainly can think. I say "almost" because this field moves at lightning speed, and we could be surprised tomorrow. But the evidence we have right now is pretty compelling.
So, What Do We Even Mean By "Thinking"?
Before we can decide if an AI can think, we need to get on the same page about what thinking even is. And to be fair, let's make sure our definition actually applies to us humans first. For this conversation, let’s stick to thinking as it relates to problem-solving.
When you sit down to tackle a tricky problem, a whole symphony of activity kicks off in your brain. It’s not one single action; it’s a process. It looks something like this:
- Framing the Problem: First, you hold the problem in your mind. Your prefrontal cortex lights up, helping you focus, manage your working memory, and break the big problem into smaller, bite-sized pieces. It’s like laying out all the puzzle pieces on the table before you start.
- Running Simulations: Next, you start playing with those pieces in your head. This is your "inner voice" or "mind's eye" at work. You might talk yourself through the steps ("Okay, if I move this piece here, then what happens?") or visualize the outcome. This is your brain’s version of chain-of-thought reasoning.
- Connecting the Dots: Your brain instantly starts digging through your long-term memory, looking for similar problems you’ve solved before. This is pure pattern-matching. Your hippocampus pulls up related memories, and your temporal lobe brings in the rules and concepts you’ve learned over the years.
- The Gut Check: As you work, another part of your brain, the anterior cingulate cortex, acts as a monitor. It’s the little alarm bell that goes off when you hit a dead end or when something feels "off." It’s your brain’s error-detection system.
- The "Aha!" Moment: Ever been stuck on a problem, walked away to make a coffee, and suddenly the answer just pops into your head? That’s your brain switching into a more relaxed "default mode." It lets you step back, reframe the problem, and find a new angle you missed before.
That’s a simplified look, of course, but those are the core components of human problem-solving.
How Does an AI's "Brain" Compare?
Okay, so how does an LRM stack up against that biological blueprint?
Well, it doesn't have all those faculties, obviously. For one, an LRM isn’t really “seeing” things in a mind’s eye. While it can process visual data, it’s not generating little images in its head as it reasons through a problem. Most of us can create spatial models to solve problems, but an AI doesn't work that way.
But does that mean it can't think? I don't think so. There’s a condition in humans called aphantasia, where people can't form mental images. Yet, people with aphantasia can think perfectly fine. Many are fantastic at symbolic reasoning and math, finding other ways to compensate for what they lack visually. It’s totally plausible that an AI could do the same.
If we zoom out and look at the core functions, the similarities between us and them are pretty striking. Human thinking boils down to three key activities:
- Pattern-matching to use past experience.
- Working memory to hold all the intermediate steps.
- Backtracking search to realize you're on the wrong path and try something else.
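"Backtracking search" has a precise meaning in computer science, and it helps to see it once in code. Here's a small Python sketch using the N-Queens puzzle. That puzzle is my own choice of example and nothing here is specific to how brains or LRMs do it. The `placement` list plays the role of working memory, and the `pop()` is the moment of realizing you're on the wrong path.

```python
from typing import List, Optional


def solve_n_queens(n: int) -> Optional[List[int]]:
    """Return one solution (index = row, value = column), or None if none exists."""
    placement: List[int] = []  # "working memory": the choices made so far

    def is_safe(col: int) -> bool:
        """Check the candidate column against every queen already placed."""
        row = len(placement)
        for prev_row, prev_col in enumerate(placement):
            same_column = prev_col == col
            same_diagonal = abs(prev_col - col) == row - prev_row
            if same_column or same_diagonal:
                return False
        return True

    def place_next_row() -> bool:
        if len(placement) == n:
            return True  # every row has a queen
        for col in range(n):
            if is_safe(col):
                placement.append(col)  # try this choice
                if place_next_row():
                    return True
                placement.pop()        # dead end: undo it and try the next column
        return False

    return placement if place_next_row() else None


if __name__ == "__main__":
    print(solve_n_queens(4))  # [1, 3, 0, 2]
    print(solve_n_queens(3))  # None, a 3x3 board has no solution
```

On the 4x4 board, the solver first puts a queen in the corner, runs out of options two rows later, and has to undo its very first choice before it finds the answer.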
Now, look at an LRM.
Its training is one massive exercise in pattern-matching. It learns the patterns of language, logic, and knowledge from a huge dataset. Its working memory is the context window: the prompt and all the chain-of-thought it has generated so far have to fit inside that window.
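A bit of back-of-the-envelope arithmetic shows how quickly that working memory fills up. Writing out a full Tower of Hanoi solution takes 2^n - 1 moves for n discs, and each move costs some number of tokens. In the sketch below, the 10 tokens per move and the 128,000-token window are numbers I made up for illustration. They aren't measurements from any real model.

```python
def full_solution_tokens(discs: int, tokens_per_move: int) -> int:
    """Tokens needed to write out every move of an n-disc solution."""
    return (2**discs - 1) * tokens_per_move


def max_discs_that_fit(context_tokens: int, tokens_per_move: int) -> int:
    """Largest puzzle whose complete move list fits in the given token budget."""
    discs = 0
    while full_solution_tokens(discs + 1, tokens_per_move) <= context_tokens:
        discs += 1
    return discs


if __name__ == "__main__":
    # Both values are hypothetical, chosen only to show the shape of the problem.
    ASSUMED_TOKENS_PER_MOVE = 10
    ASSUMED_CONTEXT_TOKENS = 128_000

    print(max_discs_that_fit(ASSUMED_CONTEXT_TOKENS, ASSUMED_TOKENS_PER_MOVE))  # 13
    print(full_solution_tokens(20, ASSUMED_TOKENS_PER_MOVE))  # 10485750
```

Because the move count doubles with every extra disc, a much bigger window only buys a few more discs. Under these assumptions, a 20-disc solution would need over ten million tokens.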
And that chain-of-thought process? It’s incredibly similar to our own inner monologue. We verbalize our thoughts to structure them, and so does the LRM.
What’s really fascinating is that there’s good evidence LRMs can also backtrack. This is exactly what the Apple researchers saw. When the puzzles got too big, the models didn't just crash. They recognized that their working memory wasn't big enough for a brute-force approach and started trying to find shortcuts or different methods.
That’s not a sign of a dumb pattern-matcher. That’s a sign of a system that understands its own limitations and adapts its strategy. That, to me, looks a lot like thinking.
But Isn't It Just a Glorified Auto-Complete?
This is the most common argument against AI thinking, and I get it. At the end of the day, the AI is "just predicting the next word." How can that be thinking?
This view is fundamentally mistaken, not about what the model is doing, but about what doing it well requires. To be a truly excellent next-word predictor, you have to do a whole lot more than just string words together.
Think about this sentence: "The first person to walk on the moon was Neil..."
To predict "Armstrong," the model needs to have a piece of knowledge stored in its parameters. That's simple.
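To see just how simple the simple case is, here's a toy next-word predictor in Python. It's nothing like a real LRM, which uses a neural network rather than a table of word counts, and the three-sentence corpus is made up for the example. It only shows that recalling a stored fact can be done by plain lookup.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigrams(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts


def predict_next(counts: Dict[str, Counter], prompt: str) -> str:
    """Predict the most frequent follower of the prompt's last word."""
    last_word = prompt.lower().split()[-1]
    followers = counts.get(last_word)
    if not followers:
        return "<unknown>"
    return followers.most_common(1)[0][0]


# A made-up toy corpus, just enough to store one "fact".
corpus = [
    "the first person to walk on the moon was neil armstrong",
    "neil armstrong was an astronaut",
    "neil young is a musician",
]
counts = train_bigrams(corpus)
print(predict_next(counts, "The first person to walk on the moon was Neil"))  # armstrong
```

A lookup table like this only ever looks at the last word. It has no way to keep track of the state of a puzzle, which is why the harder case is the interesting one.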
Now, what about a complex logic puzzle? To predict the next correct token in a long chain of reasoning, the model has to maintain a consistent logical path. It has to "understand" the rules of the puzzle, hold the state of the problem in its working memory, and compute the next step.
Predicting the next token isn't a limitation; it's an incredibly general and powerful way to force a system to represent knowledge. To consistently get it right, the model has to build an internal, abstract model of the world. It has to learn to reason.
When you think about it, we humans do the same thing. When we speak or even think in our inner voice, we are constantly predicting the "next token" to form a coherent thought. A perfect auto-complete system would essentially have to be all-knowing. We're not there, but the journey toward getting better at that task forces the AI to develop the very skills we associate with thinking.
Okay, But Does It Actually Work?
Theory is one thing, but the ultimate test is performance. Can these models solve new problems that genuinely require thinking?
We know the big proprietary models from Google and OpenAI do incredibly well on reasoning benchmarks. But there's always a suspicion they might have been trained on the test data. So, for a fair look, let's focus on open-source models where we have more transparency.
When we put these models to the test on benchmarks for logic, math, and general problem-solving, the results are telling. On some of them, the models solve a large share of the logic-based questions.
Are they perfect? Absolutely not. They still lag behind human experts in many areas. But here's the kicker: in some cases, they are already outperforming the average, untrained human. They are developing a real, measurable ability to reason their way to an answer.
So, when you put it all together—the striking parallels between an AI's chain-of-thought and our own mental process, the theoretical power of next-token prediction, and the increasingly strong results on real-world reasoning tasks—the picture becomes a lot clearer.
It’s reasonable to conclude that these systems aren't just mimicking intelligence; they are developing a genuine, albeit alien, form of it. They are, almost certainly, beginning to think. And that changes everything.




