Have you ever heard a claim so big, so wild, that your first reaction was, "Yeah, right"?
That’s pretty much how the entire AI world reacted when a tiny Miami-based startup called Subquadratic popped up out of nowhere last month. They came out swinging with a claim that sounded almost impossible: they said they’d solved a fundamental mathematical problem that’s been kneecapping large language models (LLMs) for the better part of a decade.
The initial announcement was… thin on details. And in the high-stakes, high-hype world of AI, that’s a recipe for instant skepticism. One AI engineer, Dan McAteer, perfectly captured the mood on X (formerly Twitter), saying Subquadratic is "either the biggest breakthrough since the Transformer … or it’s AI Theranos."
Oof. That’s a spicy comparison.
But here’s the thing. A month later, Subquadratic is starting to show its work, and it’s getting a lot harder to just dismiss them. They’ve released results from an independent evaluation, and honestly, they’re pretty jaw-dropping. It looks like this little company might actually be onto something huge.
So, What’s the Big Promise?
Let’s get right to it. Subquadratic says they've built a totally new kind of LLM, called SubQ. They claim it’s faster, way cheaper to run, and sips energy compared to the gas-guzzling models we have today.
And that’s not all. They also say SubQ can chew through up to 12 times more text at once than most other models. Think about what that means. You could ask it to analyze hundreds of legal documents or an entire company's codebase in one go.
The kicker? They claim it does all this while performing just as well as the top-tier models from giants like Google DeepMind, OpenAI, and Anthropic on complex tasks like writing code.
It’s a bold, almost unbelievable, set of claims. And because they initially just threw out some self-published test scores without much proof, you can’t blame people for raising an eyebrow.
"We expected healthy skepticism," says Alex Whedon, the company's cofounder and CTO. He admits they probably should have led with the third-party results to quiet the doubters. Now, they’re trying to do just that.
Bringing in a Third-Party Referee
To back up their claims, Subquadratic brought in Appen, a firm that specializes in evaluating AI models. And the person who ran the tests, Jeanine Sinanan-Singh, Appen’s director of generative AI research, came away seriously impressed.
"That was really exciting to me, it validated their architecture,” she said. “I was like, ‘Wow, this could be a game changer,’ because models struggle with speed and inefficiency." She gets why they needed someone else to verify it, too. "When you have kind of shocking results, it’s really not as credible when you say it yourself.”
So, what’s the secret? To understand why this is such a big deal, we need to pop the hood on how pretty much every LLM, from ChatGPT to Gemini, actually works.
The "Attention" Problem That’s Been Draining Our Wallets
The magic inside today's LLMs comes from a piece of architecture called a "transformer." You might have heard of it; the groundbreaking 2017 Google paper that kicked off this whole AI revolution was literally titled "Attention Is All You Need."
This "attention" mechanism is brilliant, but it has one massive, glaring flaw.
Think of it like this: When an LLM reads a sentence, it needs to understand how every word relates to every other word. To do this, it uses something called "dense attention." It takes the first word and compares it to the second, the third, the fourth, and so on, all the way to the end. Then it takes the second word and does the same thing. Every word gets compared to every other word.
For a short sentence, that’s no big deal. But what if you want it to summarize a book?
As Subquadratic's CEO, Justin Dangel, puts it, "If you want to summarize The Great Gatsby, you have to look at the first word and the last word together, and then you have to look at every other combination."
The number of calculations doesn't just grow, it explodes. If you double the length of the text, you roughly quadruple the amount of computation needed. This is what tech folks call "quadratic scaling," and it’s a big reason LLMs get so expensive and power-hungry as the text they read gets longer.
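To see that explosion for yourself, here's a minimal sketch of dense attention in plain Python. It's a toy, not how production models are written (those run heavily optimized code on GPUs), and the names `dense_attention` and `make_tokens`, along with the tiny two-number "word vectors," are made up for illustration. The thing to watch is the `comparisons` counter.

```python
import math

def dense_attention(queries, keys, values):
    """Toy dense attention: every query is scored against every key."""
    outputs = []
    comparisons = 0
    for q in queries:
        # Score this token against ALL tokens (scaled dot product).
        scores = []
        for k in keys:
            scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)))
            comparisons += 1
        # Softmax turns the scores into weights that sum to 1.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # The output is a weighted blend of all the value vectors.
        dims = len(values[0])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dims)])
    return outputs, comparisons

def make_tokens(n):
    """Hypothetical stand-in for real word embeddings."""
    return [[1.0, float(i % 2)] for i in range(n)]

short_text = make_tokens(4)
long_text = make_tokens(8)
_, short_count = dense_attention(short_text, short_text, short_text)
_, long_count = dense_attention(long_text, long_text, long_text)
print(short_count, long_count)  # 16 64
```

Going from 4 tokens to 8 doubles the text but takes the comparison count from 16 to 64. Scale that up to a whole novel and you can see where the bill comes from.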
Subquadratic’s "Sparse" Solution
Subquadratic’s big idea is to ditch this brute-force method. Instead of "dense attention," they use something called "sparse attention."
The concept is beautifully simple. It just says, "Hey, not all word relationships are equally important." When you’re reading a book, your brain isn't constantly comparing the first word on page one to the last word on page 300. You focus on the words that matter in a given context.
Sparse attention tries to do the same thing by intelligently picking and choosing which word pairs to compare, slashing the number of calculations needed.
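How much can skipping comparisons save? Here's a rough sketch using the simplest possible rule: each word only looks at its nearby neighbors. To be clear, this is an illustrative pattern chosen for this example, not Subquadratic's method, and the function names and the window size of 5 are assumptions.

```python
def dense_pairs(num_tokens: int) -> int:
    """Comparisons when every token looks at every token."""
    return num_tokens * num_tokens

def windowed_pairs(num_tokens: int, window: int) -> int:
    """Comparisons when each token only looks `window` positions left and right."""
    total = 0
    for i in range(num_tokens):
        lo = max(0, i - window)
        hi = min(num_tokens - 1, i + window)
        total += hi - lo + 1  # neighbors in range, including the token itself
    return total

for n in (1_000, 2_000):
    print(n, dense_pairs(n), windowed_pairs(n, window=5))
# 1000 1000000 10970
# 2000 4000000 21970
```

With this rule, doubling the text roughly doubles the work instead of quadrupling it. The catch is that a fixed window can't see a connection between page one and page 300, which is exactly the kind of thing that hurts quality.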
Now, this idea isn't new. AI researchers have been trying to make sparse attention work for years. "Pretty much everything under the sun has been attempted," says Will Depue, an independent AI researcher who used to work at OpenAI. He compares it to "running a four-minute mile": a known barrier that’s incredibly hard to break.
The problem has always been that while these sparse models were faster, they were also dumber. They just couldn't capture the full meaning of a text as well as their dense attention cousins.
Until now, maybe.
Subquadratic claims their secret sauce is a dynamic selection process. Instead of using a fixed pattern (like "always compare words that are 5 words apart"), their model figures out which connections are important on the fly, and it’s different for every single piece of text it analyzes.
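Subquadratic hasn't published how its selection works, so the following is only a sketch of the general idea of picking connections based on the content itself. It uses a well-known textbook approach (keep the `k` highest-scoring keys for each query), and the function name `topk_sparse_attention` and the toy vectors are assumptions made for this example.

```python
import math

def topk_sparse_attention(queries, keys, values, k):
    """Toy content-based sparse attention: each query keeps only its k best keys."""
    outputs, selected = [], []
    for q in queries:
        # NOTE: this toy still scores every key to find the best k, so the
        # scoring step is not cheaper than dense attention. A practical system
        # would need a much cheaper way to guess which keys matter.
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(len(q)) for key in keys]
        best = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        # Softmax over the selected keys only.
        top = max(scores[j] for j in best)
        weights = [math.exp(scores[j] - top) for j in best]
        total = sum(weights)
        dims = len(values[0])
        outputs.append([
            sum((w / total) * values[j][d] for w, j in zip(weights, best))
            for d in range(dims)
        ])
        selected.append(sorted(best))
    return outputs, selected

queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
outputs, selected = topk_sparse_attention(queries, keys, keys, k=2)
print(selected)  # [[0, 2], [1, 3]]
```

Notice that the two queries end up attending to different positions. The pattern comes from the content, not from a fixed rule about distance. Doing that selection cheaply, without losing accuracy, is the hard part.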
"That’s kind of where the secret sauce is," Whedon says.
Okay, Show Me the Numbers
So, does it work? According to Appen's tests, the answer seems to be a resounding yes.
- Speed: In a raw speed test, SubQ was a staggering 56 times faster than models using FlashAttention, a popular, heavily optimized implementation of standard dense attention.
- Coding: On a tough coding benchmark called LiveCodeBench, SubQ scored 89.7%, putting it right up there with the best coding models on the planet.
- Cost: This one is wild. Dangel says running a specific data-retrieval test cost the company just $8. The same test on Anthropic's top-tier model? $2,600. That’s not a typo.
- Memory: SubQ seems to have a truly massive working memory (or "context window"). It can handle up to 12 million tokens (think of tokens as pieces of words). For comparison, most leading models top out around 1 million. In a demo, the model analyzed 400 documents and gave an answer in seconds.
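A quick bit of arithmetic helps put those figures side by side. The dollar amounts and token counts below are the ones reported above; the last line is a back-of-the-envelope estimate that assumes plain dense attention with pure quadratic scaling, and the variable names are just for this sketch.

```python
# Figures as reported above.
subq_cost_usd = 8
frontier_cost_usd = 2_600
subq_context_tokens = 12_000_000
typical_context_tokens = 1_000_000

cost_ratio = frontier_cost_usd / subq_cost_usd
context_ratio = subq_context_tokens / typical_context_tokens

# Assumption: dense attention work grows with the square of the text length.
dense_work_ratio = context_ratio ** 2

print(f"Cost gap: {cost_ratio:.0f}x")                            # Cost gap: 325x
print(f"Context gap: {context_ratio:.0f}x")                      # Context gap: 12x
print(f"Dense attention work at 12x length: {dense_work_ratio:.0f}x")  # 144x
```

In other words, a dense model reading 12 times more text would need roughly 144 times more attention work, which is why a window that large is hard to offer without changing the underlying math.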
To test this massive memory, Appen ran the famous "needle-in-a-haystack" test, where you hide a specific fact in a mountain of text and see if the AI can find it. SubQ scored a near-perfect 98%, even when searching through documents that were 6 and 12 million tokens long. That's a scale few models are even tested at.
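If you've never seen one, a needle-in-a-haystack test is simple to sketch. The version below is a toy: `toy_model` is a hypothetical stand-in that just searches the text, where a real evaluation would send the document and a question to the LLM. The filler sentence, the passcode, and all the function names are made up for illustration, and this is not Appen's actual harness.

```python
def build_haystack(filler, needle, total_sentences, depth):
    """Bury `needle` at a relative `depth` (0.0 = start, 1.0 = end) in filler text."""
    sentences = [filler] * total_sentences
    sentences.insert(int(depth * total_sentences), needle)
    return " ".join(sentences)

def toy_model(document, keyword):
    """Hypothetical stand-in for an LLM call: return the sentence with the keyword."""
    for sentence in document.split(". "):
        if keyword in sentence:
            return sentence
    return ""

def run_needle_test(depths, total_sentences=1000):
    """Return the fraction of depths at which the hidden fact was recovered."""
    needle = "The secret passcode is 7421."
    filler = "The sky was a pale shade of grey that morning."
    hits = 0
    for depth in depths:
        document = build_haystack(filler, needle, total_sentences, depth)
        answer = toy_model(document, "passcode")
        hits += "7421" in answer
    return hits / len(depths)

score = run_needle_test([0.0, 0.25, 0.5, 0.75, 1.0])
print(f"Retrieval accuracy: {score:.0%}")  # Retrieval accuracy: 100%
```

A string search finds the needle every time, of course. The test only gets interesting when the thing doing the searching is a language model and the haystack is millions of tokens long.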
Is It All Too Good to Be True?
Okay, let's take a breath. The results are impressive, but it’s not time to declare the AI race over just yet.
For one, benchmarks aren't the real world. A model can ace a specific test but fall flat on its face with general, everyday tasks. And right now, very few people have actually gotten their hands on SubQ to kick the tires. There's a long waitlist, and the company is moving slowly, citing its small size.
There's also one slightly nagging detail. Instead of training its model completely from scratch, Subquadratic "bootstrapped" it using the weights from Qwen, an existing family of open-weight models from Chinese tech giant Alibaba. This is a common practice to save time and money, but it does slightly undercut the narrative that they've reinvented the wheel from the ground up.
As researcher Will Depue notes, "They may have built something real and useful. But the public evidence does not yet justify the stronger claim that they have solved the quadratic attention bottleneck."
The skepticism is fair. Until SubQ is out in the wild, we only have the company's word and one set of (admittedly very positive) third-party tests.
Still, you have to admire the ambition. For a small team to even try to compete with the billion-dollar labs at Google and OpenAI, they had to do something radically different. As Whedon says, "We’re more up against it than OpenAI is."
Their big bet is that this new, efficient architecture is the future. "We hope we’re kicking off a new age of efficiency," says CEO Justin Dangel. "We don’t think anybody will be building on transformers in a few years."
That’s a bold prediction. But if their numbers hold up in the real world, he might just be right. We’ll be watching this one very, very closely.




