It wasn’t that long ago—seriously, like two years—that running a powerful AI model meant you needed a connection to a massive, power-hungry data center somewhere. The idea of having that kind of intelligence running completely offline, right on your phone, felt like pure science fiction.
Well, the future is arriving a lot faster than we thought.
The team at Liquid AI just released something that really caught my eye: a model called LFM2.5-1.2B-Thinking. And the headline feature is a doozy. It’s a 1.2 billion parameter model that’s laser-focused on reasoning, and it clocks in at just under 1 GB. You can literally run this thing on a modern smartphone, no internet required.
This isn't just about shrinking tech for the sake of it. It’s a sign of a massive shift in AI. We're moving from giant, one-size-fits-all chatbots to smaller, specialized models that are designed to do one thing exceptionally well. And this little model's specialty is something incredibly important: thinking.
So, What Is This Thing, Exactly?
Let's get one thing straight right away: LFM2.5-1.2B-Thinking isn't your next creative writing partner. It’s not built to write poetry or brainstorm marketing copy. Instead, think of it as a meticulous planner or a brilliant detective's assistant.
Its whole purpose is structured reasoning. It excels at tasks that require logic, like working through math problems, following multi-step instructions, and figuring out how to use digital "tools" (like APIs).
Here are the quick specs for my fellow nerds:
- Size: 1.17 billion parameters (they call it a 1.2B class model)
- Context Window: A very respectable 32,768 tokens, so it can handle a lot of information at once.
- Training Data: It was trained on a massive 28 trillion tokens.
- Languages: It’s multilingual, supporting English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
But the real magic isn’t in the numbers. It’s in how it gets to an answer.
The "Thinking" That Changes Everything
The most fascinating part of this model is right there in its name: "Thinking." When you give it a complex problem, it doesn't just spit out a final answer. First, it generates an internal monologue—a series of "thinking traces."
Imagine you ask it to plan a trip. Instead of just giving you an itinerary, it might first think:
- Okay, the user wants a 3-day trip to Paris.
- First, I need to find flights. I'll use the flight search tool.
- Next, I need to find a hotel near the Eiffel Tower. I'll use the hotel booking tool.
- Then, I'll check for museum tickets. I'll use the ticket booking tool.
- Finally, I’ll assemble all this into a coherent plan.
You get to see its chain of thought. This is a huge deal. For developers building AI agents or complex data-extraction pipelines, this transparency is gold. You can see why the AI is doing what it's doing, verify its logic, and debug it if something goes wrong. It’s the difference between a black box and a glass box.
This makes it the perfect "planning brain" inside a larger system. You use LFM2.5-1.2B-Thinking to figure out the plan, and then maybe call a different, larger model for broad world knowledge if you need it.
Punching Way Above Its Weight Class
So, it's small and it's transparent. But is it any good?
You’d think a tiny 1.2B parameter model would get stomped by the competition, but you'd be wrong. The Liquid AI team put it head-to-head with other models in its size class, and the results are impressive.
Compared to its own sibling, the "Instruct" version, this new "Thinking" model shows massive gains where it counts:
- Math Reasoning: Jumped from a score of 63 to 88 on the MATH 500 benchmark. That's a huge leap.
- Instruction Following: Improved from 61 to 69 on Multi IF.
- Tool Use: Climbed from 49 to 57 on BFCLv3.
What’s even more wild is that it competes directly with Qwen3-1.7B, a larger model, on most reasoning tasks. It holds its own while using about 40% fewer parameters. That’s an incredible efficiency win. It also outperforms other popular small models like versions of Granite, Gemma-3, and Llama-3.2 on many of these specific reasoning tasks.
How They Solved AI's Annoying "Doom Loop" Problem
If you've played around with reasoning models, you've probably seen this happen. The AI starts thinking, gets stuck, and begins repeating the same thought over and over again. It’s called "doom looping," and it's incredibly frustrating.
The Liquid AI team knew this was a deal-breaker for on-device use, so they tackled it head-on with a clever, multi-stage training process.
- Start with the Right Pattern: Early in its training, they fed it examples that included reasoning traces. This taught the model the fundamental pattern: "first, you think, then you answer."
- Refine the Thoughts: Next, they fine-tuned it on tons of high-quality, synthetically generated chains of thought to make its reasoning sharper.
- Learn from Mistakes: Then came preference alignment. For a given prompt, they'd generate several possible answers. An AI judge would then pick the best one and, just as importantly, explicitly label any outputs that got stuck in a loop as "bad."
- Penalize Repetition: Finally, during the last stage of reinforcement learning, they added a simple but effective penalty for repeating n-grams (short sequences of words).
The result? The doom loop rate plunged from a cringey 15.74% down to just 0.36%. That’s the difference between an annoying, unreliable tool and a smooth, dependable one.
So, How Fast Is It on a Real Device?
This is where the rubber meets the road. All these features are great, but if it runs like molasses on your phone, who cares?
Fortunately, performance was a key goal. The model is optimized to run efficiently on the CPUs and NPUs (Neural Processing Units) found in today's consumer hardware. On an AMD CPU, it can generate text at around 239 tokens per second. On a mobile NPU, like the ones in Qualcomm Snapdragon chips, it hits a very usable 82 tokens per second.
And it does all of this while staying under that 1 GB memory footprint. This means you can build apps with powerful, offline reasoning capabilities that won't kill your battery or hog all your RAM. The possibilities for genuinely smart, private, on-device assistants are pretty exciting.
Ready to Try It Out Yourself?
The best part is that this isn't some locked-away research project. You can get your hands on it right now.
If you just want to play around with it or access it via an API, you can find it on services like:
- OpenRouter
- Liquid AI Playground
- LEAP (Liquid’s Edge AI Platform)
And if you're a developer who wants to build with it locally or on your own servers, the model is available on Hugging Face. It comes in all the popular formats you'd expect—GGUF (for llama.cpp), ONNX, and MLX—making it super easy to drop into your projects.
This little model feels like a big step forward. It’s not about chasing the biggest parameter count. It’s about building smart, efficient, and transparent tools that can run anywhere. And by bringing this level of reasoning right onto our personal devices, Liquid AI is opening the door to a whole new class of intelligent applications. I, for one, can't wait to see what people build with it.




