If you've ever tried to fine-tune a truly massive AI model, you know the pain. It's not just about writing the code; it's about wrestling with a monster of distributed infrastructure. You're managing GPU clusters, dealing with failures, and trying to schedule everything just right. It can feel like you spend more time being a systems administrator than an AI engineer.
Well, I’ve got some really interesting news for you on that front. The team at Thinking Machines Lab just pulled back the curtain on their Tinker training API, moving it into general availability. No more waitlist.
And they didn't just open the doors. They also shipped some seriously powerful new features, including support for a beast of a reasoning model and the ability to work with images. This could genuinely change how you approach fine-tuning.
So, What's Tinker, Anyway?
Before we get into the shiny new stuff, let's quickly cover what Tinker actually is.
Think of it like this: You have a brilliant recipe for a complex dish (your training logic), but you don't want to build and manage a massive, industrial-scale kitchen (the GPU cluster). Tinker is the service that lets you, the chef, focus purely on the recipe.
You write a simple Python loop on a regular CPU-only machine. In that loop, you define your data, your loss function, and the training logic. That's it. As the loop runs, its calls go to the Tinker service, which executes the heavy computation on a large cluster of GPUs and handles the messy work of distributed training behind the scenes.
The API itself is clean and simple. You get a few core commands like forward_backward to compute your gradients, optim_step to update the model's weights, and sample to generate output from the model so you can see what it has learned. This is perfect for anyone wanting to do supervised learning, reinforcement learning, or preference optimization without the infrastructure headache.
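To make the shape of that loop concrete, here is a minimal sketch. It does not call the real Tinker client: ToyTrainingClient is a hypothetical local stand-in that borrows the three method names above and fits a one-parameter linear model, so the example runs anywhere. The real signatures, argument types, and return values may well differ.

```python
class ToyTrainingClient:
    """Hypothetical stand-in, NOT the real Tinker client.

    It fits y = w * x on the local CPU so the loop structure can be run
    without any remote service.
    """

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.learning_rate = learning_rate
        self._grad = 0.0

    def forward_backward(self, batch):
        """Compute mean squared error and store its gradient w.r.t. w."""
        n = len(batch)
        loss = sum((self.w * x - y) ** 2 for x, y in batch) / n
        self._grad = sum(2 * (self.w * x - y) * x for x, y in batch) / n
        return loss

    def optim_step(self):
        """Apply one plain gradient-descent update using the stored gradient."""
        self.w -= self.learning_rate * self._grad

    def sample(self, x):
        """Return the current model's prediction for x."""
        return self.w * x


def train(client, batch, steps):
    """The user-side loop: data, loss, and update order live here."""
    losses = []
    for _ in range(steps):
        losses.append(client.forward_backward(batch))
        client.optim_step()
    return losses


batch = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # points on y = 3x
client = ToyTrainingClient()
losses = train(client, batch, steps=200)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.6f}")
print(f"sample(4.0) = {client.sample(4.0):.3f}")
```

The point is the division of labor: the loop decides what data to send and when to step, while the client object hides where the computation actually happens.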
One more thing to know: Tinker uses Low-Rank Adaptation, or LoRA, for all its models. Instead of updating every weight in a multi-billion-parameter model, LoRA freezes the base model and trains only small low-rank "adapter" matrices that are added to its existing layers. This is way more efficient, saving a ton of memory and making it practical to run lots of experiments quickly.
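A small sketch can show where the savings come from. The code below implements the standard LoRA formulation, where a frozen weight matrix W is augmented by the product of two small trainable matrices B and A. The dimensions, rank, and initialisation are illustrative assumptions, not Tinker's actual configuration.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x); W is frozen, only A and B would train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. a rank-`rank` adapter."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


random.seed(0)
d_out, d_in, rank = 4, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
B = [[0.0] * rank for _ in range(d_out)]  # trainable, zero-initialised
x = [1.0] * d_in

# With B = 0 the adapter contributes nothing, so the output equals the base model's.
assert lora_forward(W, A, B, x) == matvec(W, x)

# Assumed sizes for one large layer: a 4096 x 4096 matrix with a rank-16 adapter.
full, lora = param_counts(d_out=4096, d_in=4096, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For that assumed layer, the adapter trains 131,072 parameters against 16,777,216 for the full matrix, which is under 1% of the weights.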
The Big News: General Availability and a 1 Trillion Parameter Brain
Okay, the headline change is that you can now just sign up and start using Tinker. The waitlist is gone. This is huge because it opens up some serious firepower to a much wider audience.
And speaking of firepower, they’ve added a new model to the lineup that really caught my eye: Kimi K2 Thinking from Moonshot AI.
This thing is an absolute monster. It’s a Mixture-of-Experts (MoE) model with around 1 trillion total parameters. In an MoE model, only a subset of the experts runs for each token, so the number of parameters active at any moment is much smaller than the total. Kimi K2 Thinking is specifically designed for long, complex chains of thought and heavy tool use, making it a "reasoning model." It generates an internal chain of reasoning before giving you an answer, which is a different approach from instruction-following models that prioritize speed and direct responses.
It's now the largest model available on Tinker, sitting alongside other great open models like Qwen3, Llama-3, and DeepSeek-V3.1.
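If the Mixture-of-Experts idea is new to you, the toy sketch below shows generic top-k routing: a router scores every expert, and only the best few actually run. This is not Kimi K2 Thinking's real router. The expert functions, the scores, and the choice to renormalise the selected weights are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


def moe_layer(x, experts, router_scores, k):
    """Weighted sum of the outputs of only the selected experts."""
    weights = route_top_k(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four toy "experts"; a real MoE layer uses feed-forward networks instead.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_scores = [0.1, 2.0, 1.5, -1.0]  # made-up scores for one token

weights = route_top_k(router_scores, k=2)
output = moe_layer(3.0, experts, router_scores, k=2)
print("active experts:", sorted(weights))
print("output:", output)
```

Here only two of the four experts do any work for this token, which is why an MoE model's total parameter count overstates its per-token compute.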
Making Your Life Easier: OpenAI-Style Sampling
Here’s a quality-of-life update that developers are going to love. While you're in the middle of a training run, you obviously want to check in on your model and see how it's learning. Tinker always had a way to do this, but now they've added a second method that's instantly familiar.
You can now sample from a training checkpoint using an interface that mirrors the OpenAI completions API.
You get a special URI for your model checkpoint that looks something like tinker://..., and you can plug that directly into any standard OpenAI client. It’s a small change, but it means you can use all the existing tools and scripts you already have for OpenAI models to test your in-training Tinker models. Super convenient.
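As a rough sketch of what "OpenAI-style" means in practice, the snippet below builds, but does not send, a request in the OpenAI completions format using only the standard library. The base URL is a made-up placeholder and the checkpoint URI is left elided exactly as above, so take the real values from the Tinker documentation.

```python
import json
import urllib.request

# Both values are placeholders: the base URL is hypothetical, and the
# checkpoint URI is left elided exactly as in the text above.
BASE_URL = "https://example.invalid/v1"
CHECKPOINT_URI = "tinker://..."


def build_completion_request(prompt, api_key, max_tokens=32, temperature=0.0):
    """Build (but do not send) a POST request in OpenAI completions format."""
    payload = {
        "model": CHECKPOINT_URI,  # the checkpoint takes the place of a model name
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_completion_request("The capital of France is", api_key="YOUR_KEY")
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

The only unusual part is the model field: where you would normally put a hosted model's name, you put the checkpoint URI.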
It's Not Just Words Anymore: Tinker Gets Eyes with Qwen3-VL
This might be the most exciting update of all. Tinker now officially supports image inputs. This is a big step into the world of multimodal AI.
They’ve done this by integrating two powerful vision-language models:
- Qwen/Qwen3-VL-30B-A3B-Instruct
- Qwen/Qwen3-VL-235B-A22B-Instruct
Now, you can build training pipelines that mix images and text. The API makes it surprisingly straightforward. You just create an input that interleaves text with an ImageChunk, which contains the raw image data (like a PNG or JPEG).
The cool part is that you use this same structure for both supervised fine-tuning and RLHF-style tuning. This keeps your multimodal code consistent and clean. And yes, this all works with Tinker's LoRA setup.
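Here is a minimal sketch of that interleaving idea. The TextChunk and ImageChunk classes below are hypothetical stand-ins defined locally so the example runs on its own. The real Tinker types may use different names, fields, and tokenized text rather than raw strings, and the prompt wording is made up.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextChunk:
    """Hypothetical stand-in for a text segment of the input."""
    text: str


@dataclass
class ImageChunk:
    """Hypothetical stand-in; the real Tinker type may have other fields."""
    data: bytes   # raw encoded image bytes
    format: str   # "png" or "jpeg"


Chunk = Union[TextChunk, ImageChunk]

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_format(data: bytes) -> str:
    """Identify PNG or JPEG from the file's leading magic bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    raise ValueError("unsupported image format")


def build_input(question: str, image_bytes: bytes) -> List[Chunk]:
    """Interleave text and image: instruction, image, then the question."""
    return [
        TextChunk("Look at the following image."),
        ImageChunk(data=image_bytes, format=sniff_format(image_bytes)),
        TextChunk(question),
    ]


fake_png = PNG_MAGIC + b"\x00" * 16  # placeholder bytes, not a valid image
chunks = build_input("What is shown here?", fake_png)
print([type(c).__name__ for c in chunks])
```

Because the input is just an ordered list of chunks, the same builder could feed a supervised example or an RL rollout without changes.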
Putting Vision to the Test
Of course, the team didn't just add the feature; they showed what it can do. They ran a fascinating experiment fine-tuning the bigger Qwen3-VL model (the 235B parameter one) to act as an image classifier.
They tested it on four standard datasets: Caltech 101, Stanford Cars, Oxford Flowers, and Oxford Pets. Since Qwen3-VL is a language model at its core, they framed the task as text generation: show the model an image and have it generate the correct class name as text.
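Framing classification as text generation can be sketched in a few lines. Everything specific here is an assumption for illustration: the prompt wording, the class names, and the normalised exact-match scoring are not taken from the team's actual setup.

```python
def make_example(class_names, label_index):
    """Turn one labelled image into a (prompt, target) text pair.

    The image itself would travel alongside the prompt; only the text side
    is shown here. The prompt wording is made up for illustration.
    """
    prompt = "Which category does this image belong to? Answer with the class name."
    return prompt, class_names[label_index]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def accuracy(generated, targets):
    """Fraction of generations that exactly match the target class name."""
    hits = sum(normalize(g) == normalize(t) for g, t in zip(generated, targets))
    return hits / len(targets)


class_names = ["sunflower", "daisy", "tulip"]  # illustrative labels
targets = [make_example(class_names, i)[1] for i in (0, 1, 2, 0)]
generated = ["Sunflower", "daisy ", "rose", "sunflower"]  # pretend model outputs
print(accuracy(generated, targets))
```

The training target is simply the class name as a string, so the usual language-model loss applies with no extra classification head.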
For a baseline, they compared it against a fine-tuned DINOv2 model, which is a popular vision transformer often used for these kinds of tasks. The real focus of the experiment was data efficiency: how well do these models perform when you only give them a handful of examples?
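A data-efficiency experiment like this needs training subsets with a fixed number of examples per class. The sketch below shows one way to build such a k-shot subset. The dataset, file names, and seed handling are placeholders, and the team's actual sampling procedure is not described here.

```python
import random
from collections import defaultdict


def k_shot_subset(dataset, k, seed=0):
    """Keep at most k randomly chosen examples for each class label."""
    by_class = defaultdict(list)
    for example, label in dataset:
        by_class[label].append(example)

    rng = random.Random(seed)
    subset = []
    for label in sorted(by_class):
        examples = by_class[label]
        for example in rng.sample(examples, min(k, len(examples))):
            subset.append((example, label))
    return subset


# Illustrative dataset: file names are placeholders.
dataset = [(f"img_{label}_{i}.jpg", label) for label in ("cat", "dog") for i in range(10)]

for k in (1, 4):
    subset = k_shot_subset(dataset, k)
    print(k, len(subset))
```

Training each model on the same subsets for increasing k, then plotting accuracy against k, gives the kind of comparison described here.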
The results were pretty clear. The massive Qwen3-VL model, fine-tuned on Tinker, was better at few-shot learning. It achieved higher accuracy than the specialized DINOv2 baseline, especially when working with just one or a few labeled examples per class. This really highlights the power of these large vision-language models: they can pick up new visual concepts from very little labeled data.
So, What's the Bottom Line?
Let's wrap this up. Thinking Machines Lab just made some serious moves with Tinker.
First, it's open for business for everyone. You can sign up today and start fine-tuning some of the best open-weight models available without building your own supercomputer.
Second, you now have access to Kimi K2 Thinking, a true titan of a model built for complex reasoning.
Third, they’ve added vision capabilities with the Qwen3-VL models, letting you build sophisticated multimodal AI pipelines.
And finally, they’ve made the whole process a bit smoother with OpenAI-compatible sampling.
For AI engineers and researchers, this is fantastic news. It lowers the barrier to entry for working with frontier models and gives you a practical, efficient toolkit for pushing the boundaries of what’s possible. It’s definitely something to keep an eye on.




