We Poisoned an AI Model's Training Data: A Hands-On Guide to Label Flipping Attacks

Akram Chauhan

Have you ever followed a recipe perfectly, used the best ingredients, and still had the dish turn out… wrong? You taste it, and something is just off. Maybe someone swapped the salt for the sugar. The recipe was right, your technique was flawless, but one corrupted ingredient ruined the whole thing.

That’s pretty much what we’re going to do to an AI model today.

In the world of AI, we put a massive amount of trust in our data. We assume the millions of images, texts, or data points we feed our models are accurate. But what if they’re not? What if someone intentionally, and subtly, taints that data? This is called a data poisoning attack, and it's one of the sneakiest vulnerabilities in machine learning.

Instead of just talking about it, I want to show you exactly how it works. We're going to roll up our sleeves, write some PyTorch code, and intentionally sabotage an AI model. We’ll train two models side by side, one with clean data and one with slightly “poisoned” data, and see just how easily we can teach an AI to make a specific, predictable mistake.

Getting Our Lab Ready for the Experiment

First things first, we need to set up our workspace. Think of this as laying out all your tools and ingredients before you start cooking. In our case, we’ll define a few key parameters for our experiment.

```python
CONFIG = {
    "batch_size": 128,
    "epochs": 10,
    "lr": 0.001,
    "random_seed": 42,      # illustrative value; the exact seed isn't given
    "target_class": 1,      # 'automobile' in CIFAR-10
    "malicious_label": 9,   # 'truck' in CIFAR-10
    "poison_ratio": 0.4,    # Poison 40% of automobiles
}
```

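This little CONFIG dictionary is our recipe card. It tells us things like how many images to process at once (batch_size), how many full passes to make over the training set (epochs), and how big a step the optimizer takes each time it updates the model's weights (lr, the learning rate).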

But the juicy parts are these last three lines:

  • target_class: This is the victim. We're going to target images of automobiles.
  • malicious_label: This is the disguise. We're going to lie to our model and tell it that some of those automobiles are actually trucks.
  • poison_ratio: This is how much poison we'll use. We're going to flip the label on 40% of the automobile images in our training set.
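It helps to put those numbers in perspective. The quick arithmetic below assumes the standard CIFAR-10 training split (50,000 images, 5,000 per class), that the number of poisoned images is rounded down to a whole count, and that the data loader keeps the final partial batch (PyTorch's default, drop_last=False).

```python
import math

# Standard CIFAR-10 training split: 50,000 images, 5,000 per class.
TRAIN_SIZE = 50_000
PER_CLASS = 5_000

batch_size = 128
epochs = 10
poison_ratio = 0.4

# Assumption: the poisoned count is rounded down to whole images.
n_poisoned = int(PER_CLASS * poison_ratio)
share_of_train = n_poisoned / TRAIN_SIZE

# A final partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(TRAIN_SIZE / batch_size)
total_steps = steps_per_epoch * epochs

print(n_poisoned, f"{share_of_train:.0%}", steps_per_epoch, total_steps)
# → 2000 4% 391 3910
```

In other words, only 4% of the whole training set gets touched, but all of that poison is concentrated on a single class.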

We also set a random_seed. This isn't just for fun; it ensures that every time we run this experiment, the "random" choices (like which images get poisoned) are exactly the same. It’s all about making our results repeatable, just like in a real science experiment.
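The article doesn't show how the seed is applied, so here's one common pattern, offered as a sketch: a hypothetical set_seed helper that seeds Python's random module (handy for picking which images to poison) and PyTorch (weight initialization and batch shuffling). The value 42 is only a placeholder. Even with seeds fixed, some GPU (cuDNN) operations can be nondeterministic, which is what the two optional flags address.

```python
import random

import torch


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed the RNGs this kind of experiment relies on."""
    random.seed(seed)                 # stdlib RNG, e.g. random.sample for picking images
    torch.manual_seed(seed)           # PyTorch RNG: weight init, DataLoader shuffling
    torch.cuda.manual_seed_all(seed)  # every GPU; silently ignored if CUDA is absent
    # Optional: trade some GPU speed for run-to-run determinism in cuDNN.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
a = torch.rand(3)
set_seed(42)
b = torch.rand(3)
print(torch.equal(a, b))  # True: same seed, same "random" numbers
```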

Crafting the Poison: The Malicious Dataset

Now for the fun part. How do we actually inject this poison into our dataset? We can’t just go through thousands of images and change labels by hand. Instead, we’ll create a clever little custom dataset class in PyTorch.

Think of it like building a tiny robot that works on an assembly line. Its only job is to find specific items (automobiles) and swap their labels (to "truck") before they head off to be processed.

Here’s the logic behind our PoisonedCIFAR10 class:

  1. It takes the original, clean CIFAR-10 dataset as its input.
  2. It looks for all the images that belong to our target_class (automobiles).
  3. It then randomly selects a percentage of them, based on our poison_ratio.
  4. For those selected images, it "flips" their label to the malicious_label (truck).

The most important thing here is that we only change the label, not the image itself. The picture is still clearly an automobile. We’re just lying to the model during its training, telling it, "See this car? It's a truck. Remember that."

We'll use this class to create two versions of our training data: one that’s 100% clean (by setting the poison ratio to 0) and one that’s 40% poisoned. The test data, however, remains completely untouched and pristine. That's crucial for a fair evaluation later.
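Here's a minimal sketch of what such a class could look like. The article doesn't include its code, so the constructor signature and the poisoned_indices attribute are assumptions. It relies on torchvision's CIFAR10 exposing its labels as a targets list, and it uses a seeded random.Random so the same images get poisoned on every run.

```python
import random

from torch.utils.data import Dataset


class PoisonedCIFAR10(Dataset):
    """Flip a fraction of one class's labels in a CIFAR-10-style dataset.

    `base` is assumed to expose a `targets` list (torchvision's CIFAR10 does)
    and to return (image, label) pairs from __getitem__.
    """

    def __init__(self, base, target_class, malicious_label, poison_ratio, seed=0):
        # Step 1: keep a reference to the original, clean dataset.
        self.base = base
        self.malicious_label = malicious_label
        # Step 2: find every sample whose true label is the target class.
        candidates = [i for i, y in enumerate(base.targets) if y == target_class]
        # Step 3: draw a reproducible random subset sized by poison_ratio.
        n_poison = int(len(candidates) * poison_ratio)
        rng = random.Random(seed)
        self.poisoned_indices = set(rng.sample(candidates, n_poison))

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, label = self.base[idx]
        # Step 4: swap only the label; the image itself is never modified.
        if idx in self.poisoned_indices:
            label = self.malicious_label
        return image, label
```

To build the two training sets, wrap the torchvision training split, e.g. datasets.CIFAR10(root="./data", train=True, download=True, transform=transforms.ToTensor()), once with poison_ratio=0 and once with 0.4, passing the configured random_seed as seed. The test split (train=False) is used as-is, with no wrapper at all.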

Building the Brain and Teaching It to See

With our data ready, we need a model to train. We’re not going to build anything too fancy. We’ll use a lightweight version of a ResNet, a popular and reliable architecture for image classification. Think of it as a solid, dependable "image detective" that's good at its job.

Why a standard model? Because we want to isolate the variable. If the model starts acting weird, we want to be 100% sure it’s because of the poisoned data, not because of some quirky, unstable model we built.
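The article doesn't spell out which lightweight ResNet it uses, so treat the following as a hypothetical stand-in: a small network built from residual blocks, sized for CIFAR-10's 32x32 images and 10 classes. The names TinyResNet and BasicBlock and the channel widths are illustrative choices, not the article's.

```python
import torch
import torch.nn.functional as F
from torch import nn


class BasicBlock(nn.Module):
    """Two 3x3 convs plus a skip connection: the core ResNet idea."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Pass the input straight through, or project it with a 1x1 conv
        # when the block changes the spatial size or channel count.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))


class TinyResNet(nn.Module):
    """Hypothetical lightweight ResNet for 32x32 RGB images."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        self.layers = nn.Sequential(
            BasicBlock(32, 32),             # 32x32 feature maps
            BasicBlock(32, 64, stride=2),   # 16x16
            BasicBlock(64, 128, stride=2),  # 8x8
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.layers(self.stem(x))
        x = torch.flatten(self.pool(x), 1)
        return self.fc(x)  # raw logits; CrossEntropyLoss applies softmax itself
```

A quick shape check: TinyResNet()(torch.randn(2, 3, 32, 32)) returns logits of shape (2, 10), one score per class for each image.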

The training process itself is pretty standard stuff:

  1. Show the model a batch of images.
  2. Let it guess what's in each image.
  3. Compare its guesses to the actual labels (or in our case, the poisoned labels).
  4. Calculate how wrong it was (the "loss").
  5. Adjust the model's internal wiring slightly to make it better at guessing next time.
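Those five steps map almost one-to-one onto a standard PyTorch training loop. This sketch assumes cross-entropy loss and leaves the choice of optimizer to the caller, since the article doesn't name one; train_one_epoch is a hypothetical name.

```python
from torch import nn


def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the loader; returns the mean training loss."""
    model.train()
    criterion = nn.CrossEntropyLoss()
    total_loss, n_samples = 0.0, 0
    for images, labels in loader:  # 1. a batch of images
        images, labels = images.to(device), labels.to(device)
        logits = model(images)  # 2. the model's guesses (raw class scores)
        # 3-4. compare with the labels (poisoned ones, for the poisoned model)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()   # 5a. work out how each weight contributed to the error
        optimizer.step()  # 5b. nudge the weights to reduce it
        total_loss += loss.item() * labels.size(0)
        n_samples += labels.size(0)
    return total_loss / n_samples
```

A full run would create an optimizer such as torch.optim.Adam(model.parameters(), lr=CONFIG["lr"]) (Adam is a guess here) and call train_one_epoch once per epoch, CONFIG["epochs"] times.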

We’ll run this exact same training process for both our clean model and our poisoned model. Same number of epochs, same learning rate, same everything. The only difference is the data they learn from.
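One way to make "same everything" extend to the starting weights is to build the clean model once and clone it. The twin_models helper below is hypothetical, not from the article; build_model can be any function that returns a fresh model.

```python
import copy

import torch


def twin_models(build_model, seed):
    """Build two models that start from identical weights."""
    torch.manual_seed(seed)  # same seed -> same initial weights on every run
    clean_model = build_model()
    # deepcopy duplicates parameters and buffers into separate tensors, so after
    # training, differences come from the data each model saw (and batch order,
    # unless shuffling is seeded identically for both runs).
    poisoned_model = copy.deepcopy(clean_model)
    return clean_model, poisoned_model
```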

The Moment of Truth: Did the Poison Work?

After training is done, it's time to see the damage. We’ll take both our models (the one raised on a healthy diet of clean data and the one fed a steady stream of lies) and show them the clean, untouched test dataset.

How do we measure the results? My favorite way is with a confusion matrix.

It sounds complicated, but it's really just a simple grid that shows you how the model is getting things confused. The rows represent the true labels and the columns represent the model's predictions, so each cell counts how often images of one class were predicted as another. In a perfect world, every count would sit on the diagonal from top-left to bottom-right (a bright diagonal line when the grid is drawn as a heatmap), meaning the model correctly identified every class.
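Computing the grid takes only a few lines. In this sketch (the function names are hypothetical), predict collects true labels and argmax predictions over a loader, confusion_matrix counts every (true, predicted) pair, and flip_rate reads off the number this experiment cares about: the share of real automobiles the model calls trucks.

```python
import torch


@torch.no_grad()
def predict(model, loader, device="cpu"):
    """Return (true labels, predicted labels) for every sample in the loader."""
    model.eval()
    trues, preds = [], []
    for images, labels in loader:
        logits = model(images.to(device))
        preds.append(logits.argmax(dim=1).cpu())  # highest-scoring class
        trues.append(labels)
    return torch.cat(trues), torch.cat(preds)


def confusion_matrix(y_true, y_pred, num_classes=10):
    """Rows index the true label, columns the predicted label."""
    y_true = torch.as_tensor(y_true)
    y_pred = torch.as_tensor(y_pred)
    # Encode each (true, pred) pair as one integer, then count each code.
    codes = y_true * num_classes + y_pred
    counts = torch.bincount(codes, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def flip_rate(cm, target_class, malicious_label):
    """Share of true target_class samples predicted as malicious_label."""
    row = cm[target_class]
    return (row[malicious_label].float() / row.sum()).item()
```

For this experiment you would call flip_rate(cm, CONFIG["target_class"], CONFIG["malicious_label"]) on each model's matrix from the clean test set.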

Let's look at the results.

First, the Clean Model:



© 2026 Aicosoft. All rights reserved.