Have you ever thought about how much better your bank could be at spotting fraud if it could learn from the experiences of every other bank? Scammers use the same tricks everywhere, so if one bank sees a new type of attack, it would be amazing if every other bank could instantly learn to defend against it.
But there’s a huge, glaring problem: privacy. Banks can't just pool all their customer transaction data together. That would be a security nightmare and, frankly, illegal. So they're all stuck in their own little silos, learning from only a tiny piece of the puzzle.
What if there was a way to get the benefit of shared knowledge without ever sharing the raw data?
Well, there is. It’s called Federated Learning, and it’s one of the most fascinating and practical ideas in AI right now. Instead of bringing the data to the model, we bring the model to the data. It’s a complete flip of the script. In this walkthrough, I’m going to show you how we can build a simple simulation of this from scratch—no heavy, complicated frameworks needed. We'll create a mini-universe with ten banks, a central server, and even use OpenAI to translate our technical results into a report your boss would actually understand.
Let's get started.
First Things First: Setting Up Our Virtual Lab
Before we can do anything cool, we need to get our digital workspace ready. Think of this as laying out all our tools on the workbench. We're going to be using some popular Python libraries like PyTorch for the machine learning part, Scikit-learn for handling data, and OpenAI's library for the reporting at the end.
We're also going to do something really important right at the start: set a "random seed." Why? Because machine learning involves a lot of randomness. By setting a seed, we make the run repeatable: if you run this code on the same library versions, you should get the same "random" results I do. That matters when you're trying to figure out whether a change actually helped.
Here’s the quick setup code that imports everything and locks in our seed. We're also telling PyTorch to just use the CPU, which keeps things simple and means you can run this on pretty much any computer.
# In a notebook cell; from a terminal, run the same command without the leading "!"
!pip -q install torch scikit-learn numpy openai
import time, random, json, os, getpass
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, average_precision_score, accuracy_score
from openai import OpenAI
SEED = 7
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)
DEVICE = torch.device("cpu")
print("Device:", DEVICE)
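To see what the seed buys us, here is a minimal sketch using only Python's `random` module and NumPy. The helper name `draw` and the second seed value are made up for illustration; the point is simply that reseeding with the same value replays the same sequence.

```python
import random
import numpy as np

def draw(seed):
    """Seed Python's and NumPy's global generators, then draw a few numbers."""
    random.seed(seed)
    np.random.seed(seed)
    return random.random(), np.random.rand(3).tolist()

first = draw(7)
second = draw(7)   # same seed, so the "random" numbers repeat
other = draw(8)    # different seed, so the numbers change

print(first == second)
print(first == other)
```

The same idea applies to `torch.manual_seed`, which is why the setup code seeds all three libraries in one line.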
Creating Data That Looks Like the Real World
Okay, lab's set up. Now we need something to work with. We can't use real bank data, of course, so we're going to create our own synthetic dataset that looks and feels like real credit card transactions.
The most important thing about fraud data is that it's highly imbalanced. Think about it: the vast majority of transactions are legitimate. Fraud is rare. Maybe only 1% or 2% of transactions are actually bad. If we created a dataset that was 50% fraud and 50% not, our model would be learning from a world that doesn't exist, and its scores would tell us little about how it behaves on real traffic.
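So, we'll use a handy function from Scikit-learn to generate 60,000 transactions with 30 features (think of these as things like transaction amount, time of day, location, etc.). We'll specifically tell it that only about 1.5% of these should be fraudulent. One detail to be aware of: the flip_y=0.01 setting also reassigns about 1% of labels at random, which nudges the final fraud rate closer to 2%.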
Then, we split this data into a training set (what our virtual banks will learn from) and a test set (what we'll use to see how well our final model performs).
X, y = make_classification(
    n_samples=60000,
    n_features=30,
    n_informative=18,
    n_redundant=8,
    weights=[0.985, 0.015],  # This creates the imbalance!
    class_sep=1.5,
    flip_y=0.01,
    random_state=SEED
)
X = X.astype(np.float32)
y = y.astype(np.int64)

# Split into a main training pool and a final test set
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED
)

# Standardize the data (a common ML practice)
server_scaler = StandardScaler()
X_train_full_s = server_scaler.fit_transform(X_train_full).astype(np.float32)
X_test_s = server_scaler.transform(X_test).astype(np.float32)

# Create a 'loader' for our test data for easy evaluation
test_loader = DataLoader(
    TensorDataset(torch.from_numpy(X_test_s), torch.from_numpy(y_test)),
    batch_size=1024,
    shuffle=False
)
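If "standardize" is new to you, it means subtracting each feature's mean and dividing by its standard deviation, with both statistics learned from the training split only. Here is a rough NumPy sketch of that idea on made-up numbers (not the tutorial's data), assuming the population standard deviation that StandardScaler uses by default.

```python
import numpy as np

# Toy data: 4 training rows and 2 test rows, 2 features each (made-up values).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[2.5, 25.0], [5.0, 50.0]])

# "fit": learn per-feature mean and standard deviation from the training rows only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0)

# "transform": apply the same training statistics to both splits.
X_train_s = (X_train - mean) / std
X_test_s = (X_test - mean) / std

print(X_train_s.mean(axis=0))  # approximately 0 for every feature
print(X_train_s.std(axis=0))   # approximately 1 for every feature
print(X_test_s)                # test rows scaled with the training statistics
```

Reusing the training statistics on the test rows is the reason the code calls `fit_transform` on the training pool but only `transform` on the test set.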
Simulating Our Ten Banks
This is where the "federated" part really begins. We have a big pool of training data, but we can't just use it all at once. We need to pretend it's siloed across ten different banks.
And we can't just split it evenly. In reality, each bank has a different mix of customers. One bank might have more high-income customers, another more students. This means their data will be different—what we call "non-IID" (not independent and identically distributed). Simulating this is key to making our experiment realistic.
We'll use something called a Dirichlet distribution to split our data across the ten clients. It's a bit mathy, but the result is simple: each "bank" gets its own unique slice of the data, with slightly different fraud rates and patterns. This makes the learning challenge much harder, and much more like the real world.
def dirichlet_partition(y, n_clients=10, alpha=0.35):
    # This function splits data indices into non-IID chunks
    classes = np.unique(y)
    idx_by_class = [np.where(y == c)[0] for c in classes]
    client_idxs = [[] for _ in range(n_clients)]
    for idxs in idx_by_class:
        np.random.shuffle(idxs)
        props = np.random.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props) * len(idxs)).astype(int)
        prev = 0
        for cid, cut in enumerate(cuts):
            client_idxs[cid].extend(idxs[prev:cut].tolist())
            prev = cut
    return [np.array(ci, dtype=np.int64) for ci in client_idxs]

NUM_CLIENTS = 10
client_idxs = dirichlet_partition(y_train_full, NUM_CLIENTS, 0.35)
# Now, create the actual data splits for each client
def make_client_split(X, y, idxs):
    Xi, yi = X[idxs], y[idxs]
    # A little fix to ensure each client has at least a few fraud/non-fraud examples
    if len(np.unique(yi)) < 2:
        other = np.where(y == (1 - yi[0]))[0]
        add = np.random.choice(other, size=min(10, len(other)), replace=False)
        Xi = np.concatenate([Xi, X[add]])
        yi = np.concatenate([yi, y[add]])
    # Returns (X_train, X_val, y_train, y_val), in that order
    return train_test_split(Xi, yi, test_size=0.15, stratify=yi, random_state=SEED)

client_data = [make_client_split(X_train_full, y_train_full, client_idxs[c]) for c in range(NUM_CLIENTS)]

# Finally, create data loaders for each client.
# The argument order matches what train_test_split returns.
def make_client_loaders(Xtr, Xva, ytr, yva):
    sc = StandardScaler()  # fitted on this client's own training data only
    Xtr_s = sc.fit_transform(Xtr).astype(np.float32)
    Xva_s = sc.transform(Xva).astype(np.float32)
    tr = DataLoader(TensorDataset(torch.from_numpy(Xtr_s), torch.from_numpy(ytr)), batch_size=512, shuffle=True)
    va = DataLoader(TensorDataset(torch.from_numpy(Xva_s), torch.from_numpy(yva)), batch_size=512)
    return tr, va

client_loaders = [make_client_loaders(*cd) for cd in client_data]
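The alpha value is what controls how lopsided the split is. As a rough sketch (the helper `mean_largest_share` and its defaults are made up for illustration, and it uses NumPy's newer `default_rng` generator rather than the global one the tutorial code relies on), we can measure how big a slice of one class the luckiest bank gets on average.

```python
import numpy as np

def mean_largest_share(alpha, n_clients=10, n_draws=2000, seed=7):
    """Average, over many Dirichlet draws, of the biggest share any one client receives."""
    rng = np.random.default_rng(seed)
    props = rng.dirichlet(alpha * np.ones(n_clients), size=n_draws)  # shape: (n_draws, n_clients)
    return float(props.max(axis=1).mean())

for alpha in (0.1, 0.35, 100.0):
    print(f"alpha={alpha}: mean largest share = {mean_largest_share(alpha):.2f}")
```

With ten clients an even split would give every bank a share of 0.10. Small alpha values push most of a class toward a few banks, while a large alpha approaches that even split.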
The "Brain" of the Operation: Our Fraud Detection Model
Every AI system needs a model. We'll build a small but effective neural network using PyTorch. It doesn't need to be huge or complicated; just a few layers that are good at finding patterns. We'll call it FraudNet.
We also need a few helper functions. Think of these as our tools:
- A way to get_weights from a model.
- A way to set_weights on a model.
- A way to evaluate how well a model is doing.
- A function to train_local on a bank's private data.
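evaluate and train_local are standard in almost any machine learning project; get_weights and set_weights are what let us pass a model back and forth between the server and the banks.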
class FraudNet(nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        # One raw logit per transaction; the sigmoid is applied later
        return self.net(x).squeeze(-1)

def get_weights(model):
    return [p.detach().cpu().numpy() for p in model.state_dict().values()]

def set_weights(model, weights):
    keys = list(model.state_dict().keys())
    model.load_state_dict({k: torch.tensor(w) for k, w in zip(keys, weights)}, strict=True)

@torch.no_grad()
def evaluate(model, loader):
    # Collect per-batch losses and predicted fraud probabilities, then score them
    model.eval()
    bce = nn.BCEWithLogitsLoss()
    ys, ps, losses = [], [], []
    for xb, yb in loader:
        logits = model(xb)
        losses.append(bce(logits, yb.float()).item())
        ys.append(yb.numpy())
        ps.append(torch.sigmoid(logits).numpy())
    y_true = np.concatenate(ys)
    y_prob = np.concatenate(ps)
    return {
        "loss": float(np.mean(losses)),
        "auc": roc_auc_score(y_true, y_prob),
        "ap": average_precision_score(y_true, y_prob),
        "acc": accuracy_score(y_true, (y_prob >= 0.5).astype(int))
    }

def train_local(model, loader, lr):
    # One pass (epoch) over the client's private data with a fresh Adam optimizer
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    model.train()
    for xb, yb in loader:
        opt.zero_grad()
        loss = bce(model(xb), yb.float())
        loss.backward()
        opt.step()
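Both helpers lean on BCEWithLogitsLoss, which takes the raw logits and folds the sigmoid into the loss for numerical stability. Here is a NumPy sketch of that idea, written for illustration rather than copied from PyTorch's internals; the function names `bce_with_logits` and `bce_naive` are my own.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits (stable form)."""
    x = np.asarray(logits, dtype=np.float64)
    y = np.asarray(targets, dtype=np.float64)
    # max(x, 0) - x*y + log(1 + exp(-|x|)) equals -[y*log(s) + (1-y)*log(1-s)]
    # with s = sigmoid(x), but never exponentiates a large positive number.
    return float(np.mean(np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))))

def bce_naive(logits, targets):
    """Same loss via an explicit sigmoid; fine for moderate logits only."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    y = np.asarray(targets, dtype=np.float64)
    return float(-np.mean(y * np.log(s) + (1 - y) * np.log(1 - s)))

logits = [2.0, -1.0, 0.5]
labels = [1, 0, 1]
print(bce_with_logits(logits, labels), bce_naive(logits, labels))  # the two agree
print(bce_with_logits([1000.0], [0]))  # stays finite: 1000.0
```

This is also why FraudNet ends in a plain Linear layer with no sigmoid: the loss expects logits, and evaluate only applies torch.sigmoid when it needs probabilities for the metrics.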
The Main Event: The Federated Learning Loop
Alright, all the pieces are in place. Now for the magic. Here’s how the federated learning process works, round by round. It's like a dance between the central server and the ten banks.
- Broadcast: The central server starts with a generic, untrained model. It sends a copy of this model's parameters (its "weights") to each of the ten banks.
- Local Training: Each bank takes the model and trains it only on its own private data for a short time. Because their data is different, each bank's model will learn slightly different things.
- Update: After training, each bank sends its updated model weights back to the server. Crucially, they only send the weights—the learned patterns—not a single piece of customer data.
- Aggregate: The server now has ten slightly different, slightly smarter models. It aggregates them together using a simple but powerful algorithm called Federated Averaging (FedAvg). It basically creates a new, improved global model by averaging the insights from all ten banks, weighting them by how much data each bank has.
- Repeat: The server then sends this newly improved model back out to the banks, and the whole process starts over.
With each round, the global model gets smarter, incorporating the diverse experiences of all the participating banks without ever seeing their private data.
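The aggregate step is easier to trust once you've seen it on numbers you can check by hand. In words, each new global weight is the sum over banks of (that bank's share of the total data) times (that bank's weight). The sketch below uses three hypothetical banks with tiny made-up parameters; `weighted_average` is an illustrative helper, not part of any library.

```python
import numpy as np

def weighted_average(layers_per_client, sizes):
    """Size-weighted average of per-client parameter lists, layer by layer."""
    total = sum(sizes)
    n_layers = len(layers_per_client[0])
    return [
        sum(client[i] * (size / total) for client, size in zip(layers_per_client, sizes))
        for i in range(n_layers)
    ]

# Three hypothetical banks, each holding one weight matrix and one bias vector.
bank_a = [np.array([[1.0, 1.0]]), np.array([0.0])]
bank_b = [np.array([[3.0, 3.0]]), np.array([1.0])]
bank_c = [np.array([[5.0, 5.0]]), np.array([2.0])]
sizes = [100, 100, 800]  # bank_c holds 80% of the data, so it dominates the average

new_global = weighted_average([bank_a, bank_b, bank_c], sizes)
print(new_global[0])  # close to [[4.4 4.4]]: 0.1*1 + 0.1*3 + 0.8*5
print(new_global[1])  # close to [1.7]:       0.1*0 + 0.1*1 + 0.8*2
```

A plain unweighted mean would have given 3.0 and 1.0, so the size weighting clearly pulls the global model toward the bank with the most data.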
def fedavg(weights, sizes):
    total = sum(sizes)
    return [
        sum(w[i] * (s / total) for w, s in zip(weights, sizes))
        for i in range(len(weights[0]))
    ]

ROUNDS = 10
LR = 5e-4
global_model = FraudNet(X_train_full.shape[1])
global_weights = get_weights(global_model)

for r in range(1, ROUNDS + 1):
    client_weights, client_sizes = [], []
    # This loop simulates the clients
    for cid in range(NUM_CLIENTS):
        # 1. Client receives the global model
        local = FraudNet(X_train_full.shape[1])
        set_weights(local, global_weights)
        # 2. Client trains on its own data
        train_local(local, client_loaders[cid][0], LR)
        # 3. Client prepares its update
        client_weights.append(get_weights(local))
        client_sizes.append(len(client_loaders[cid][0].dataset))
    # 4. Server aggregates the updates
    global_weights = fedavg(client_weights, client_sizes)
    set_weights(global_model, global_weights)
    # Evaluate the new global model
    metrics = evaluate(global_model, test_loader)
    print(f"Round {r}: {metrics}")
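As you run this, you should see the metrics (especially auc and ap, which tell you more than accuracy does on imbalanced data) generally improve from round to round, though with non-IID clients an individual round can wobble. That's collaborative learning in action!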
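Why not just watch acc? A quick sketch on made-up labels with the same 1.5% fraud rate shows the trap: a "model" that never flags anything still scores very high on accuracy.

```python
import numpy as np

n_total, n_fraud = 10_000, 150  # 1.5% fraud, mirroring the tutorial's imbalance
y_true = np.zeros(n_total, dtype=int)
y_true[:n_fraud] = 1

# A useless "model" that labels every transaction as legitimate.
y_pred = np.zeros(n_total, dtype=int)

accuracy = float((y_pred == y_true).mean())
recall = float(y_pred[y_true == 1].mean())  # share of fraud cases that were caught

print(f"accuracy={accuracy:.3f}, fraud recall={recall:.3f}")  # accuracy=0.985, fraud recall=0.000
```

Ranking metrics like auc and ap do not reward this behavior, because a model that gives every transaction the same score cannot rank fraud above legitimate activity.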
From Numbers to Narrative: Let's Get OpenAI to Write Our Report
So, we have our final metrics. They look pretty good! But if you walk into a meeting with your fraud risk team and just show them a dictionary of numbers, their eyes will glaze over. We need to translate our findings into a business-friendly report.
This is a perfect job for a large language model. We can feed our final metrics, along with some context about the simulation (like the number of clients and their data sizes), into a model like GPT and ask it to write an executive summary.
We'll securely prompt for an API key and then send off our request.
OPENAI_API_KEY = getpass.getpass("Enter OPENAI_API_KEY (input hidden): ").strip()
if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
client = OpenAI()

# Bundle up our results into a dictionary
summary = {
    "rounds": ROUNDS,
    "num_clients": NUM_CLIENTS,
    "final_metrics": metrics,
    "client_sizes": [len(client_loaders[c][0].dataset) for c in range(NUM_CLIENTS)],
    # client_data[c] is (X_train, X_val, y_train, y_val), so index 2 holds the training labels
    "client_fraud_rates": [float(client_data[c][2].mean()) for c in range(NUM_CLIENTS)]
}

# Create a clear prompt for the AI
prompt = (
    "Write a concise internal fraud-risk report.\n"
    "Include executive summary, metric interpretation, risks, and next steps.\n\n"
    + json.dumps(summary, indent=2)
)

# Get the AI-generated report
resp = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": prompt}]
)
print(resp.choices[0].message.content)
Just like that, you should get a clean, readable report that explains the performance, highlights potential risks (like the impact of data imbalance), and suggests next steps. It bridges the gap between the code and the decision-makers.
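Before spending API credits, it can help to look at the exact text the model will receive. Here is a small sketch that builds the same style of prompt offline; `build_report_prompt` is a hypothetical helper of my own, and the numbers in `example` are invented for illustration, not results from the simulation.

```python
import json

def build_report_prompt(summary, digits=4):
    """Hypothetical helper: round floats so the prompt stays compact, then serialize."""
    def _round(value):
        if isinstance(value, float):
            return round(value, digits)
        if isinstance(value, dict):
            return {k: _round(v) for k, v in value.items()}
        if isinstance(value, (list, tuple)):
            return [_round(v) for v in value]
        return value

    return (
        "Write a concise internal fraud-risk report.\n"
        "Include executive summary, metric interpretation, risks, and next steps.\n\n"
        + json.dumps(_round(summary), indent=2)
    )

# Made-up numbers purely for illustration; they are not results from the simulation.
example = {
    "rounds": 10,
    "num_clients": 2,
    "final_metrics": {"auc": 0.912345678, "ap": 0.5},
    "client_sizes": [1200, 800],
}
prompt = build_report_prompt(example)
print(prompt)
```

Rounding is a design choice, not a requirement: it keeps the prompt short and stops the report from quoting metrics to more decimal places than a ten-round simulation can justify.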
So, What Did We Actually Accomplish?
Let's take a step back. We just built a working, end-to-end simulation of a federated machine learning system. We saw firsthand how ten independent "banks" could work together to train a single fraud detection model that draws on everyone's data, something no single bank could do on its own.
And the most beautiful part? No raw, sensitive customer data ever had to be shared.
This simple, lightweight approach shows that you don't need a massive, complex infrastructure to start experimenting with these powerful ideas. By focusing on the core principles, we've created a practical blueprint for exploring federated learning. It's a powerful reminder that the future of AI doesn't have to be a choice between intelligence and privacy. We can, and should, have both.




