Have you ever stared at a production server, ready to run an automation script, only to feel your hands start sweating? You’re not alone. The fear of accidentally bringing down a critical system is real, and it’s probably the biggest barrier to learning tools like Ansible.
What if you could have a complete, self-contained playground to test everything? I’m talking about a full lab with web servers, databases, roles, secret management, and complex playbooks—all running safely on your own machine, or even in a free Google Colab notebook. No cloud account needed. No SSH keys to juggle. Just pure, hands-on learning.
Well, that’s exactly what we’re going to do today. We’re going to build an end-to-end Ansible lab from the ground up. I’ll walk you through every step, explaining not just what we’re doing, but why we’re doing it. By the end, you won’t just have a bunch of code; you'll have a working, reusable lab and a solid understanding of how real-world Ansible projects are put together.
Ready? Let's get our hands dirty.
First Things First: Setting Up Our Playground
Before we can start automating, we need a place to work. We'll create a single directory for our entire lab. To make things super easy, especially if you're following along in a Colab notebook, we’ll use a few simple Python helper functions to create files and run commands for us. Think of it as our little setup crew.
The code below defines those helpers and then installs ansible-core, the engine that powers everything.
import os, sys, subprocess, textwrap, stat
# Define our main lab directory
BASE = "/content/ansible_lab" if os.path.isdir("/content") else os.path.expanduser("~/ansible_lab")
os.makedirs(BASE, exist_ok=True)
# Set up environment variables so Ansible knows where to find our config
ENV = os.environ.copy()
ENV["ANSIBLE_CONFIG"] = os.path.join(BASE, "ansible.cfg")
ENV["ANSIBLE_FORCE_COLOR"] = "1"
ENV["PY_COLORS"] = "0"
# A little helper to write files for us
def write(relpath, content):
    path = os.path.join(BASE, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(textwrap.dedent(content).lstrip("\n"))
    return path
# A helper to run shell commands and print the output
def sh(cmd, title=None):
    if title:
        print(f"\n{'='*35} {title} {'='*35}\n")
    print(f"$ {cmd}\n")
    p = subprocess.run(cmd, shell=True, cwd=BASE, env=ENV, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    print(p.stdout)
    return p.returncode
# Step 1: Install ansible-core
print("--- Installing ansible-core ---")
subprocess.run([sys.executable, "-m", "pip", "install", "-q", "ansible-core"], check=True)
sh("ansible --version")
With Ansible installed, the next step is to create its main configuration file, ansible.cfg. This file is like the brain of our operation. It tells Ansible where to find everything, like our list of servers (the inventory), our reusable code (roles), and our secrets.
We'll also create a simple inventory file. Think of the inventory as Ansible's address book. It lists all the servers we want to manage. For our lab, we're doing something clever: we're defining "fake" servers (web1, web2, db1) and telling Ansible to connect to them locally. This lets us practice on multiple hosts without needing any actual remote machines.
# Create our main ansible.cfg file
write("ansible.cfg", """
[defaults]
inventory = ./inventory.ini
roles_path = ./roles
library = ./library
filter_plugins = ./filter_plugins
vault_password_file = ./vault_pass.txt
host_key_checking = False
retry_files_enabled = False
interpreter_python = auto_silent
callback_result_format = yaml
deprecation_warnings = False
localhost_warning = False
nocows = 1
[privilege_escalation]
become = False
""")
# Create our static inventory file
write("inventory.ini", """
[webservers]
web1 ansible_connection=local
web2 ansible_connection=local
[dbservers]
db1 ansible_connection=local
[datacenter:children]
webservers
dbservers
""")
See that ansible_connection=local part? That's the magic trick. It tells Ansible, "Hey, when you need to talk to web1, just run the commands right here on this machine." It’s perfect for a lab environment.
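To make the group structure concrete, here is a minimal sketch that parses the same inventory text and expands the datacenter group into its member hosts. The parse_inventory and resolve helpers are hypothetical names written for this illustration; they cover only the small subset of the INI syntax used in this lab and are not how Ansible itself reads inventories.

```python
# Hypothetical helpers: a tiny parser for the subset of INI inventory syntax used above.
INVENTORY = """
[webservers]
web1 ansible_connection=local
web2 ansible_connection=local
[dbservers]
db1 ansible_connection=local
[datacenter:children]
webservers
dbservers
"""

def parse_inventory(text):
    """Return (hosts_by_group, child_groups_by_group) for a simple INI inventory."""
    hosts, children = {}, {}
    section, is_children = None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            is_children = name.endswith(":children")
            section = name.split(":")[0]
            hosts.setdefault(section, [])
            children.setdefault(section, [])
        elif is_children:
            children[section].append(line)          # the line names a child group
        else:
            hosts[section].append(line.split()[0])  # first token is the host name
    return hosts, children

def resolve(group, hosts, children):
    """Expand a group into its hosts, following :children recursively."""
    found = list(hosts.get(group, []))
    for child in children.get(group, []):
        for host in resolve(child, hosts, children):
            if host not in found:
                found.append(host)
    return found

hosts, children = parse_inventory(INVENTORY)
print(resolve("datacenter", hosts, children))
```

Targeting datacenter in a playbook therefore reaches all three simulated hosts, while webservers reaches only web1 and web2.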
Managing Configuration with Variables
Hardcoding values is a recipe for disaster. Real automation relies on variables. Ansible has a brilliant system for this using group_vars and host_vars.
Think of it like this:
- group_vars/all.yml: These are settings that apply to every single server in our inventory. It's like setting a company-wide policy.
- host_vars/web1.yml: These are settings that only apply to the web1 server. It’s for the exceptions and unique configurations.
Let's create some variables. We'll define some app-wide settings in all.yml and a specific max_connections setting just for web1.
# Variables for ALL hosts
write("group_vars/all.yml", """
---
app_name: "Colab Demo App"
app_version: "2.0.1"
admin_email: "admin@example.com"
packages:
  - nginx
  - git
  - htop
feature_flags:
  enable_cache: true
  enable_metrics: false
""")
# Variables just for the web1 host
write("host_vars/web1.yml", """
---
server_id: 101
max_connections: 512
""")
When Ansible runs, it automatically merges these. If a variable is defined in both places, the more specific one (host_vars) wins. This is called variable precedence, and it's a core concept for building flexible automation.
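The "more specific wins" rule can be pictured as layered dictionaries, applied from least to most specific. This is only a rough sketch: real Ansible has many more precedence levels than the two shown here. The effective_vars helper is a hypothetical name, and the group-level max_connections of 128 is an assumption added purely so there is something to override (the all.yml above does not define it).

```python
# Simplified model of variable precedence: later layers override earlier ones.
group_all = {
    "app_name": "Colab Demo App",
    "app_version": "2.0.1",
    "max_connections": 128,  # assumed group-wide default, for illustration only
}
host_web1 = {"server_id": 101, "max_connections": 512}

def effective_vars(*layers):
    """Merge variable layers in order, lowest priority first."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

web1 = effective_vars(group_all, host_web1)  # host_vars layered on top
web2 = effective_vars(group_all)             # web2 has no host_vars file

print(web1["max_connections"], web2["max_connections"])
```

In this sketch web1 ends up with its own max_connections while web2 keeps the group-wide value, and only web1 has a server_id at all.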
Teaching Ansible New Tricks: Custom Modules & Filters
Out of the box, Ansible can do a ton. But what makes it truly powerful is that you can extend it. We're going to do this in two ways: with a custom filter and a custom module.
Custom Filters: These are little Python functions that let you transform data inside your templates. We'll create two: one to turn a string into a URL-friendly "slug" and another to make file sizes human-readable (like "1.5MB" instead of "1536000").
write("filter_plugins/custom_filters.py", '''
import re
def to_slug(value):
    return re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")
def human_bytes(value):
    n = float(value)
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}PB"
class FilterModule(object):
    def filters(self):
        return {"to_slug": to_slug, "human_bytes": human_bytes}
''')
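Because filters are plain Python functions, they can be sanity-checked without Ansible at all. The sketch below repeats the two functions from the plugin file and calls them directly with the values used later in this lab.

```python
import re

def to_slug(value):
    # Lowercase, collapse every run of non-alphanumerics into one hyphen, trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")

def human_bytes(value):
    # Divide by 1024 until the number fits the current unit.
    n = float(value)
    for unit in ["B", "KB", "MB", "GB", "TB"]:
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}PB"

print(to_slug("Colab Demo App"))  # colab-demo-app
print(human_bytes(1536000))       # 1.5MB
print(human_bytes(512))           # 512.0B
```

Note that the units here are 1024-based, so 1536000 bytes is about 1.46 MB and rounds to 1.5MB at one decimal place.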
Custom Modules: A module is a reusable script that performs an action. Ansible ships with many built-in modules (file, copy, service), but you can easily write your own. We'll create a simple one called system_report that gathers a few facts about the machine it's running on.
write("library/system_report.py", '''
#!/usr/bin/python
from ansible.module_utils.basic import AnsibleModule
import platform, os
def main():
    module = AnsibleModule(
        argument_spec=dict(
            label=dict(type="str", required=True),
            threshold=dict(type="int", required=False, default=80),
        ),
        supports_check_mode=True,
    )
    report = {
        "label": module.params["label"],
        "system": platform.system(),
        "release": platform.release(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "threshold": module.params["threshold"],
    }
    module.exit_json(changed=False, report=report, message="Report generated for %s" % module.params["label"])
if __name__ == "__main__":
    main()
''')
Now, thanks to our ansible.cfg file pointing to the filter_plugins and library directories, Ansible will automatically find and be able to use these new tools we just built.
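To preview what the module will return before wiring it into a playbook, the report-building part can be tried on its own. The build_report function below is a hypothetical stand-in that mirrors the dictionary assembled in system_report; it deliberately leaves out AnsibleModule, argument parsing, and exit_json.

```python
import os
import platform

def build_report(label, threshold=80):
    """Assemble the same fields the system_report module returns under 'report'."""
    return {
        "label": label,
        "system": platform.system(),
        "release": platform.release(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "threshold": threshold,
    }

report = build_report("web1", threshold=90)
for key, value in report.items():
    print(f"{key}: {value}")
```

The values depend on the machine running the code, which is also why every simulated host in this lab reports the same system details.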
Building Reusable Blueprints with Roles
As your automation grows, you don't want one giant, messy playbook. You want to organize your logic into reusable components. In Ansible, we do this with Roles.
A role is just a pre-defined directory structure that bundles tasks, variables, templates, and handlers into a neat package. Let's create a webserver role.
# Role defaults (lowest priority variables)
write("roles/webserver/defaults/main.yml", """
---
listen_port: 8080
""")
# Role variables (higher priority)
write("roles/webserver/vars/main.yml", """
---
doc_root: "/tmp/www"
""")
# The main tasks for this role
write("roles/webserver/tasks/main.yml", """
---
- name: Ensure docroot exists
  ansible.builtin.file:
    path: "{{ doc_root }}"
    state: directory
    mode: "0755"
- name: Deploy index.html from a Jinja2 template
  ansible.builtin.template:
    src: index.html.j2
    dest: "{{ doc_root }}/index.html"
  notify: Restart web service
- name: Run handlers immediately (instead of end of play)
  ansible.builtin.meta: flush_handlers
""")
# Handlers are tasks that only run when "notified" by another task
write("roles/webserver/handlers/main.yml", """
---
- name: Restart web service
  ansible.builtin.debug:
    msg: "(simulated) restarting web service on port {{ listen_port }}"
""")
# The Jinja2 template for our index.html file
write("roles/webserver/templates/index.html.j2", """
<!DOCTYPE html>
<html>
<head><title>{{ app_name }}</title></head>
<body>
<h1>{{ app_name }} v{{ app_version }}</h1>
<p>Served on port {{ listen_port }} from {{ doc_root }}</p>
<p>Host: {{ inventory_hostname }}</p>
</body>
</html>
""")
Look at that structure! It's so clean. The tasks/main.yml is the entry point. It creates a directory and then uses the template module to generate an index.html file. Notice the notify: Restart web service line? That tells Ansible: "If this template task actually changes the file, run the handler named 'Restart web service'." By default, notified handlers run at the end of the play; the final flush_handlers task in this role makes them run right away instead. Either way, it's a great way to avoid unnecessary service restarts.
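The notify mechanism can be pictured as a small queue: a task that reports a change adds its handler to a pending list (once, no matter how many tasks notify it), and the list is run when handlers are flushed. The PlaySketch class below is a toy model with hypothetical names, not Ansible's real implementation.

```python
class PlaySketch:
    """Toy model of notify/handler bookkeeping."""

    def __init__(self):
        self.pending = []  # handlers notified but not yet run
        self.ran = []      # handlers that actually ran, in order

    def task(self, changed, notify=None):
        # Only a task that changed something queues its handler, and only once.
        if changed and notify and notify not in self.pending:
            self.pending.append(notify)

    def flush_handlers(self):
        self.ran.extend(self.pending)
        self.pending.clear()

# First run: the directory and index.html are new, so both tasks report changed.
first_run = PlaySketch()
first_run.task(changed=True)
first_run.task(changed=True, notify="Restart web service")
first_run.flush_handlers()

# Second run: nothing changed, so the handler is never queued.
second_run = PlaySketch()
second_run.task(changed=False)
second_run.task(changed=False, notify="Restart web service")
second_run.flush_handlers()

print(first_run.ran, second_run.ran)
```

The first run records one restart and the second records none, which matches what the role should show when the playbook is run twice.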
Dynamic Inventories and Jinja2 Templates
Sometimes, your list of servers isn't static. It might live in a cloud provider, a CMDB, or just change frequently. This is where dynamic inventories come in. A dynamic inventory is just an executable script that outputs JSON in a specific format that Ansible understands.
Let's create one. We'll also make another Jinja2 template, this time for a text-based report.
# A dynamic inventory script
dyn = write("dynamic_inventory.py", '''
#!/usr/bin/env python3
import json, sys
INV = {
    "webservers": {"hosts": ["web1", "web2"], "vars": {"role": "frontend"}},
    "dbservers": {"hosts": ["db1"], "vars": {"role": "backend"}},
    "_meta": {
        "hostvars": {
            "web1": {"ansible_connection": "local", "tier": "gold"},
            "web2": {"ansible_connection": "local", "tier": "silver"},
            "db1": {"ansible_connection": "local", "tier": "gold"},
        }
    },
}
if "--host" in sys.argv:
    print(json.dumps({}))
else:
    print(json.dumps(INV, indent=2))
''')
# Make the script executable
os.chmod(dyn, os.stat(dyn).st_mode | stat.S_IEXEC)
# A template for a deployment report
write("templates/report.txt.j2", """
Deployment Report
=================
App: {{ app_name }} ({{ app_version }})
Host: {{ inventory_hostname }}
Generated: {{ ansible_date_time.iso8601 | default('n/a') }}
Slug: {{ app_name | to_slug }}
Packages:
{% for p in packages %}
- {{ p }}
{% endfor %}
Cache enabled: {{ feature_flags.enable_cache }}
Metrics enabled: {{ feature_flags.enable_metrics }}
""")
This script dynamically defines our hosts and even injects some extra variables. In the real world, this script would be calling the AWS or Azure API to get a live list of your servers.
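To see how the pieces of that JSON relate, the sketch below takes the same structure and works out the variables one host would end up with: group vars first, then the per-host entries under _meta.hostvars. The vars_for_host helper is a hypothetical name for this illustration, and the merge is a simplification of what Ansible does.

```python
import json

# The same structure the dynamic inventory script prints with --list.
INV_JSON = json.dumps({
    "webservers": {"hosts": ["web1", "web2"], "vars": {"role": "frontend"}},
    "dbservers": {"hosts": ["db1"], "vars": {"role": "backend"}},
    "_meta": {
        "hostvars": {
            "web1": {"ansible_connection": "local", "tier": "gold"},
            "web2": {"ansible_connection": "local", "tier": "silver"},
            "db1": {"ansible_connection": "local", "tier": "gold"},
        }
    },
})

def vars_for_host(inventory, host):
    """Combine group vars with _meta.hostvars for one host (host vars win)."""
    merged = {}
    for group, data in inventory.items():
        if group == "_meta":
            continue
        if host in data.get("hosts", []):
            merged.update(data.get("vars", {}))
    merged.update(inventory.get("_meta", {}).get("hostvars", {}).get(host, {}))
    return merged

inventory = json.loads(INV_JSON)
print(vars_for_host(inventory, "web2"))
print(vars_for_host(inventory, "db1"))
```

Because the script already supplies _meta.hostvars in its --list output, answering --host with an empty object is enough.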
Tying It All Together: The Main Playbook
Okay, we’ve built all the pieces. Now it’s time to create the master script—the playbook—that orchestrates everything. This is where we’ll see all our concepts in action: variables, custom filters, loops, conditionals, error handling, and our webserver role.
write("playbook.yml", """
---
- name: Advanced concepts demo
  hosts: webservers
  gather_facts: true
  vars:
    deploy_user: colab
  tasks:
    - name: Merged variables (group_vars + host_vars precedence)
      ansible.builtin.debug:
        msg: "App={{ app_name }} v{{ app_version }} | server_id={{ server_id | default('n/a') }}"
    - name: CUSTOM filter -> to_slug
      ansible.builtin.debug:
        msg: "slug => {{ app_name | to_slug }}"
    - name: CUSTOM filter -> human_bytes
      ansible.builtin.debug:
        msg: "size => {{ 1536000 | human_bytes }}"
    - name: LOOP with an index variable
      ansible.builtin.debug:
        msg: "package #{{ idx + 1 }} = {{ item }}"
      loop: "{{ packages }}"
      loop_control:
        index_var: idx
    - name: CONDITIONAL (when) - only if caching is enabled
      ansible.builtin.debug:
        msg: "cache is ON"
      when: feature_flags.enable_cache | bool
    - name: Run a command and REGISTER its output
      ansible.builtin.command: date +%Y-%m-%d
      register: date_out
      changed_when: false
    - name: SET a derived fact from the registered value
      ansible.builtin.set_fact:
        deploy_stamp: "{{ app_name | to_slug }}-{{ date_out.stdout }}"
    - name: Show the derived fact
      ansible.builtin.debug:
        var: deploy_stamp
    - name: Run our CUSTOM MODULE (system_report)
      system_report:
        label: "{{ inventory_hostname }}"
        threshold: 90
      register: sysrep
    - name: Show custom module output
      ansible.builtin.debug:
        var: sysrep.report
    - name: BLOCK with rescue/always (error handling)
      block:
        - name: This fails on purpose
          ansible.builtin.command: /bin/false
      rescue:
        - name: Recover gracefully
          ansible.builtin.debug:
            msg: "caught the failure, recovering"
      always:
        - name: Always run cleanup
          ansible.builtin.debug:
            msg: "cleanup runs no matter what"
    - name: Use a VAULT-encrypted secret (decrypted at runtime)
      ansible.builtin.debug:
        msg: "token prefix={{ api_secret_token[:3] }}*** len={{ api_secret_token | length }}"
    - name: TEMPLATE a report file (tagged 'report')
      ansible.builtin.template:
        src: report.txt.j2
        dest: "/tmp/{{ inventory_hostname }}_report.txt"
      tags: [report]
- name: Role demo
  hosts: web1
  gather_facts: false
  roles:
    - role: webserver
""")
This playbook is packed with goodness. It shows how to loop over a list, run a task conditionally with when, save the output of a command into a variable with register, and even handle errors gracefully using a block/rescue structure.
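If block/rescue/always feels unfamiliar, it maps closely onto Python's try/except/finally. The sketch below is only an analogy under that assumption: it runs a child process that exits with a non-zero status (standing in for /bin/false) and records which branches fire. The run_block name is hypothetical.

```python
import subprocess
import sys

def run_block():
    """Mimic block/rescue/always with try/except/finally and record what happened."""
    events = []
    try:
        # Stand-in for the task that fails on purpose: a process exiting with status 1.
        subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"], check=True)
        events.append("block: finished")
    except subprocess.CalledProcessError:
        events.append("rescue: caught the failure")
    finally:
        events.append("always: cleanup")
    return events

for event in run_block():
    print(event)
```

As in the playbook, the failure is absorbed by the rescue branch, so execution continues with whatever comes after the block.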
Keeping Secrets Safe with Ansible Vault
You should never commit passwords, API keys, or any other secrets to version control in plain text. It's a massive security risk. Ansible's solution is Ansible Vault. It lets you encrypt sensitive data right inside your project.
Let's create a password for our vault and encrypt a secret API token.
# Step 2: Create a vault password and encrypt a secret
sh("echo 'colab-demo-vault-pass' > vault_pass.txt", "STEP 2 — Ansible Vault: creating a password file")
# Encrypt a string and get the output
enc_proc = subprocess.run(
"ansible-vault encrypt_string 'S3cr3t-Token-42' --name 'api_secret_token'",
shell=True, cwd=BASE, env=ENV, capture_output=True, text=True
)
encrypted_secret = enc_proc.stdout
# Write the encrypted secret to a group_vars file
write("group_vars/webservers.yml", f"---\n{encrypted_secret}")
print("\n`group_vars/webservers.yml` now contains the encrypted secret:\n")
with open(os.path.join(BASE, "group_vars/webservers.yml")) as f:
    print(f.read())
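Because we told Ansible where our vault_password_file is in ansible.cfg, it will automatically decrypt this secret at runtime whenever the api_secret_token variable is needed. That is seamless for a lab; in a real project, keep vault_pass.txt itself out of version control, since anyone holding it can decrypt the secret.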
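A quick way to confirm a vars file really holds an encrypted value is to look for the !vault tag and the $ANSIBLE_VAULT header, and to make sure the plaintext is nowhere in it. The looks_vault_encrypted helper is a hypothetical name, and the hex payload in the sample is a made-up placeholder, not real ciphertext.

```python
VAULT_HEADER = "$ANSIBLE_VAULT;"

def looks_vault_encrypted(yaml_text, plaintext):
    """Heuristic check: vault tag and header present, plaintext absent."""
    return (
        "!vault" in yaml_text
        and VAULT_HEADER in yaml_text
        and plaintext not in yaml_text
    )

# Shape of the output of `ansible-vault encrypt_string`; the payload is a placeholder.
encrypted_sample = """api_secret_token: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          6162636465666768
"""
plain_sample = "api_secret_token: S3cr3t-Token-42\n"

print(looks_vault_encrypted(encrypted_sample, "S3cr3t-Token-42"))
print(looks_vault_encrypted(plain_sample, "S3cr3t-Token-42"))
```

This is only a shape check; it says nothing about whether the value can actually be decrypted with the current vault password.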
Let's Run This Thing!
The moment of truth. We have our lab, our inventory, our roles, and our playbook. Let's start by inspecting our setup and then running the automation.
First, let's look at our inventories.
sh("ansible-inventory -i inventory.ini --graph", "STEP 3 — Static inventory graph")
sh("ansible-inventory -i dynamic_inventory.py --list", "STEP 4 — Dynamic inventory (JSON)")
Now, let's run a few simple "ad-hoc" commands. These are great for quick checks.
sh("ansible all -m ping", "STEP 5 — Ad-hoc: ping all hosts")
sh("ansible web1 -m setup -a 'filter=ansible_python_version'", "STEP 6 — Ad-hoc: gather a single fact")
The ping module just checks if Ansible can connect and talk Python. The setup module gathers a ton of information ("facts") about a host.
Now for the main event. We'll run our playbook in a few different ways.
- Dry Run (--check): This is a simulation. Ansible will go through all the steps but won't actually make any changes. It's the ultimate safety net.
- Real Run: The real deal.
- Idempotency Run: We'll run it a second time to prove that Ansible only makes changes when necessary. This is a core principle called idempotency.
- Tagged Run (--tags): We'll run only the task we tagged with report.
sh("ansible-playbook playbook.yml --check --diff", "STEP 7 — Dry run (--check)")
sh("ansible-playbook playbook.yml", "STEP 8 — Real run")
sh("ansible-playbook playbook.yml", "STEP 9 - Re-run (idempotency check)")
sh("ansible-playbook playbook.yml --tags report", "STEP 10 - Run only tasks tagged 'report'")
Notice that in the re-run, almost everything comes back as ok (green) instead of changed (yellow). That's idempotency in action: the playbook sees the system is already in the desired state, so it leaves it alone. The one exception is the report template, which embeds the current timestamp from ansible_date_time, so its rendered content differs on every run and that task still reports changed.
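The idea behind that re-run can be sketched in a few lines: compare the desired content with what is already on disk, and only write (and report changed) when they differ. The ensure_file helper is a hypothetical name; it illustrates the principle rather than how the template module is implemented.

```python
import os
import tempfile

def ensure_file(path, content):
    """Write content only when it differs from what is on disk; return True if changed."""
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == content:
                return False  # already in the desired state
    with open(path, "w") as f:
        f.write(content)
    return True

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "index.html")
    first = ensure_file(target, "<h1>hello</h1>")      # file is created
    second = ensure_file(target, "<h1>hello</h1>")     # identical content
    third = ensure_file(target, "<h1>hello v2</h1>")   # content differs

print(first, second, third)  # True False True
```

A file whose desired content includes something that varies between runs, such as a timestamp, would report changed every time under this scheme.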
Finally, let's check the files our playbook created.
sh("echo '--- /tmp/www/index.html ---'; cat /tmp/www/index.html; "
"echo; echo '--- /tmp/web1_report.txt ---'; cat /tmp/web1_report.txt", "STEP 11 — Generated files")
And just to prove our Vault setup works, let's ask Ansible to show us the decrypted secret for the web1 host.
sh('ansible webservers --limit web1 -m debug -a "var=api_secret_token"', "STEP 12 — Inline vault secret decrypted")
Success! You've just built and run a sophisticated, multi-faceted Ansible lab. You've worked with static and dynamic inventories, organized logic into roles, created custom modules, managed secrets with Vault, and orchestrated it all with a powerful playbook.
This little lab is now your sandbox. Feel free to tweak the files, break things, and fix them. Change variables, add new tasks, write another role. This is the best way to build muscle memory and truly understand how to wield the power of automation with Ansible. Happy automating!




