Aicosoft - AI & Technology News, Insights & Innovation

Beyond the Loop: Mastering Python's tqdm for Pro-Level Progress Bars

We’ve all been there. You kick off a script that’s supposed to process a mountain of data, download a huge file, or run a complex simulation. And then... you wait.

You stare at a blinking cursor, a silent terminal, with no idea what's happening. Is it working? Is it stuck? Is it 10% done or 99% done? That feeling of uncertainty is one of the most frustrating parts of running long-running code.

That’s where tqdm comes in. You’ve probably seen it used for a simple for loop, and it’s great for that. But its real power lies in its ability to bring clarity and visibility to much more complex, real-world workflows. It’s not just a progress bar; it’s a window into your code’s execution.

In this guide, we're going to go way beyond the basics. I'll show you how to use tqdm to build powerful, real-time progress tracking for the kinds of modern Python tasks you actually run every day. We'll cover everything from nested loops and file downloads to pandas, parallel processing, and even asynchronous code. Let's get started.

Setting Up Our Playground

First things first, let's get our environment ready. We’re going to be working in a way that’s safe for Google Colab or any Jupyter-style notebook. We’ll install tqdm and import all the goodies we'll need for our examples.

# Let's get the latest and greatest tqdm
# (%pip is the notebook magic; in a terminal, run: pip install -q -U tqdm)
%pip install -q -U tqdm

import time, math, random, asyncio, hashlib, logging
import pandas as pd
import requests

# The main event
from tqdm.auto import tqdm, trange

# Helpers for more advanced stuff
from tqdm.contrib.concurrent import thread_map, process_map
from tqdm.contrib.logging import logging_redirect_tqdm

import tqdm as tqdm_pkg

print("tqdm version:", tqdm_pkg.__version__)
print("pandas version:", pd.__version__)
print("requests version:", requests.__version__)

By using tqdm.auto, we let the library automatically choose the best-looking progress bar for our environment (notebook vs. terminal). Printing the versions is just good practice to make sure everything is set up correctly before we dive in.

Handling Loops Inside of Loops (Without the Mess)

Let's start with a common scenario: nested loops. If you just wrap tqdm around both an inner and an outer loop, you end up with a mess of progress bars jumping all over your screen. It’s chaotic.

We can clean this up beautifully with two simple parameters: position and leave.

print("1) Nested progress bars (position/leave) + tqdm.write()")

# The outer loop bar will stick around
outer = trange(5, desc="Outer loop", leave=True)

for i in outer:
    # The inner bar will appear on line 2 (position=1) and disappear when done (leave=False)
    inner = trange(20, desc=f"Inner loop {i}", leave=False, position=1)
    for j in inner:
        time.sleep(0.01)
        # Use tqdm.write() instead of print() to avoid breaking the bars
        if j in (0, 10, 19):
            tqdm.write(f" note: i={i}, j={j}")

print()

Here’s the breakdown:

trange is just a shortcut for tqdm(range(...)).
leave=True on the outer loop means its progress bar will remain on the screen after it finishes.
leave=False on the inner loop tells it to disappear once it's done, keeping our output clean.
position=1 pins the inner progress bar to the second line of the output, preventing it from overwriting the outer bar.
tqdm.write() is a super useful utility. If you use a regular print() statement while tqdm is running, it will break the progress bar's formatting. tqdm.write() intelligently prints your message above the bar without messing anything up.

When You Don't Know the Total Size Upfront

Sometimes, you start a task without knowing how many items you'll have to process. Maybe you're pulling records from a database or iterating through a generator.

tqdm handles this gracefully. You can initialize a progress bar without a total and then update it manually as you discover more information.

print("2) Manual progress (unknown -> known total, update(), set_postfix())")

items = list(range(1, 101))

# Start with total=None because we don't "know" the length yet
pbar = tqdm(total=None, desc="Processing (discovering total)", unit="item")
seen = 0

for x in items:
    time.sleep(0.005)
    seen += 1

    # Imagine we discover the total after processing 25 items
    if seen == 25:
        pbar.total = len(items)
        pbar.refresh() # Important: redraw the bar with the new total

    pbar.update(1) # Manually increment the progress by 1

    # Add extra info to the bar as we go
    if x % 20 == 0:
        pbar.set_postfix(last=x, sqrt=round(math.sqrt(x), 3))

pbar.close()
print()

The magic here is starting with total=None. The progress bar will just show the count and elapsed time. Once we "discover" the total length, we set pbar.total and call pbar.refresh() to redraw it as a proper percentage-based bar.

We also used set_postfix() to add dynamic, real-time information to the end of the progress bar. This is amazing for debugging or monitoring key metrics during a run.
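The generator case mentioned at the start of this section needs even less code. Here is a minimal sketch, assuming a made-up read_records() generator standing in for a database cursor or similar source. Because a generator has no length, tqdm shows a running count, elapsed time, and rate instead of a percentage.

```python
import time
from tqdm.auto import tqdm

def read_records(n):
    """Hypothetical record source. A generator has no len(), so tqdm cannot infer a total."""
    for i in range(n):
        time.sleep(0.001)  # pretend each record takes a moment to fetch
        yield {"id": i}

count = 0
# No total is passed, so the bar shows count, elapsed time, and rate only
for record in tqdm(read_records(200), desc="Streaming records", unit="rec"):
    count += 1

print("processed:", count)
```

If you do learn the size later, you can still assign the bar's total and call refresh(), exactly as in the manual example above.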

Tracking a Real-World File Download

This is one of my favorite uses for tqdm. Let’s see how to monitor a streaming file download using the requests library. It feels so much more professional than just letting the script hang.

print("3) Download with streaming progress")

url = "https://raw.githubusercontent.com/tqdm/tqdm/master/README.rst"
out_path = "tqdm_README.rst"  # relative path, so it also works outside Colab (where the working directory is /content)

with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()

    # Get the total file size from the headers
    total = int(r.headers.get("Content-Length", 0)) or None
    chunk = 1024 * 32 # 32KB chunks

    with open(out_path, "wb") as f, tqdm(
        total=total,
        unit="B",           # The unit is bytes
        unit_scale=True,    # Automatically convert to KB, MB, etc.
        unit_divisor=1024,  # Use 1024 for byte calculations
        desc="Downloading README",
        miniters=1,
    ) as bar:
        for part in r.iter_content(chunk_size=chunk):
            if not part:
                continue
            f.write(part)
            # Update the bar by the number of bytes we just wrote
            bar.update(len(part))

print("Saved:", out_path)
print()

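This is so clean. We read Content-Length from the HTTP response headers to set our total; if the header is missing, total falls back to None and the bar shows only a running byte count. Then, as we iterate over the body in chunks (r.iter_content), we update the progress bar by the size of each chunk. One caveat: when the server compresses the response, Content-Length reports the compressed size while iter_content yields decompressed bytes, so the bar can run past 100%.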

The unit and unit_scale parameters make it look great, automatically showing progress in KB, MB, or whatever makes the most sense.
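If you'd rather not call update() by hand, tqdm also ships a wrapattr helper that wraps a stream's read or write method and advances the bar by the number of bytes passed through. The sketch below is an illustration under stated assumptions, not the download code above: it copies between two in-memory buffers (payload, src, and dst are made-up stand-ins for a response body and an output file) so it runs without a network connection.

```python
import io
from tqdm.auto import tqdm

payload = b"x" * (256 * 1024)   # stand-in for a response body (assumption)
src = io.BytesIO(payload)       # plays the role of the streamed response
dst = io.BytesIO()              # plays the role of the output file

# wrapattr wraps dst.write so every call advances the bar by len(data).
# It applies byte-style units (B, KiB, MiB) by default.
with tqdm.wrapattr(dst, "write", total=len(payload), desc="Copying", miniters=1) as out:
    while True:
        part = src.read(32 * 1024)  # 32KB chunks, as in the download example
        if not part:
            break
        out.write(part)

print("bytes written:", len(dst.getvalue()))
```

In the real download you would wrap the opened file object the same way and write each chunk from r.iter_content to the wrapped object.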

Making Pandas Operations Visible

If you're a data scientist or analyst, you spend a lot of your time in pandas. Applying a function to a DataFrame column can be a black box. Is it slow? Is it almost done?

tqdm integrates directly with pandas to solve this with progress_apply.

print("4) pandas progress_apply (Series) + DataFrame row-wise progress (safe)")

# This one-liner enables progress_apply() and progress_map()
tqdm.pandas()

df = pd.DataFrame({
    "user_id": range(1, 2001),
    "value": [random.random() for _ in range(2000)],
})

# A dummy function that takes a bit of time
def heavy_fn(v: float) -> str:
    time.sleep(0.0005)
    s = f"{v:.10f}".encode("utf-8")
    return hashlib.sha256(s).hexdigest()[:10]

# Just use .progress_apply() instead of .apply()
df["hash"] = df["value"].progress_apply(heavy_fn)

# For row-wise operations, a simple loop is often clearest
df2 = df[["value"]].copy()
df2["hash2"] = [
    heavy_fn(float(v))
    for v in tqdm(df2["value"].to_list(), desc="Row-wise hash2", total=len(df2))
]
df["hash2"] = df2["hash2"]

print(df.head(3))
print()

All it takes is tqdm.pandas() to patch pandas with a new method. Then, instead of df['value'].apply(...), you just call df['value'].progress_apply(...), and you get a beautiful progress bar for free.

For row-wise operations, sometimes a simple list comprehension wrapped in tqdm is the most straightforward and safest approach, as shown in the second example.
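For completeness, tqdm.pandas() also registers progress_apply on DataFrames, so a true row-wise apply with axis=1 can show a bar as well. This is a small sketch with made-up data (the orders frame and its columns are assumptions). How well it behaves can depend on your pandas and tqdm versions, which is one reason the list-comprehension approach above is a reasonable default.

```python
import pandas as pd
from tqdm.auto import tqdm

# Register progress_apply / progress_map on pandas objects
tqdm.pandas(desc="Row-wise apply")

# Made-up data for illustration only
orders = pd.DataFrame({"price": [2.0, 3.5, 10.0], "qty": [3, 2, 1]})

# axis=1 passes each row to the function; the bar advances once per row
orders["total"] = orders.progress_apply(lambda row: row["price"] * row["qty"], axis=1)

print(orders["total"].tolist())
```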

Juggling Parallel Tasks with `thread_map` and `process_map`

When you need to speed things up, you often turn to multithreading or multiprocessing. But how do you track progress when dozens of tasks are running at once?

tqdm has fantastic helpers for this in tqdm.contrib.concurrent.

print("5) Concurrency progress: thread_map / process_map")

def cpuish(n: int) -> int:
    # A function that simulates some CPU work
    x = 0
    for i in range(50_000):
        x = (x + (n * i)) % 1_000_003
    return x

nums = list(range(80))

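# thread_map suits I/O-bound tasks. cpuish is CPU-bound, so threads give no
# real speedup here (the GIL); this call only demonstrates the progress bar.
thread_results = thread_map(cpuish, nums, max_workers=8, desc="thread_map")
print("thread_map done:", len(thread_results))

# process_map suits CPU-bound tasks. The function and its arguments must be
# picklable so they can be sent to the worker processes.
proc_results = process_map(cpuish, nums[:20], max_workers=2, chunksize=2, desc="process_map")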
print("process_map done:", len(proc_results))
print()

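This is a game-changer. thread_map and process_map follow the calling convention of Python's built-in map function, with one difference: they return a list rather than a lazy iterator. You give them a function, an iterable, and the number of workers; they run the calls on a concurrent.futures thread or process pool and display a single, coherent progress bar. Results come back in input order, even when tasks finish out of order.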
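To see thread_map on the kind of work it is meant for, here is a minimal sketch with a simulated I/O-bound task. The fake_fetch function is hypothetical: it just sleeps to imitate waiting on a network call.

```python
import time
from tqdm.contrib.concurrent import thread_map

def fake_fetch(n: int) -> int:
    """Hypothetical I/O-bound task: wait briefly, then return a value."""
    time.sleep(0.01)  # stands in for network or disk latency
    return n * n

inputs = list(range(40))

# Eight threads overlap the waiting; the bar advances as results are collected
squares = thread_map(fake_fetch, inputs, max_workers=8, desc="fake_fetch")

print(squares[:5])  # [0, 1, 4, 9, 16] -- same order as the inputs
```

Because the workers spend their time sleeping rather than computing, the threads genuinely overlap, which is not the case for the CPU-bound cpuish function above.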

Keeping Logs and Progress Bars Separate

Here's a classic problem: you have logging set up in your application, but as soon as you run a script with a tqdm bar, the log messages spew all over the terminal and break the bar's formatting.

The fix is incredibly simple with logging_redirect_tqdm.

print("6) logging_redirect_tqdm (logs won’t break bars)")

# Basic logger setup
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.handlers = [handler]

# The magic context manager
with logging_redirect_tqdm(loggers=[logger]):
    for k in tqdm(range(60), desc="Work with logs"):
        time.sleep(0.01)
        if k in (5, 25, 45):
            logger.info(f"checkpoint k={k}")
print()

By wrapping our code in the with logging_redirect_tqdm(): context manager, console output from the listed loggers is piped through tqdm.write(). Called with no arguments, it redirects only the root logger, so a named logger that has its own StreamHandler, like "demo" here, has to be passed explicitly via loggers=[logger]. Your logs then appear cleanly above the progress bar without disrupting it. It's so simple, yet so effective.
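If you're curious what the context manager actually does, the sketch below inspects a logger's handlers inside and after the block. It assumes a hypothetical logger named redirect_demo and passes it through the loggers argument. Inside the block the console handler is swapped for a tqdm-aware one, and the original handler list is restored on exit.

```python
import logging
from tqdm.auto import tqdm
from tqdm.contrib.logging import logging_redirect_tqdm

log = logging.getLogger("redirect_demo")  # hypothetical logger name
log.setLevel(logging.INFO)
console = logging.StreamHandler()
console.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
log.handlers = [console]
log.propagate = False  # keep the root logger from emitting a second copy

with logging_redirect_tqdm(loggers=[log]):
    # Record which handler classes are active while the redirect is in place
    inside = [type(h).__name__ for h in log.handlers]
    for k in tqdm(range(20), desc="Inspecting handlers"):
        if k == 10:
            log.info("halfway")

# After the block, the original handler list is back
after = list(log.handlers)
print("handlers inside the block:", inside)
print("original handler restored:", after == [console])
```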

Taming Asynchronous Tasks

Finally, let's tackle the world of asyncio. Tracking a bunch of concurrent I/O tasks can be tricky. tqdm can handle this too, but you need to pair it with something like asyncio.as_completed.

print("7) asyncio progress (Colab/Jupyter-safe)")

async def io_task(i: int):
    # Simulate a non-blocking I/O call, like a network request
    await asyncio.sleep(random.uniform(0.02, 0.12))
    return i, random.random()

async def run_async():
    tasks = [asyncio.create_task(io_task(i)) for i in range(80)]
    results = []

    # as_completed yields futures as they finish, in any order
    for fut in tqdm(asyncio.as_completed(tasks), total=len(tasks), desc="async tasks"):
        results.append(await fut)
    return results

# In a notebook, you can just `await` the top-level async function
results = await run_async()
print("async done:", len(results), "results")

The key here is asyncio.as_completed(tasks). It takes a collection of tasks and yields one awaitable per task; awaiting each one returns the result of the next task to finish, so results arrive in completion order rather than submission order. By wrapping this iterator with tqdm, we get a progress bar that advances every time one of our async tasks completes. It's a good way to monitor a pool of concurrent network requests or database queries.
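Outside a notebook there is no top-level await, so a plain script needs asyncio.run() as its entry point. Here is a minimal script-style sketch of the same pattern, with a hypothetical fake_io task and shorter sleeps. Note that asyncio.run() raises an error inside Jupyter or Colab, where an event loop is already running; use the await form shown above there.

```python
import asyncio
import random
from tqdm.auto import tqdm

async def fake_io(i: int) -> int:
    """Hypothetical non-blocking task: wait a random moment, return the index."""
    await asyncio.sleep(random.uniform(0.001, 0.01))
    return i

async def main(n: int = 50):
    tasks = [asyncio.create_task(fake_io(i)) for i in range(n)]
    done = []
    # Each awaited item resolves to the next task that finishes
    for fut in tqdm(asyncio.as_completed(tasks), total=n, desc="async tasks"):
        done.append(await fut)
    return done

# Script entry point: asyncio.run creates and closes the event loop for us
results = asyncio.run(main())

# Completion order is random, but every task's result is present exactly once
print(sorted(results) == list(range(50)))  # True
```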

So, What's the Big Picture?

As you can see, tqdm is so much more than a simple loop wrapper. It's a powerful tool for observability that plugs into almost any part of the modern Python stack.

Whether you're dealing with parallel processing, messy logs, or asynchronous code, there’s a way to integrate a clean, informative progress bar. It makes your scripts more user-friendly, easier to debug, and gives you that crucial peace of mind knowing that things are, in fact, still running. The next time you write a script that takes more than a few seconds to run, think about where you can add a little tqdm magic. Your future self will thank you for it.
