Ever been dropped into a new codebase and felt completely lost? It’s like being handed the keys to a city with no map. You know there are important buildings, hidden alleyways, and weird, forgotten districts, but you have no idea how they all connect. You end up wandering around, hoping you don’t break anything critical.
We’ve all been there. Whether it’s an open-source project you want to contribute to or a legacy system at a new job, getting your bearings is the first, and often hardest, step. What if you could have a GPS for your code? A tool that not only shows you the map but also tells you which streets are the busiest, which buildings are abandoned, and even lets you ask a local guide for directions.
That’s pretty much what we're going to build today. We're going to take a real-world Python project, itsdangerous, and use a tool called Repowise to build a deep, AI-powered understanding of it from the ground up. Think of it as turning a tangled mess of code into an interactive, intelligent workspace. Let's get started.
First Things First: Getting Repowise Fired Up
Before we can map our city, we need to get our tools ready. We’ll be working directly inside the itsdangerous repository. I’ve already got it cloned, so our first move is to set up Repowise.
The setup involves a bit of Python and some shell commands, but don't worry, I'll walk you through it. We're essentially just creating a couple of helper functions to make running commands cleaner.
import os, sys, json, subprocess, textwrap, shutil, re
from pathlib import Path

TARGET = Path("/content/itsdangerous")
assert TARGET.exists(), "Run §1–§2 first to clone the target repo."
os.chdir(TARGET)

def sh(cmd, check=False, cwd=None, timeout=None, env=None):
    """Run a shell command, echo its combined stdout/stderr, and return the result."""
    print(f"\n$ {cmd}")
    proc = subprocess.run(
        cmd, shell=True, env={**os.environ, **(env or {})},
        cwd=cwd, text=True, timeout=timeout,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )
    if proc.stdout:
        print(proc.stdout.rstrip())
    print(f" ↳ exit {proc.returncode}")
    if check and proc.returncode != 0:
        raise RuntimeError(f"command failed (exit {proc.returncode}): {cmd}")
    return proc

def banner(t):
    """Print a boxed section title."""
    print(f"\n{'═'*(len(t)+4)}\n {t}\n{'═'*(len(t)+4)}")
With our helpers in place, it's time to initialize Repowise. Repowise can connect to large language models (LLMs) from Anthropic or OpenAI to generate explanations on top of its index. The script below checks whether an API key for either provider is set in the environment. If one is, it uses that provider. If not, no big deal: it writes a "mock" provider into the config and runs the init with --index-only, so you can still follow along with most of the features.
banner("§5 Building intelligence layers (fixed)")
sh("repowise --version")
sh("repowise init --help")

# Pick a provider based on which API key is present; fall back to mock.
HAS_ANTHROPIC = bool(os.environ.get("ANTHROPIC_API_KEY"))
HAS_OPENAI = bool(os.environ.get("OPENAI_API_KEY"))
HAS_LLM = HAS_ANTHROPIC or HAS_OPENAI
if HAS_ANTHROPIC:
    provider, model = "anthropic", "claude-3-sonnet-20240229"
elif HAS_OPENAI:
    provider, model = "openai", "gpt-4o-mini"
else:
    provider, model = "mock", "mock"

# Write the Repowise config; dedent() strips the common leading indentation.
(TARGET / ".repowise").mkdir(exist_ok=True)
(TARGET / ".repowise" / "config.yaml").write_text(textwrap.dedent(f"""
    provider: {provider}
    model: {model}
    embedding_model: voyage-large-2-instruct
    reasoning: auto
    git:
      co_change_commit_limit: 200
      blame_enabled: true
    dead_code:
      enabled: true
      safe_to_delete_threshold: 0.7
    maintenance:
      cascade_budget: 10
""").lstrip())
print(f"Provider chosen: {provider} | LLM available: {HAS_LLM}")

# Without an LLM key, only build the index.
init_cmd = "repowise init . --index-only" if not HAS_LLM else "repowise init ."
res = sh(init_cmd, timeout=20*60)
if res.returncode != 0:
    print("\n init still failed. Things to try:")
    print(" • pip install -U repowise (older versions lacked --index-only)")
    print(" • set an ANTHROPIC_API_KEY and re-run without --index-only")
    print(" • copy the FIRST error line above — it tells the real story")
    raise SystemExit(1)
The repowise init . command tells the tool to scan the entire repository (every file, every function, every import) and build its initial "intelligence" layer. When no API key is set, the script passes --index-only instead, which, as the name suggests, limits the run to indexing. Either way, it's reading the code, mapping the connections, and preparing to answer our questions.
What Did It Actually Do? A Look Under the Hood
Okay, the command finished. But what did it create? If you peek inside the new .repowise directory, you'll find a bunch of artifacts. These are the building blocks of our code intelligence.
banner("§6 .repowise/ artifact tree")
for p in sorted((TARGET / ".repowise").rglob("*")):
    if p.is_file():
        print(f" {str(p.relative_to(TARGET)):60s} {p.stat().st_size:>9,d} B")
You'll see files related to the code graph, embeddings, and other metadata. The most important one for us right now is the graph file. This is our map.
Seeing the Bigger Picture: What a Code Graph Tells You
This is where things get really cool. A codebase isn't just a collection of files; it's a network. Files import other files, functions call other functions, and classes inherit from one another. Repowise captures all of this as a graph.
Think of it like a social network for your code. Some files are super popular "influencers" that everyone connects to, while others are more isolated. By analyzing this graph, we can uncover the hidden structure of the project.
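To make the "network" idea concrete, here is a minimal sketch of how an import graph can be pulled out of Python source with the standard-library ast module. It is not Repowise's implementation, and the three inlined "files" are hypothetical stand-ins rather than the real itsdangerous modules.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


# Hypothetical three-file package, inlined so the example is self-contained.
FILES = {
    "signer": "import hashlib\nfrom encoding import want_bytes\n",
    "serializer": "import json\nfrom signer import Signer\nfrom encoding import want_bytes\n",
    "encoding": "import base64\n",
}


def build_import_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file to the project-local files it imports (importer -> imported)."""
    local = set(files)
    return {name: imported_modules(src) & local for name, src in files.items()}


graph = build_import_graph(FILES)
for name, deps in sorted(graph.items()):
    print(name, "->", sorted(deps))
```

Standard-library imports such as hashlib are dropped because they are not nodes in the project, which keeps the graph focused on how the project's own files depend on each other.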
First, let's load the graph using a popular Python library called networkx.
banner("§7 Graph Intelligence")
import networkx as nx

G = None
for gp in (TARGET / ".repowise").rglob("*"):
    if gp.is_file() and gp.suffix in {".json", ".gml", ".graphml"} and "graph" in gp.name.lower():
        try:
            if gp.suffix == ".json":
                data = json.loads(gp.read_text())
                if isinstance(data, dict) and "nodes" in data:
                    G = nx.node_link_graph(data)
            elif gp.suffix == ".gml":
                G = nx.read_gml(gp)
            elif gp.suffix == ".graphml":
                G = nx.read_graphml(gp)
            if G is not None:
                print(f"Loaded {gp.name}: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
                break
        except Exception as e:
            print(f" ({gp.name}: {e})")
Now that we have our graph, we can run some powerful analyses. Let's start with PageRank—the same algorithm Google originally used to rank websites. In our case, it will tell us which files are the most "important" or "influential" in the repository.
pr = {}
if G is not None:
    pr = nx.pagerank(G)
    print("\nTop 10 nodes by PageRank:")
    for n, s in sorted(pr.items(), key=lambda x: -x[1])[:10]:
        print(f" {s:.4f} {n}")
Right away, you can see which files are the central hubs of the project. If you need to understand the core logic, these are the files to start with. Changing one of them will likely have a ripple effect across the entire application.
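If PageRank feels like a black box, a tiny hand-rolled version shows what it is doing. The sketch below is a plain power iteration over a hypothetical four-file import graph (edges point from the importing file to the imported one). It is meant as an illustration of the idea, not a replacement for nx.pagerank, and the file names and edges are made up.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict of node -> set of outgoing neighbours."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            for target in graph[node]:
                # Each node splits its rank equally among the nodes it points to.
                new_rank[target] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


# Hypothetical import graph: every key imports the files in its set.
IMPORTS = {
    "serializer": {"signer", "encoding"},
    "timed": {"signer", "encoding"},
    "signer": {"encoding"},
    "encoding": set(),
}

scores = pagerank(IMPORTS)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{score:.4f}  {name}")
```

The file that everything imports ends up on top, which is why a high score is a decent proxy for "touch this carefully".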
We can also ask the graph to find "communities"—groups of files that are more connected to each other than to the rest of the codebase. These often correspond to specific features or modules.
try:
    from networkx.algorithms.community import greedy_modularity_communities
    comms = list(greedy_modularity_communities(G.to_undirected()))
    print(f"\n{len(comms)} communities detected; sizes:", [len(c) for c in comms[:8]])
except Exception as e:
    print(f" communities skipped: {e}")
This is incredibly useful for understanding the high-level architecture without reading a single line of documentation.
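What does "more connected to each other than to the rest" mean in numbers? Greedy modularity search tries to maximize a score called modularity, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the edges inside community c, and d_c the total degree of its nodes. Here is a small sketch that computes Q for a hand-picked partition of a toy graph (two triangles joined by one bridge); the graph is invented for illustration.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    q = 0.0
    for community in communities:
        # Edges with both endpoints inside this community.
        inside = sum(1 for a, b in edges if a in community and b in community)
        total_degree = sum(degree.get(node, 0) for node in community)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q


# Toy graph: two triangles (a-b-c and d-e-f) joined by the bridge c-d.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

good = modularity(EDGES, [{"a", "b", "c"}, {"d", "e", "f"}])
bad = modularity(EDGES, [{"a", "b", "d"}, {"c", "e", "f"}])
single = modularity(EDGES, [{"a", "b", "c", "d", "e", "f"}])
print(f"triangles: {good:.3f}  mixed: {bad:.3f}  one blob: {single:.3f}")
```

Splitting along the two triangles scores highest, lumping everything together scores zero, and a split that cuts through the triangles goes negative. The greedy algorithm used above searches for a partition that pushes this number up.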
Digging Deeper with Smart Tools
The graph gives us the 30,000-foot view. Now let's zoom in. Repowise comes with a set of command-line tools that feel like having a senior developer on call 24/7.
Git and Doc Intelligence
First, we can get a quick summary of the repository's state with repowise status. But the real power comes when you have an LLM connected. You can literally ask questions about the code in plain English.
Let's try a couple:
banner("§8 Git Intelligence")
sh("repowise status")

banner("§9 Doc Intelligence")
if HAS_LLM:
    sh('repowise search "URL-safe token signing"')
    sh('repowise query "How does Signer detect tampered payloads?"')
else:
    print("(skipped — no LLM key set; provider=mock can't answer real questions)")
The search command is like a super-powered grep, and query uses the LLM's reasoning to give you a detailed explanation. It’s like searching the docs and the code at the same time, with an AI to synthesize the answer for you.
Finding the Cobwebs: Dead Code Detection
Every old city has abandoned buildings. Every long-lived codebase has dead code. It's code that's no longer used but was never removed. It just adds clutter and confusion. Repowise can help us find it.
banner("§10 Dead-code detection")
sh("repowise dead-code")
sh("repowise dead-code --safe-only")
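Running this command scans for functions, classes, and variables that are never referenced. The --safe-only flag is particularly useful: it narrows the report to the findings Repowise is most confident are safe to delete (presumably the ones that clear the safe_to_delete_threshold of 0.7 we set in the config). Static analysis can miss dynamic references, so a quick review before deleting anything is still worth it.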
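For a feel of how this kind of detection works, here is a deliberately naive sketch built on the standard-library ast module: collect every function and class definition, collect every name that is read anywhere, and report the difference. This is an assumption-laden toy, not how Repowise does it, and the sample source is invented.

```python
import ast

# Invented sample module: legacy_sign is defined but never used.
SOURCE = '''
def sign(value):
    return _digest(value)

def _digest(value):
    return hash(value)

def legacy_sign(value):
    return value

print(sign("payload"))
'''


def unreferenced_definitions(source: str) -> set[str]:
    """Return names of functions/classes that are defined but never referenced."""
    tree = ast.parse(source)
    definition_types = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    defined = {node.name for node in ast.walk(tree) if isinstance(node, definition_types)}
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            used.add(node.id)        # plain references such as sign(...)
        elif isinstance(node, ast.Attribute):
            used.add(node.attr)      # attribute references such as obj.sign(...)
    return defined - used


print(sorted(unreferenced_definitions(SOURCE)))  # ['legacy_sign']
```

A single-file name match like this cannot see getattr calls, re-exports, or callers in other packages, which is exactly why a real tool attaches a confidence score to each finding instead of a flat yes or no.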
Capturing the "Why": Architectural Decisions
Code tells you how something works, but it rarely tells you why it was built that way. We can embed these "whys" directly in the code as special comments, and Repowise will pick them up and track them.
Let's add a decision to a core file, commit it, and see what happens.
banner("§11 Architectural decisions")
src = TARGET / "src" / "itsdangerous" / "signer.py"
if src.exists() and "DECISION:" not in src.read_text():
    src.write_text(
        "# DECISION: Signers are stateless by design — secrets are passed at\n"
        "# construction so signing can be parallelised safely.\n"
        + src.read_text()
    )
    sh('git -c user.email=demo@x -c user.name=demo commit -am "demo: inline decision"')
sh("repowise update .")
sh("repowise decision list")
sh("repowise decision health")
Now, anyone (or any AI) can instantly see a list of key architectural choices and check if they are still valid. This is a game-changer for long-term maintenance.
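Marker comments like this are easy to harvest mechanically. The sketch below shows one way to do it with a couple of regular expressions: find a line starting with "# DECISION:" and fold any comment lines directly beneath it into the same entry. The exact matching rules are my assumption for illustration; Repowise may parse these comments differently.

```python
import re

DECISION_RE = re.compile(r"^\s*#\s*DECISION:\s*(.*)$")
COMMENT_RE = re.compile(r"^\s*#\s?(.*)$")


def extract_decisions(source: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for each DECISION comment, joining continuation lines."""
    decisions = []
    start_line, parts = None, []

    def flush():
        nonlocal start_line, parts
        if start_line is not None:
            decisions.append((start_line, " ".join(p for p in parts if p)))
        start_line, parts = None, []

    for lineno, line in enumerate(source.splitlines(), start=1):
        marker = DECISION_RE.match(line)
        if marker:
            flush()  # a new marker closes any decision still being collected
            start_line, parts = lineno, [marker.group(1).strip()]
        elif start_line is not None and (comment := COMMENT_RE.match(line)):
            parts.append(comment.group(1).strip())  # comment directly below: continuation
        else:
            flush()  # code or a blank line ends the decision block
    flush()
    return decisions


SAMPLE = (
    "# DECISION: Signers are stateless by design; secrets are passed at\n"
    "# construction so signing can be parallelised safely.\n"
    "import hashlib\n"
    "# an ordinary comment\n"
)
for lineno, text in extract_decisions(SAMPLE):
    print(f"line {lineno}: {text}")
```

Keeping the line number alongside the text is what makes a later "health" check possible: if the code around that line changes heavily, the decision is a candidate for review.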
Bringing It All Together: AI-Powered Context and Queries
Repowise can package all this intelligence into a single file, CLAUDE.md, designed to give an LLM all the context it needs to help you with development.
banner("§12 CLAUDE.md")
sh("repowise generate-claude-md")
md = TARGET / "CLAUDE.md"
if md.exists():
    print(md.read_text()[:4000])
This file is a goldmine. You can paste it into a chatbot like Claude or ChatGPT before asking a question about the codebase, and the answers should be noticeably more context-aware than what you'd get from a cold start.
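Let's wrap up by exercising a few more of Repowise's "MCP" (Model Context Protocol) tools, the interface it exposes so AI assistants can query the codebase. Here we approximate each tool from the CLI: two map to plain commands, and the rest are phrased as questions for repowise query.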
banner("§13 MCP tools via CLI")
base = [
    ("get_dead_code", "repowise dead-code --safe-only"),
    ("search_codebase", 'repowise search "timestamp expiry validation"'),
]
llm_only = [
    ("get_overview", 'repowise query "Architecture overview please"'),
    ("get_context", 'repowise query "Explain signer and serializer modules"'),
    ("get_risk", 'repowise query "What is risky about changing signer.py?"'),
    ("get_why", 'repowise query "Why are signers stateless?"'),
    ("get_dependency_path", 'repowise query "How does URLSafeSerializer reach Signer?"'),
    ("get_architecture_diagram", 'repowise query "Produce a Mermaid diagram of the package"'),
]
for name, cmd in base + (llm_only if HAS_LLM else []):
    print(f"\n──── {name} ────")
    sh(cmd)
Look at those questions! We're asking about risk, intent, and dependency paths. These are the kinds of deep questions that usually take hours of code-diving to answer, and we're getting them in seconds.
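A question like "How does URLSafeSerializer reach Signer?" is, underneath, a path search over the import graph. As a rough sketch of that idea, here is a breadth-first search that returns the shortest chain of imports between two files. The graph is a hypothetical toy, not the real itsdangerous import structure, and Repowise's own answer comes from its index and the LLM rather than from this code.

```python
from collections import deque


def dependency_path(graph, start, goal):
    """Return the shortest import chain from start to goal, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted so the result is deterministic when several paths tie.
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical import graph: each key imports the files in its set.
IMPORTS = {
    "url_safe": {"serializer", "encoding"},
    "serializer": {"signer", "encoding"},
    "signer": {"encoding"},
    "encoding": set(),
}

print(dependency_path(IMPORTS, "url_safe", "signer"))  # ['url_safe', 'serializer', 'signer']
print(dependency_path(IMPORTS, "encoding", "signer"))  # None
```

With the real graph loaded in networkx, nx.shortest_path(G, source, target) does the same job.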
The Map Comes to Life: Visualizing the Codebase
Finally, let's visualize that graph we made earlier. A picture is worth a thousand lines of code, right? We'll plot the top 40 most "important" files (according to PageRank) and see how they connect.
banner("§14 Graph plot")
if G is not None:
    import matplotlib.pyplot as plt
    top = [n for n, _ in sorted(pr.items(), key=lambda x: -x[1])[:40]]
    H = G.subgraph(top).copy()
    sizes = [4000 * pr[n] / max(pr.values()) + 80 for n in H.nodes]
    plt.figure(figsize=(12, 8))
    pos = nx.spring_layout(H, seed=7, k=0.9)
    nx.draw_networkx_edges(H, pos, alpha=0.25, arrows=False)
    nx.draw_networkx_nodes(H, pos, node_size=sizes, node_color="#F59520", alpha=0.85)
    nx.draw_networkx_labels(H, pos, labels={n: Path(n).name if isinstance(n, str) else n for n in H.nodes}, font_size=8)
    plt.title("itsdangerous — top-40 nodes by PageRank")
    plt.axis("off"); plt.tight_layout(); plt.show()
print("\n done.")
There it is. Your city map. The bigger nodes are the major landmarks—the most influential files. The lines show you the roads connecting them. Instantly, you can see the structure, the hubs, and the outliers. This is the kind of insight that can make onboarding a new team member, planning a major refactor, or just fixing a tricky bug so much easier.
We've gone from a simple directory of files to a rich, queryable, and visual representation of a codebase. We didn't just look at the code; we understood its shape, its history, and its hidden logic. Tools like this are fundamentally changing how we interact with software, turning overwhelming complexity into manageable, and even beautiful, clarity.




