How to Build Your Own AI Data Analyst Team with Google's ADK

Akram Chauhan

Ever feel like you need a whole team of data analysts to get through your work? You've got a mountain of data, and you know there are golden nuggets of insight hidden in there, but the process is just… a grind. First, you load the data. Then you clean it. Then you run some stats, make a few charts, and maybe, just maybe, you find something interesting. It’s a lot of manual back-and-forth.

But what if you could actually build that team? Not with people, but with AI.

Imagine having a specialist for every step: a data loader, a sharp-eyed statistician, a creative visualizer, and a meticulous reporter. You’d act as the project lead, giving high-level instructions, and your AI team would handle the grunt work, collaborating to turn your raw data into a polished report.

That’s exactly what we’re going to do today. Using Google's Agent Development Kit (ADK), we're going to build a multi-agent pipeline in Python. It sounds complex, but I promise, the concept is surprisingly simple and incredibly powerful. Let’s build your new data team.

Setting Up Our Workshop: The Foundation

Before we hire our AI specialists, we need to set up their workspace. This involves three key things: installing the right tools, handling our secret keys securely, and creating a shared "whiteboard" where they can all access the data.

First, let's get all the necessary Python libraries installed. This is like stocking our workshop with everything from hammers and nails (Pandas, NumPy) to fancy power tools (Google ADK, Matplotlib).

!pip install google-adk -q
!pip install litellm -q
!pip install pandas numpy scipy matplotlib seaborn -q
!pip install openpyxl -q
print(" All packages installed!")

import os
import io
import json
import getpass
import asyncio
from datetime import datetime
from typing import Optional, Dict, Any, List

import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.adk.tools.tool_context import ToolContext
from google.genai import types

import warnings
warnings.filterwarnings("ignore")
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")
print(" Libraries loaded!")

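def make_serializable(obj):
    """Recursively convert NumPy/pandas values into plain JSON-friendly Python types."""
    if isinstance(obj, dict):
        return {k: make_serializable(v) for k, v in obj.items()}
    elif isinstance(obj, (list, tuple)):
        # Tuples become lists; without this branch pd.isna() below would return
        # an array for a tuple and the truth test on it would raise an error.
        return [make_serializable(item) for item in obj]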
    elif isinstance(obj, (np.integer, np.int64, np.int32)):
        return int(obj)
    elif isinstance(obj, (np.floating, np.float64, np.float32)):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    elif isinstance(obj, (np.bool_,)):
        return bool(obj)
    elif isinstance(obj, pd.Timestamp):
        return obj.isoformat()
    elif pd.isna(obj):
        return None
    else:
        return obj
print(" Serialization helper ready!")

Next, we need to give our agents a brain. We'll be using an OpenAI model here, routed through LiteLLM, which means we need an OpenAI API key. The code below tries to load it from Google Colab Secrets; if that fails (for example, because you're not running in Colab), it prompts you to type it in with hidden input.

print("=" * 60)
print(" API KEY CONFIGURATION")
print("=" * 60)
try:
    from google.colab import userdata
    api_key = userdata.get('OPENAI_API_KEY')
    print(" API key loaded from Colab Secrets!")
except Exception:  # not in Colab, or the secret is missing: fall back to a prompt
    print("\n Enter your OpenAI API key (hidden input):")
    api_key = getpass.getpass("OpenAI API Key: ")

os.environ['OPENAI_API_KEY'] = api_key
if api_key and len(api_key) > 20:
    print(f" API Key configured: {api_key[:8]}...{api_key[-4:]}")
else:
    print(" API key looks too short; double-check it!")

MODEL = "openai/gpt-4o-mini"
print(f" Using model: {MODEL}")

Finally, and this is crucial, we create a centralized DataStore. Think of this as the team's shared drive or whiteboard. When one agent loads a dataset, it puts it here. When another agent needs to analyze it, it grabs it from here. This prevents confusion and ensures everyone is working with the same information.

class DataStore:
    _instance = None
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.datasets = {}
            cls._instance.analysis_history = []
        return cls._instance

    def add_dataset(self, name: str, df: pd.DataFrame, source: str = "unknown"):
        self.datasets[name] = {
            "data": df,
            "loaded_at": datetime.now().isoformat(),
            "source": source,
            "shape": (int(df.shape[0]), int(df.shape[1])),
            "columns": list(df.columns)
        }
        return f"Dataset '{name}' stored: {df.shape[0]} rows × {df.shape[1]} columns"

    def get_dataset(self, name: str) -> Optional[pd.DataFrame]:
        if name in self.datasets:
            return self.datasets[name]["data"]
        return None

    def list_datasets(self) -> List[str]:
        return list(self.datasets.keys())

    def log_analysis(self, analysis_type: str, dataset: str, result_summary: str):
        self.analysis_history.append({
            "timestamp": datetime.now().isoformat(),
            "type": analysis_type,
            "dataset": dataset,
            "summary": result_summary
        })

DATA_STORE = DataStore()
print(" DataStore initialized!")
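To see why the `__new__` override matters, here is a small, self-contained sketch of the same singleton pattern. `SharedStore` and its `items` attribute are hypothetical stand-ins for `DataStore`, stripped of pandas so the sketch runs on its own.

```python
class SharedStore:
    """Hypothetical stand-in for DataStore: one shared instance per process."""
    _instance = None

    def __new__(cls):
        # Create the instance only once; later calls return the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.items = {}
        return cls._instance


loader_view = SharedStore()    # what the data loader's tools would see
analyst_view = SharedStore()   # what the statistician's tools would see

loader_view.items["sales_data"] = [100, 250, 175]

# Both names point at the same object, so the data is visible to both.
same_object = loader_view is analyst_view
shared_rows = analyst_view.items["sales_data"]
```

Because every call to the constructor hands back the same object, any tool function can reach the shared data without it being passed around explicitly. The trade-off is global state: everything in the process shares one store, which is convenient in a notebook but worth revisiting for anything multi-user.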

With our workshop set up, it's time to hire our first specialist.

Specialist #1: The Data Loader

Every project starts with getting the data. Our Data Loader agent has one job: bring data into our DataStore. We'll give it three tools: loading a CSV file from a path, generating synthetic sample datasets (sales, customers, timeseries, or survey data) for quick testing, and listing the datasets that are already loaded.

def load_csv(file_path: str, dataset_name: str, tool_context: ToolContext) -> dict:
    print(f" Loading CSV: {file_path} as '{dataset_name}'")
    try:
        df = pd.read_csv(file_path)
        result = DATA_STORE.add_dataset(dataset_name, df, source=file_path)
        # Update session state
        datasets = tool_context.state.get("loaded_datasets", [])
        if dataset_name not in datasets:
            datasets.append(dataset_name)
        tool_context.state["loaded_datasets"] = datasets
        tool_context.state["active_dataset"] = dataset_name
        summary = {
            "status": "success",
            "message": result,
            "preview": {
                "columns": list(df.columns),
                "shape": [int(df.shape[0]), int(df.shape[1])],
                "dtypes": {k: str(v) for k, v in df.dtypes.items()},
                "sample": make_serializable(df.head(3).to_dict(orient="records"))
            }
        }
        return make_serializable(summary)
    except Exception as e:
        return {"status": "error", "message": f"Failed to load CSV: {str(e)}"}

def create_sample_dataset(dataset_type: str, dataset_name: str, tool_context: ToolContext) -> dict:
    print(f" Creating sample dataset: {dataset_type} as '{dataset_name}'")
    np.random.seed(42)
    # ... [code for generating different sample datasets: sales, customers, etc.]
    # (Generation code omitted here for brevity; the full code is in the original article.)
    if dataset_type == "sales":
        # ... sales data generation
    elif dataset_type == "customers":
        # ... customers data generation
    # ... etc. for timeseries and survey
    else:
        return {"status": "error", "message": f"Unknown dataset type: {dataset_type}. Use: sales, customers, timeseries, survey"}
    
    result = DATA_STORE.add_dataset(dataset_name, df, source=f"sample_{dataset_type}")
    datasets = tool_context.state.get("loaded_datasets", [])
    if dataset_name not in datasets:
        datasets.append(dataset_name)
    tool_context.state["loaded_datasets"] = datasets
    tool_context.state["active_dataset"] = dataset_name
    return make_serializable({
        "status": "success",
        "message": result,
        "description": f"Created sample {dataset_type} dataset",
        "columns": list(df.columns),
        "shape": [int(df.shape[0]), int(df.shape[1])],
        "sample": df.head(3).to_dict(orient="records")
    })

def list_available_datasets(tool_context: ToolContext) -> dict:
    print(" Listing datasets")
    datasets = DATA_STORE.list_datasets()
    if not datasets:
        return {"status": "info", "message": "No datasets loaded. Use create_sample_dataset or load_csv."}
    info = {}
    for name in datasets:
        ds = DATA_STORE.datasets[name]
        info[name] = {
            "rows": int(ds["shape"][0]),
            "columns": int(ds["shape"][1]),
            "column_names": ds["columns"]
        }
    return make_serializable({
        "status": "success",
        "datasets": info,
        "active_dataset": tool_context.state.get("active_dataset")
    })

print(" Data loading tools defined!")
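The sample-data branches are elided above, so here is a rough sketch of what a sales generator could look like. The function name `make_sample_sales`, the column names, and the value ranges are all assumptions chosen to match the demo questions later in the article (revenue, region, customer type); they are not the original article's schema. The sketch also uses `np.random.default_rng` rather than the `np.random.seed` call shown above.

```python
import numpy as np
import pandas as pd


def make_sample_sales(n_rows: int = 200, seed: int = 42) -> pd.DataFrame:
    """Build a small synthetic sales table (hypothetical schema)."""
    rng = np.random.default_rng(seed)  # seeded so repeated runs match
    df = pd.DataFrame({
        "date": pd.date_range("2024-01-01", periods=n_rows, freq="D"),
        "region": rng.choice(["North", "South", "East", "West"], size=n_rows),
        "customer_type": rng.choice(
            ["Consumer", "Corporate", "Small Business"], size=n_rows
        ),
        "units": rng.integers(1, 50, size=n_rows),
        "unit_price": rng.uniform(10.0, 100.0, size=n_rows).round(2),
    })
    # Revenue is derived, so it stays consistent with units and price.
    df["revenue"] = (df["units"] * df["unit_price"]).round(2)
    return df


sales_df = make_sample_sales()
```

Inside `create_sample_dataset`, a frame like this would be assigned to `df` in the `"sales"` branch before it is handed to `DATA_STORE.add_dataset`.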

Simple, right? Now we have an agent whose whole world is just getting data ready for the rest of the team.

Specialist #2: The Statistician

Once the data is in, we need to understand it. That's where our Statistician agent comes in. This agent is the numbers geek. It can give us a full descriptive summary, check for correlations between variables, run formal hypothesis tests (like t-tests or ANOVA), and even sniff out outliers.

These are the fundamental skills for any data exploration.
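As a taste of what one of these skills boils down to, here is a minimal sketch of a one-way ANOVA across groups, using `scipy.stats.f_oneway`. The helper name `anova_by_group`, the toy data, and the 0.05 threshold are assumptions for illustration, not the article's implementation.

```python
import pandas as pd
from scipy import stats


def anova_by_group(df: pd.DataFrame, value_column: str,
                   group_column: str, alpha: float = 0.05) -> dict:
    """One-way ANOVA: does the mean of value_column differ across groups?"""
    # One array of values per group, with missing values dropped.
    groups = [
        g[value_column].dropna().to_numpy()
        for _, g in df.groupby(group_column)
    ]
    f_stat, p_value = stats.f_oneway(*groups)
    return {
        "test": "one-way ANOVA",
        "groups": int(len(groups)),
        "f_statistic": float(f_stat),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


# Toy data with clearly separated group means (hypothetical values).
toy = pd.DataFrame({
    "customer_type": ["Consumer"] * 5 + ["Corporate"] * 5 + ["Small Business"] * 5,
    "revenue": [100, 102, 98, 101, 99,
                150, 148, 152, 151, 149,
                200, 199, 201, 198, 202],
})
anova_result = anova_by_group(toy, "revenue", "customer_type")
```

Returning plain `float` and `bool` values keeps the result easy to serialize when it is sent back to the model as a tool response.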

def describe_dataset(dataset_name: str, tool_context: ToolContext) -> dict:
    # ... [code for detailed dataset description]
    print(f" Describing dataset: {dataset_name}")
    df = DATA_STORE.get_dataset(dataset_name)
    # ... implementation details ...
    return make_serializable(result)

def correlation_analysis(dataset_name: str, method: str = "pearson", tool_context: ToolContext = None) -> dict:
    # ... [code for correlation matrix and strong correlations]
    print(f" Correlation analysis: {dataset_name} ({method})")
    df = DATA_STORE.get_dataset(dataset_name)
    # ... implementation details ...
    return make_serializable(result)

def hypothesis_test(dataset_name: str, test_type: str, column1: str, column2: str = None, group_column: str = None, tool_context: ToolContext = None) -> dict:
    # ... [code for Shapiro-Wilk, T-Test, ANOVA, Chi-Square]
    print(f" Hypothesis test: {test_type} on {dataset_name}")
    df = DATA_STORE.get_dataset(dataset_name)
    # ... implementation details for each test type ...
    return make_serializable(result)

def outlier_detection(dataset_name: str, column: str, method: str = "iqr", tool_context: ToolContext = None) -> dict:
    # ... [code for IQR and Z-Score outlier detection]
    print(f" Outlier detection: {column} in {dataset_name}")
    df = DATA_STORE.get_dataset(dataset_name)
    # ... implementation details ...
    return make_serializable(result)

print(" Statistical analysis tools defined!")
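The bodies above are elided, so as an illustration, here is one way the IQR branch of outlier detection might be written. The helper name `iqr_outliers` and the multiplier `k = 1.5` (the conventional default) are assumptions; the original tool may differ.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> dict:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = values.dropna()
    q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (clean < lower) | (clean > upper)
    return {
        "lower_bound": float(lower),
        "upper_bound": float(upper),
        "outlier_count": int(mask.sum()),
        "outlier_values": clean[mask].tolist(),
    }


# Toy column: seven ordinary values and one obvious outlier.
revenue = pd.Series([10, 12, 11, 13, 12, 11, 14, 95])
report = iqr_outliers(revenue)
```

For this toy series the quartiles are 11 and 13.25, so the fences land at 7.625 and 16.625 and only the value 95 is flagged.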

Our team is getting smarter. We can now load data and immediately get a deep statistical understanding of it. But numbers on a screen can be dry. Let's bring in an artist.

Specialist #3: The Visualizer

A picture is worth a thousand numbers. Our Visualizer agent is responsible for turning data into insightful charts. We'll equip it with the ability to create all the classics: histograms, scatter plots, bar charts, line charts, heatmaps, and more.

We'll also give it a special power tool: a create_distribution_report function that generates a 4-in-1 plot to give a complete view of a single variable's distribution.

def create_visualization(dataset_name: str, chart_type: str, x_column: str, y_column: str = None, color_column: str = None, title: str = None, tool_context: ToolContext = None) -> dict:
    # ... [code for creating various chart types: histogram, scatter, bar, etc.]
    print(f" Creating {chart_type}: {x_column}" + (f" vs {y_column}" if y_column else ""))
    df = DATA_STORE.get_dataset(dataset_name)
    # ... plotting logic for each chart type ...
    plt.show()
    plt.close()
    return make_serializable(result)

def create_distribution_report(dataset_name: str, column: str, tool_context: ToolContext = None) -> dict:
    # ... [code for the 4-in-1 distribution plot (histogram, box plot, Q-Q, violin)]
    print(f" Distribution report: {column} in {dataset_name}")
    df = DATA_STORE.get_dataset(dataset_name)
    # ... plotting logic ...
    plt.show()
    plt.close()
    return make_serializable(result)

print(" Visualization tools defined!")
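To make the 4-in-1 idea concrete, here is a hedged sketch of such a figure built with plain Matplotlib. The function name `distribution_figure` and the panel layout are assumptions, and the Q-Q panel is computed by hand with the standard library's `statistics.NormalDist` rather than whatever the original uses. The `matplotlib.use("Agg")` line is only there so the sketch runs without a display; in a notebook you would drop it and call `plt.show()` instead.

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def distribution_figure(values, label: str = "value"):
    """Histogram, box plot, normal Q-Q plot and violin plot in a 2x2 grid."""
    data = np.sort(np.asarray(values, dtype=float))
    n = len(data)
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))

    axes[0, 0].hist(data, bins=20)
    axes[0, 0].set_title(f"Histogram of {label}")

    axes[0, 1].boxplot(data)
    axes[0, 1].set_title("Box plot")

    # Q-Q plot: sorted sample values against standard normal quantiles.
    probs = (np.arange(1, n + 1) - 0.5) / n
    normal = statistics.NormalDist()
    theoretical = [normal.inv_cdf(float(p)) for p in probs]
    axes[1, 0].scatter(theoretical, data, s=10)
    axes[1, 0].set_title("Q-Q plot (normal)")

    axes[1, 1].violinplot(data)
    axes[1, 1].set_title("Violin plot")

    fig.tight_layout()
    return fig


rng = np.random.default_rng(0)
fig = distribution_figure(rng.normal(50, 10, size=300), label="revenue")
panel_titles = [ax.get_title() for ax in fig.axes]
plt.close(fig)  # free the figure, as the tools above do after plotting
```

If the points in the Q-Q panel fall roughly on a straight line, the column is close to normally distributed, which is a useful check before running a t-test or ANOVA.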

Now we’re talking! We can load data, analyze it, and see the patterns visually.

Specialist #4 & #5: The Transformer and The Reporter

Our team is almost complete. We just need two more roles.

First, the Transformer. Sometimes the data isn't in the right shape. We need to filter it, group it, or add new calculated columns. This agent handles all data manipulation tasks.

Second, the Reporter. After all the analysis is done, we need someone to summarize the key findings. This agent can generate a clean, high-level summary report of a dataset and can also pull up the history of all the analyses we've performed.

def filter_data(dataset_name: str, condition: str, new_dataset_name: str, tool_context: ToolContext) -> dict:
    # ... [code to filter a dataframe]
    print(f" Filtering {dataset_name}: {condition}")
    # ... implementation ...
    return make_serializable(result)

def aggregate_data(dataset_name: str, group_by: str, aggregations: str, new_dataset_name: str, tool_context: ToolContext) -> dict:
    # ... [code to group and aggregate data]
    print(f" Aggregating {dataset_name} by {group_by}")
    # ... implementation ...
    return make_serializable(result)

def add_calculated_column(dataset_name: str, new_column: str, expression: str, tool_context: ToolContext) -> dict:
    # ... [code to add a new column based on an expression]
    print(f" Adding column '{new_column}' to {dataset_name}")
    # ... implementation ...
    return make_serializable(result)

print(" Transformation tools defined!")

def generate_summary_report(dataset_name: str, tool_context: ToolContext) -> dict:
    # ... [code to generate a comprehensive text report]
    print(f" Generating report: {dataset_name}")
    # ... implementation ...
    return make_serializable(result)

def get_analysis_history(tool_context: ToolContext) -> dict:
    # ... [code to retrieve analysis log from DataStore]
    return make_serializable(result)

print(" Reporting tools defined!")
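Since the transformation bodies are elided, here is a minimal sketch of how filtering and aggregation could work on a plain DataFrame. It assumes `condition` is a pandas query string and that `aggregations` uses a `"column:function"` format separated by commas; both formats, and the helper names `filter_rows` and `aggregate_rows`, are hypothetical rather than taken from the original code.

```python
import pandas as pd


def filter_rows(df: pd.DataFrame, condition: str) -> pd.DataFrame:
    """Keep rows matching a pandas query string, e.g. "revenue > 100"."""
    return df.query(condition).reset_index(drop=True)


def aggregate_rows(df: pd.DataFrame, group_by: str, aggregations: str) -> pd.DataFrame:
    """Group by one column; aggregations is an assumed "col:func,col:func" string."""
    spec = {}
    for pair in aggregations.split(","):
        column, func = pair.strip().split(":")
        spec[column.strip()] = func.strip()
    return df.groupby(group_by, as_index=False).agg(spec)


sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "revenue": [120.0, 80.0, 200.0, 150.0],
    "units": [3, 2, 5, 4],
})
big_orders = filter_rows(sales, "revenue > 100")
by_region = aggregate_rows(big_orders, "region", "revenue:sum,units:mean")
```

In the real tools, each result would be stored under `new_dataset_name` so later steps can refer to it. One caution: the query string comes from the model, so it is worth treating it as untrusted input and validating it before evaluation.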

The team is assembled. We have a loader, a statistician, a visualizer, a transformer, and a reporter. Now, we need a manager to orchestrate their work.

The Master Analyst: Bringing the Team Together

This is where the magic happens. We create each of our specialists as a distinct Agent, giving each one its name, its description, and the specific tools (the Python functions we just wrote) it's allowed to use.

Then, we create the master_analyst. This agent doesn't have any tools of its own. Instead, its "team" is the list of specialist agents we just defined, passed in through the sub_agents parameter. When we give a task to the master analyst, its job is to understand the request and delegate it to the right specialist.

data_loader_agent = Agent(
    name="data_loader",
    model=LiteLlm(model=MODEL),
    description="Loads CSV files, creates sample datasets (sales, customers, timeseries, survey)",
    tools=[load_csv, create_sample_dataset, list_available_datasets]
)

stats_agent = Agent(
    name="statistician",
    model=LiteLlm(model=MODEL),
    description="Statistical analysis: descriptive stats, correlations, hypothesis tests, outliers",
    tools=[describe_dataset, correlation_analysis, hypothesis_test, outlier_detection]
)

viz_agent = Agent(
    name="visualizer",
    model=LiteLlm(model=MODEL),
    description="Creates charts: histogram, scatter, bar, line, box, heatmap, pie",
    tools=[create_visualization, create_distribution_report]
)

transform_agent = Agent(
    name="transformer",
    model=LiteLlm(model=MODEL),
    description="Data transformation: filter, aggregate, calculate columns",
    tools=[filter_data, aggregate_data, add_calculated_column]
)

report_agent = Agent(
    name="reporter",
    model=LiteLlm(model=MODEL),
    description="Generates summary reports and tracks analysis history",
    tools=[generate_summary_report, get_analysis_history]
)

print(" Specialist agents created!")

master_analyst = Agent(
    name="data_analyst",
    model=LiteLlm(model=MODEL),
    description="Master Data Analyst orchestrating end-to-end data analysis",
    instruction="""You are an expert Data Analyst with a team of specialists. YOUR TEAM:
    1. data_loader - Load/create datasets
    2. statistician - Statistical analysis
    3. visualizer - Charts and plots
    4. transformer - Data transformations
    5. reporter - Reports and summaries
    WORKFLOW: 1. Load data → 2. Describe → 3. Visualize → 4. Analyze → 5. Transform if needed → 6. Report
    Be helpful, explain insights clearly, suggest next steps.""",
    sub_agents=[data_loader_agent, stats_agent, viz_agent, transform_agent, report_agent]
)

print(f" Master Analyst ready with {len(master_analyst.sub_agents)} specialists!")

Putting Your AI Team to Work

With the whole system built, all that's left is to run it. The code below sets up a session and an analyze function that lets us chat with our Master Analyst.

Watch what happens when we give it a series of simple, natural language commands.

session_service = InMemorySessionService()
# ... [session and runner setup: defines runner, USER_ID and SESSION_ID used below] ...

async def analyze(query: str):
    print(f"\n{'='*70}\n You: {query}\n{'='*70}")
    content = types.Content(role='user', parts=[types.Part(text=query)])
    response = ""
    try:
        async for event in runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content):
            if event.is_final_response() and event.content and event.content.parts:
                response = event.content.parts[0].text
                break
    except Exception as e:
        response = f"Error: {str(e)}"
    print(f"\n Analyst: {response}\n{'='*70}\n")

print(" Ready! Use: await analyze('your question')")

# --- DEMO ---
await analyze("Create a sales dataset for analysis and name it sales_data")
await analyze("Describe the sales_data dataset for me")
await analyze("Show me a bar chart of total revenue by region")
await analyze("Is there a significant difference in revenue between the different customer types? Run an ANOVA test.")
await analyze("Finally, generate a summary report for sales_data")

When you run this, you'll see the system spring to life.

  • "Create a sales dataset..." -> The Master Analyst delegates to the data_loader.
  • "Describe the dataset..." -> It calls on the statistician.
  • "Show me a bar chart..." -> It tasks the visualizer.
  • "Run an ANOVA test..." -> The statistician gets called again.
  • "Generate a summary..." -> The reporter wraps things up.

You're no longer just writing code; you're directing a team. You're having a conversation with your data, and your AI team is providing the answers, complete with stats, charts, and summaries.

This is more than just a cool tech demo. It’s a glimpse into a new way of working with data—one that’s more intuitive, more efficient, and frankly, a lot more fun. By breaking down a complex process into specialized roles, we’ve built an automated assistant that can handle a full analysis workflow from start to finish. Go ahead, give it a try and see what insights your new team can uncover.

