The quickest and cleanest possible way
Many teams today want to let non-technical users query data using plain English instead of writing SQL or dashboards. I have been playing around with multiple frameworks and techniques to achieve this, and finally I landed upon open ai agenst sdk + motherduck data warehouse, this is the quickest, cleanest, effective and at the same time thoroughly customizable way I could find.
Combining MotherDuck’s data-warehousing power with agents that can generate and run SQL makes that possible. In this post I walk you through a minimal end-to-end example — how to wire up a simple agent that queries a database on MotherDuck.
Why use MotherDuck + OpenAI Agents SDK
Before diving into code, a quick look at why this combo makes sense:
MotherDuck’s strengths
- MotherDuck gives you per-user (or per-tenant) isolation: each user (or customer) gets their own dedicated DuckDB instance (“duckling”), so analytics workloads don’t interfere with each other. That’s great for multi-tenant setups or customer-facing analytics.
- Because DuckDB is lightweight and embeddable, you get really low latency. You don’t need a heavyweight MPP cluster. That makes queries fast enough for interactive use.
- MotherDuck supports “AI + database” workflows: agents connected to MotherDuck can generate SQL, run queries, and fetch results — bridging LLMs and structured data elegantly.
- It’s free tier supports personal use, good for POC and experimentations
OpenAI Agents SDK’s strengths
- The SDK provides a minimal set of primitives (agents, tools, sessions, guardrails) that make it easy to build agentic apps.
- It lets you wrap any Python function as a “tool”: that means you can write a function to execute SQL (against MotherDuck), expose it as a tool, and let the agent call it dynamically.
- Because it’s Python-first and lightweight, you can prototype quickly without massive boilerplate.
Put together, this architecture lets you build natural-language analytics agents that query real data stored in a scalable, isolated, low-latency warehouse.
High-level Flow
- Get access (account + token) to MotherDuck and have a database loaded (or use the sample DB tables for experimentation like I will demonstrate in the sample code).
- In Python, wrap a function that connects to MotherDuck (via DuckDB) and runs arbitrary SQL.
- Use OpenAI Agents SDK to define an agent, expose your SQL runner as a tool.
- Send natural language questions to the agent. The agent generates SQL, executes via the tool, returns results.
Below is the sample code illustrating this flow.
# sample_agent.py
from agents import Agent, Runner, function_tool
import duckdb
# Tool: runs SQL against MotherDuck
@function_tool
def run_sql(query: str) -> str:
"""
Execute SQL query against MotherDuck and return results as string.
"""
token = os.getenv("MOTHERDUCK_TOKEN")
conn = duckdb.connect(f"md:my_database?motherduck_token={token}", read_only=True)
try:
df = conn.execute(query).fetchdf()
return df.to_string(index=False)
except Exception as e:
return f"Error executing query: {e}"
# Define the agent
agent = Agent(
name="AnalyticsAgent",
instructions=(
"You are a data assistant. "
"When user asks a question about data, generate valid DuckDB SQL, "
"execute via run_sql tool, and return the result."
),
tools=[run_sql],
# optionally configure model / temperature here
)
def ask(question: str):
result = Runner.run_sync(agent, question)
return result.final_output
if __name__ == "__main__":
user_query = input("Ask something about data -> ")
print(ask(user_query))
run_sql: a Python function decorated as a “tool” so the agent can call it whenever it needs to run SQL. It connects to your MotherDuck database (via DuckDB), executes the SQL, returns results.Agent(...): creates an agent with a system prompt (instructions) telling it what its role is. We give it access to therun_sqltool.Runner.run_sync(agent, ...): sends a user question, triggering the agent loop: the agent uses the LLM to generate SQL + reasoning, then callsrun_sql, grabs results, returns them.
With this minimal setup, you already get a natural-language → structured-data interface.
For a full working code with a little enhancements such as adding a schema definition, query guidance which makes the agent response time shorter, crisper and cleaner, checkout the repo here:
[GitHub]]( https://github.com/dibyendutapadar/simple-data-analytics-motherduck-openai-agents )