DuckDB connector

Land business data in DuckDB and run analytical SQL on a laptop, a server or a Lambda.

Data Panda lifts data from your CRM, ERP, ecommerce, product and finance systems and writes it to DuckDB files or to MotherDuck. One columnar engine, embedded in whatever process needs it, answers queries over Parquet, CSV and JSON without a separate database server in the loop.

About DuckDB

The in-process columnar database for analytical SQL anywhere your code runs.

DuckDB started in 2018 at the Database Architectures group at CWI in Amsterdam, the same research lab where MonetDB came from, with Mark Raasveldt and Hannes Mühleisen as the original authors. The first public release shipped in 2019, and version 1.0 (codename Snow Duck) landed on 3 June 2024 with a stable on-disk format that future versions will continue to read. The MIT-licensed code is held by the DuckDB Foundation and built by DuckDB Labs in Amsterdam; MotherDuck ships the managed cloud version and is one of the gold sponsors backing the project.

The engine is in-process and columnar. There is no server to run, no port to open and no replication to manage; DuckDB is a library that lives inside Python, R, Node, Java, Rust, the CLI, your data app or a serverless function. Inside that process it stores tables column by column, runs a vectorized executor on batches of rows and reads Parquet, CSV and JSON files directly from the local disk, an HTTPS URL or an S3 bucket with predicate and projection pushdown. The same binary scans a 200 GB Parquet dataset on a laptop and powers an analytical endpoint in a Lambda, and the same SQL works against a local file or a MotherDuck warehouse without rewriting the query.
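
A minimal sketch of what that looks like from Python; the file and column names are placeholders, and swapping in an https:// or s3:// URL (with httpfs loaded) leaves the query unchanged:

```python
import duckdb  # pip install duckdb

# No server, port or cluster: the engine runs inside this process and
# scans the file directly. "orders.parquet" and its columns are placeholders.
duckdb.sql("""
    SELECT customer_id, SUM(amount) AS revenue
    FROM read_parquet('orders.parquet')
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 10
""").show()
```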

What your DuckDB data is for

What you get once DuckDB is connected.

Notebook and BI on the same DuckDB file

Analysts open the same .duckdb file (or attach to MotherDuck) and run analytical SQL without spinning up a warehouse; a minimal example follows the list.

  • Python and R notebooks query DuckDB directly without an export step
  • BI tools read the same tables via the DuckDB JDBC or ODBC driver
  • Local prototyping moves to MotherDuck without rewriting the SQL
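
A sketch of the shared-file pattern, with finance.duckdb and the table names as placeholders. DuckDB allows one writing process or many concurrent read-only processes on a file, which is what makes the notebook-plus-BI setup work:

```python
import duckdb

# One process materialises a model into the shared file...
con = duckdb.connect("finance.duckdb")
con.sql("""
    CREATE OR REPLACE TABLE monthly_margin AS
    SELECT date_trunc('month', order_date) AS month,
           SUM(revenue - cost) AS margin
    FROM read_parquet('exports/orders_*.parquet')
    GROUP BY 1
""")
con.close()

# ...and notebooks or BI tools (via the JDBC/ODBC driver) open the same
# file read-only and query the result. .df() needs pandas installed.
ro = duckdb.connect("finance.duckdb", read_only=True)
print(ro.sql("SELECT * FROM monthly_margin ORDER BY month").df())
```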

Pipeline staging on Parquet and S3

DuckDB lands operational data in Parquet, then reads it back from S3 with predicate pushdown for the next pipeline step, as sketched after the list.

  • Read Parquet, CSV and JSON straight from S3 or HTTPS via the httpfs extension
  • Hive-partitioned datasets pruned at query time
  • The same engine writes the cleaned tables back to Parquet for the warehouse
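
The staging step in outline; the httpfs extension is real, while the bucket names, columns and filter are placeholders. For a private bucket, configure credentials first (CREATE SECRET or environment variables):

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")  # S3 and HTTPS support

# Read raw exports, clean them, write the result back as Parquet.
con.sql("""
    COPY (
        SELECT order_id, customer_id, amount
        FROM read_parquet('s3://raw-exports/orders/*.parquet')
        WHERE amount > 0    -- pushed down into the Parquet scan
    ) TO 's3://staging-area/orders_clean.parquet' (FORMAT PARQUET)
""")
```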

Vector search next to your facts

DuckDB stores embeddings alongside the operational tables, so retrieval and aggregation hit one in-process engine; see the example after the list.

  • VSS extension gives HNSW vector indexes inside the same database
  • Embeddings join with customer or product facts in one SQL statement
  • Result sets feed Claude or OpenAI prompts without a separate vector store
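
The pattern in miniature with the VSS extension; the product table and its three-dimensional embeddings are toy placeholders (real embeddings run to hundreds of dimensions):

```python
import duckdb

# In-memory database here: persisting HNSW indexes to disk is still
# marked experimental by the extension.
con = duckdb.connect()
con.sql("INSTALL vss; LOAD vss;")

con.sql("CREATE TABLE products (id INT, name TEXT, emb FLOAT[3])")
con.sql("""
    INSERT INTO products VALUES
        (1, 'anvil',  [0.1, 0.9, 0.0]::FLOAT[3]),
        (2, 'rocket', [0.8, 0.1, 0.1]::FLOAT[3])
""")
con.sql("CREATE INDEX products_hnsw ON products USING HNSW (emb)")

# Nearest-neighbour retrieval and relational columns in one statement.
con.sql("""
    SELECT id, name
    FROM products
    ORDER BY array_distance(emb, [0.2, 0.8, 0.0]::FLOAT[3])
    LIMIT 1
""").show()
```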

Analytical endpoints in serverless functions

Lambda, Cloud Run and Workers ship DuckDB inside the function and answer queries without a database round trip; a handler sketch follows the list.

  • Cold-start friendly because DuckDB is a single binary with no server
  • Per-tenant Parquet files queried with the right filters at request time
  • DuckDB-Wasm runs the same engine in the browser for client-side analytics
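
A hypothetical Lambda handler in Python; the bucket, the tenant= prefix layout and the query parameter are assumptions, not part of the connector:

```python
import json
import duckdb

# Module scope: warm invocations reuse the connection and loaded extension.
con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")
# For a private bucket, create an S3 secret here, e.g. from the function's
# IAM role via the aws extension's credential_chain provider.

def handler(event, context):
    # Validate tenant against an allow-list in real code; inlined for brevity.
    tenant = event["queryStringParameters"]["tenant"]
    rows = con.sql(f"""
        SELECT date_trunc('day', ts) AS day, COUNT(*) AS events
        FROM read_parquet('s3://tenant-data/tenant={tenant}/*.parquet')
        GROUP BY 1 ORDER BY 1
    """).fetchall()
    return {"statusCode": 200, "body": json.dumps(rows, default=str)}
```
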
Use cases

Use cases we deliver with DuckDB data.

A list of concrete reports, automations and AI features we have built on DuckDB data. Pick the one that matches your situation.

  • Notebook analytics: Python or R notebooks running analytical SQL directly on DuckDB tables and Parquet files.
  • Parquet on S3: Read and write Parquet datasets on S3 with predicate and projection pushdown.
  • Pipeline staging: Stage cleaned tables in DuckDB or Parquet before they land in the warehouse.
  • Local-first prototyping: Build the model on a laptop, ship the same SQL to MotherDuck or a serverless function.
  • Embedded analytics: DuckDB inside an internal app or customer portal answering per-tenant queries.
  • Serverless analytical API: Lambda, Cloud Run or Workers serving SQL queries with DuckDB inside the function.
  • DuckDB-Wasm in browser: Client-side analytics with the same engine compiled to WebAssembly.
  • MotherDuck warehouse: The managed DuckDB cloud for shared tables, scheduled jobs and bigger compute.
  • Vector search (VSS): HNSW indexes inside DuckDB so embeddings live next to the facts.
  • ELT modelling layer: dbt-duckdb or SQLMesh transformations on local files or MotherDuck.
  • Cost arithmetic: Local DuckDB or MotherDuck picked on actual workload, not on the brochure.

Real business questions

Answers you will finally get.

Is DuckDB an alternative to Snowflake or BigQuery?

Not directly. DuckDB is an in-process analytical engine, not a hosted multi-tenant warehouse. It is the right tool for notebook work, pipeline staging, embedded analytics and per-tenant queries on Parquet datasets that fit on one machine or scale up via MotherDuck. Snowflake and BigQuery remain better for governed enterprise reporting and concurrent business users hitting the same warehouse. Many stacks now run both: DuckDB for local modelling and serverless endpoints, the warehouse for the company-wide reporting layer.

Do we need MotherDuck or can we just use DuckDB?

DuckDB on its own covers laptop analytics, pipeline staging, serverless endpoints and any workload where one process has the data in front of it. MotherDuck adds shared storage, multi-user access, scheduled SQL jobs and bigger compute, with the same DuckDB SQL dialect on top. We default to plain DuckDB for embedded and pipeline use and reach for MotherDuck when the team needs shared tables and a managed runtime.

How big can DuckDB go before we need a real warehouse?

DuckDB streams data through its vectorized executor and spills to disk when needed, so it handles datasets that do not fit in memory and routinely scans hundreds of gigabytes of Parquet on a single machine. The point where you outgrow it is rarely raw size; it is concurrent users, shared writes and governance. When ten people need to write to the same table at the same time, that is when DuckDB stops being the right tool and a warehouse (or MotherDuck) takes over.
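
The spill behaviour hangs off two settings; a sketch with illustrative values, not recommendations:

```python
import duckdb

con = duckdb.connect("analytics.duckdb")

# Cap working memory and point spills at fast local disk.
con.sql("SET memory_limit = '4GB'")
con.sql("SET temp_directory = '/tmp/duckdb_spill'")

# Sorts, joins and aggregations that exceed the cap now stream through
# the temp directory instead of failing with an out-of-memory error.
```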

Value for everyone in the organisation

Where each function gets value.

For finance leaders

Finance teams get reproducible SQL on a single .duckdb file that anyone on the team can open. Month-end reconciliations, margin analyses and ad-hoc CFO questions run on the same Parquet exports the warehouse already produces, without paying warehouse compute for every iteration.

For sales leaders

Sales sees account-level usage, pipeline trends and product engagement on the same Parquet exports the data team already produces. Account reviews stop waiting on warehouse refreshes because a DuckDB query on the latest export answers per-account questions in seconds.

For operations

Ops and platform teams ship DuckDB inside Lambda, Cloud Run or a containerised job and get analytical SQL without standing up a warehouse for every internal tool. Per-tenant Parquet on S3 is queried at request time with predicate pushdown, and the same code runs locally for debugging.

Your existing tools

Your data lands in a warehouse. Your BI tools read from it.

You keep the reporting tool you already have. We connect it to the warehouse where your DuckDB data lives.

  • Power BI (Microsoft)
  • Fabric (Microsoft)
  • Snowflake (data warehouse)
  • BigQuery (Google)
  • Tableau (visualisation)
  • Excel (sheets & pivots)

Three steps

From DuckDB to answers in three steps.

01

Connect securely

OAuth authentication. Read-only by default. We sign a DPA and your admin keeps the keys.

02

Land in your warehouse

Data flows into your warehouse on your schedule. Near real time or nightly, your call. You own the data.

03

Reporting, automation, AI

We build the first dashboard, workflow or AI feature with you, then hand over the keys. Or we stay on for ongoing delivery.

Two ways to work with us

Pick the track that fits how you work.

Track 01

Self-serve

We set up the foundation. Your team builds on top.

  • DuckDB connector configured and running
  • Warehouse set up in your cloud account
  • Clean access for your Power BI, Fabric or Tableau team
  • Documentation on what's in the data model
  • Sync monitoring so you're warned before reports break

Best fit: Teams that already have a BI analyst or data engineer and want to own the build.

Track 02

Done for you

We build the whole thing, end to end.

  • Everything in Self-serve
  • Dashboards built to the questions your team actually asks
  • Automations between your systems
  • AI workflows scoped to real tasks your team runs
  • Custom apps where a dashboard does not cut it
  • Ongoing delivery at a pace that fits your team

Best fit: Teams without in-house BI or dev capacity. You tell us what you need and we deliver it.

Before you book

Frequently asked questions.

Who owns the data?

You do. It lands in your warehouse, on your cloud account. We don't resell or aggregate it. If you stop working with us, the warehouse stays yours and keeps running.

How fresh is the data?

Near real time for most operational systems. For heavier sources we schedule hourly or nightly. You pick based on what the reports need.

Do I need a warehouse already?

No. If you don't have one, we help you pick one and set it up as part of the first delivery. Common starting points are Snowflake, Microsoft Fabric, or a small Postgres instance.

When should we pick DuckDB over a warehouse like Snowflake or BigQuery?

DuckDB wins when one process already has the data in front of it: a notebook, a serverless function, a pipeline step, an embedded app. It is in-process and columnar, with no server to run, so it removes a network hop and a class of operational work. Snowflake and BigQuery remain better when many users need to write and query the same governed warehouse at the same time. Plenty of stacks now run both, with DuckDB for local modelling, staging and per-tenant endpoints, and the warehouse as the company-wide reporting layer.

What does MotherDuck add on top of DuckDB?

MotherDuck is the managed cloud version of DuckDB, founded by Jordan Tigani and team and based in Seattle. It adds shared storage, multi-user access, scheduled SQL jobs and bigger compute behind the same DuckDB SQL dialect. The local-to-cloud bridge is the point: a notebook running plain DuckDB can ATTACH to a MotherDuck database and join local Parquet to cloud tables in one query.
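
A sketch of that bridge, assuming a MOTHERDUCK_TOKEN environment variable is set and using placeholder database, table and column names:

```python
import duckdb

con = duckdb.connect()
con.sql("ATTACH 'md:my_db'")  # loads the MotherDuck extension on demand

# Local Parquet on one side, a shared cloud table on the other, one query.
con.sql("""
    SELECT c.segment, SUM(o.amount) AS revenue
    FROM read_parquet('exports/orders.parquet') AS o
    JOIN my_db.customers AS c ON c.id = o.customer_id
    GROUP BY c.segment
    ORDER BY revenue DESC
""").show()
```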

Can DuckDB read Parquet straight from S3 without copying it locally?

Yes. The httpfs extension lets DuckDB read Parquet, CSV and JSON over HTTPS and the S3 API, with predicate and projection pushdown so only the needed row groups and columns leave the bucket. Hive-partitioned layouts (year=2024/month=04/...) are pruned at query time. The same query works against a local file, a public HTTPS URL or a private S3 prefix with the right credentials configured.
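
A sketch of both halves, with placeholder credentials, bucket and partition values:

```python
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs; LOAD httpfs;")

# Credentials for a private bucket; every value here is a placeholder.
con.sql("""
    CREATE SECRET (TYPE S3, KEY_ID 'AKIA...', SECRET '...', REGION 'eu-west-1')
""")

# hive_partitioning surfaces year and month as columns, and the WHERE
# clause prunes whole partitions before any file is opened.
con.sql("""
    SELECT COUNT(*) AS orders
    FROM read_parquet('s3://my-bucket/orders/*/*/*.parquet',
                      hive_partitioning = true)
    WHERE year = 2024 AND month = 4
""").show()
```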

GDPR-compliant
Data stays in the EU
You own the warehouse

A first deliverable live in four to six weeks.

We review your DuckDB setup and the systems around it. Together we pick the first thing worth building.