Apache Superset connector

Point Apache Superset at one curated warehouse, not at every operational database your team can reach.

Data Panda lands the data from your CRM, ERP, ecommerce and finance tools in a warehouse and feeds Superset from one curated layer. SQL Lab, virtual datasets and the chart library all read the same definitions, and the self-hosted instance stops being a museum of one-off connections.

About Apache Superset

Open-source BI for teams that want a chart library on top of SQL they already write.

Apache Superset started in 2015 as a hackathon project at Airbnb by Maxime Beauchemin, under the name Panoramix (later Caravel). It entered the Apache Software Foundation incubator in 2017 and graduated to a top-level Apache project in 2021. It is licensed under Apache 2.0, runs self-hosted on your own infrastructure, and connects through SQLAlchemy to 70+ SQL databases, including Snowflake, BigQuery, Redshift, Postgres, MySQL, ClickHouse and DuckDB. The commercial flavour is Preset, the company Beauchemin founded, which offers a managed Cloud edition and contributes a large share of the commits back to the project.

The appeal in BE/NL is straightforward: a chart library, a SQL Lab editor, dashboards and a virtual-dataset model, all without a Tableau or Power BI per-seat bill. Data-engineering-heavy teams reach for it because the dataset model is plain SQL and YAML, not LookML or DAX, and because any SQL Lab query can be turned into a chart. The trap in the mid-market deployments we see is the same as with every BI tool: once Superset is pointed at three operational databases instead of one curated warehouse, SQL Lab queries multiply, virtual datasets pile up, and the dashboards start contradicting each other. We feed Superset from one warehouse so the dataset model stays small, the cache hits land where they should, and upgrades stay manageable.

What your Apache Superset data is for

What you get once Apache Superset is connected.

Datasets that agree with each other

Superset datasets and charts read from a curated warehouse model with one definition per metric.

Revenue, churn and active customers defined once and reused across dashboards
Virtual datasets sit on warehouse views, not on raw production tables
Charts and SQL Lab queries answer with the same numbers as the close pack

Off the live operational databases

Superset reads the warehouse on a known cadence instead of querying live ERP and CRM directly.

Operational databases stop carrying analyst load at month-end
Cached charts refresh from one warehouse, not from ten source connections
Long SQL Lab queries do not block a transaction in the live ERP

Natural-language questions on a curated layer

Superset's chart explorer and Preset's AI features answer from warehouse views, not raw tables.

Natural-language queries hit one definition of customer and revenue
Forecasts run on warehouse history, not a six-week SQL Lab cache
Generated SQL stays inside the curated dataset, not on random columns

Embedded analytics in your product

Superset embedded dashboards show customer-facing views on the same warehouse data your team uses.

Embedded charts backed by row-level security in the warehouse
One dataset model for internal Superset and customer-facing portals
Tenant filtering applied in the warehouse view, not in every chart

Use cases

Use cases we deliver with Apache Superset data.

A list of concrete reports, automations and AI features we have built on Apache Superset data. Pick the one that matches your situation.

SQL Lab on a clean warehouse

Analysts write SQL Lab queries against curated marts instead of raw production schemas.

Virtual-dataset cleanup

Reconcile dozens of overlapping virtual datasets into one shared warehouse view per metric.

Off the OLTP

Move Superset off the live production database onto a warehouse replica.

Finance reconciliation

Revenue and AR in Superset that tie back to the books.

Sales and CS overview

Pipeline, usage and renewal risk on one Superset dashboard.

Embedded customer dashboards

Customer-facing Superset embeds on warehouse-curated tenant data.

Cohort retention

Product usage cohorts joined to billing and CRM data.

Marketing attribution

Ad spend, signups and revenue on one Superset dashboard.

Self-hosted to Preset Cloud

Move from self-hosted Superset to Preset Cloud on the same warehouse.

Dashboard cleanup

Archive stale dashboards and saved queries based on actual usage.

Real business questions

Answers you will finally get.

How many of our Superset virtual datasets calculate revenue differently?

Run the dataset list against one shared warehouse view and the duplicates surface fast. A self-hosted Superset instance that has been running for more than two years usually carries three or four versions of revenue: one virtual dataset on the CRM extract, one on the billing tool, one on a SQL Lab snippet that someone saved as a chart. Reconciling them to a single warehouse view turns the next steering committee into a discussion about the business, not about whose dashboard is right.
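To make "run the dataset list against one shared view" concrete, here is a minimal Python sketch. It assumes you have already pulled monthly revenue per virtual dataset, and from the shared warehouse view, into plain dictionaries; the function name `revenue_mismatches`, the dataset names, the months, the figures and the 0.5% tolerance are all hypothetical.

```python
from __future__ import annotations


def revenue_mismatches(
    reference: dict[str, float],
    candidates: dict[str, dict[str, float]],
    tolerance: float = 0.005,
) -> dict[str, list[str]]:
    """Hypothetical helper: per virtual dataset, list the months where revenue
    is missing or differs from the shared warehouse view by more than
    `tolerance` (relative gap)."""
    mismatches: dict[str, list[str]] = {}
    for name, by_month in candidates.items():
        bad_months = []
        for month, ref_value in reference.items():
            value = by_month.get(month)
            if value is None:
                bad_months.append(month)  # dataset has no figure for this month
                continue
            # Relative gap against the view; absolute gap if the view reports zero.
            gap = abs(value - ref_value) / abs(ref_value) if ref_value else abs(value)
            if gap > tolerance:
                bad_months.append(month)
        if bad_months:
            mismatches[name] = bad_months
    return mismatches


# Hypothetical sample: the shared warehouse view versus three virtual datasets.
reference = {"2024-01": 100_000.0, "2024-02": 120_000.0}
candidates = {
    "revenue_crm_extract": {"2024-01": 100_000.0, "2024-02": 118_000.0},
    "revenue_billing": {"2024-01": 100_200.0, "2024-02": 120_300.0},
    "revenue_sqllab_chart": {"2024-01": 95_000.0},
}
result = revenue_mismatches(reference, candidates)
print(result)
```

With these made-up numbers the billing dataset stays within tolerance, while the CRM extract and the saved SQL Lab chart are flagged. Those two are the candidates to collapse into the shared view.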

Why is our Superset still on a version from a year ago?

Self-hosted Superset upgrades take discipline across Python dependencies, Celery workers, Redis and metadata-database migrations, and usually nobody on the team owns that end to end. Most BE/NL self-hosted instances we see are one or two minor versions behind the Apache release line. Either you commit to an upgrade cadence on the warehouse-fed setup, or you move to Preset Cloud and let Preset carry upgrades and patches. The dataset layer stays the same on either path.

Should we move from self-hosted Superset to Preset Cloud?

Preset Cloud takes the upgrade, scaling and Celery-worker babysitting off your plate, which is the most common reason self-hosted instances stall. Both paths work on a warehouse-fed setup, because the dataset layer is decoupled from where Superset itself runs. The decision becomes about where you want to spend operations time, not about your data.

Value for everyone in the organisation

Where each function gets value.

For finance leaders

Finance gets a Superset dashboard pack where revenue, AR and margin tie back to the books instead of being redefined in every virtual dataset. Month-end stops being three people comparing charts, and the CFO can subscribe to a recap that already reconciles.

For sales leaders

Sales sees pipeline, usage and renewal risk in one Superset dashboard, with the same customer master Finance and CS read. Forecast meetings stop being a debate about whose SQL Lab query is the right one, and the same numbers travel to the QBR pack.

For operations

Operations and product leads get warehouse-backed dashboards on the open-source BI tool the data team already runs. The production database stops being the place analysts run heavy SQL, and virtual datasets stop multiplying because the curated layer covers the common asks.

Your existing tools

Your data lands in a warehouse. Your BI tools read from it.

You keep the reporting tool you already have. We connect it to the warehouse where your Apache Superset data lives.

Power BI Microsoft

Fabric Microsoft

Snowflake Data warehouse

BigQuery Google

Tableau Visualisation

Excel Sheets & pivots

Three steps

From Apache Superset to answers in three steps.

Connect securely

OAuth authentication. Read-only by default. We sign a DPA and your admin keeps the keys.

Land in your warehouse

Data flows into your warehouse on your schedule. Near real time or nightly, your call. You own the data.

Reporting, automation, AI

We build the first dashboard, workflow or AI feature with you, then hand over the keys. Or we stay on for ongoing delivery.

Two ways to work with us

Pick the track that fits how you work.

Track 01

Self-serve

We set up the foundation. Your team builds on top.

Apache Superset connector configured and running
Warehouse set up in your cloud account
Clean access for your Superset, Power BI, Fabric or Tableau team
Documentation on what's in the data model
Sync monitoring so you're warned before reports break

Best fit Teams that already have a BI analyst or data engineer and want to own the build.

Track 02

Done for you

We build the whole thing, end to end.

Everything in Self-serve
Dashboards built to the questions your team actually asks
Automations between your systems
AI workflows scoped to real tasks your team runs
Custom apps where a dashboard does not cut it
Ongoing delivery at a pace that fits your team

Best fit Teams without in-house BI or dev capacity. You tell us what you need and we deliver it.

Before you book

Frequently asked questions.

Who owns the data?

You do. It lands in your warehouse, on your cloud account. We don't resell or aggregate it. If you stop working with us, the warehouse stays yours and keeps running.

How fresh is the data?

Near real time for most operational systems. For heavier sources we schedule hourly or nightly. You pick based on what the reports need.

Do I need a warehouse already?

No. If you don't have one, we help you pick one and set it up as part of the first delivery. Common starting points are Snowflake, Microsoft Fabric, or a small Postgres instance.

Does this work on self-hosted Apache Superset or do we need Preset Cloud?

Both paths work. The warehouse is the source of truth, and Superset reads it through a regular SQLAlchemy connection. Self-hosted Apache Superset (Apache 2.0) covers a lot at no licence cost; Preset Cloud removes the upgrade, scaling and Celery-worker effort; and Preset's enterprise tiers add SSO, governance and embedded analytics when you need them. The dataset layer stays the same regardless of where Superset itself runs.
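Superset registers each database by its SQLAlchemy URI, so the connection to the warehouse comes down to one string. A minimal sketch of building that URI for a read-only warehouse role, assuming a Postgres warehouse reached through the psycopg2 driver; the helper `superset_sqlalchemy_uri`, the role, host, database and password are hypothetical placeholders.

```python
from urllib.parse import quote


def superset_sqlalchemy_uri(
    user: str,
    password: str,
    host: str,
    database: str,
    port: int = 5432,
    driver: str = "postgresql+psycopg2",
) -> str:
    """Hypothetical helper: build a SQLAlchemy URI for Superset's database
    connection form. Credentials are percent-encoded so characters such as
    '@' or '/' in a password do not break the URI."""
    return (
        f"{driver}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Hypothetical read-only role on a hypothetical warehouse host.
uri = superset_sqlalchemy_uri(
    user="superset_reader",
    password="s3cret@2024",
    host="warehouse.internal.example",
    database="analytics",
)
print(uri)
```

The same URI works whether Superset is self-hosted or on Preset Cloud, which is why the dataset layer does not change between the two. Granting that role read-only access is an assumption about how you set up the warehouse, not something Superset enforces.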

We have dozens of virtual datasets and saved SQL Lab queries, half of them stale. How do we clean that up?

Start with Superset's own usage logs to find which datasets, charts and queries still get opened, then map the survivors against the warehouse model. The duplicates collapse into one warehouse view per metric, the stale ones move into an archive folder, and new asks land on the curated layer instead of as fresh virtual datasets on the CRM extract.
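The first triage pass can be sketched in a few lines of Python. This assumes you have already exported, from Superset's usage logs, the last date each dataset, chart or saved query was opened; the `UsageRecord` type, the `triage` function, the 180-day threshold and the sample names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical export row: one Superset object and when it was last opened."""

    kind: str  # e.g. "dataset", "chart" or "saved_query"
    name: str
    last_opened: Optional[date]  # None if it never appears in the usage log


def triage(records: list[UsageRecord], today: date, stale_after_days: int = 180) -> dict[str, list[str]]:
    """Split objects into 'keep' (opened within the window) and 'archive'
    (stale or never opened)."""
    cutoff = today - timedelta(days=stale_after_days)
    result: dict[str, list[str]] = {"keep": [], "archive": []}
    for r in records:
        label = f"{r.kind}:{r.name}"
        if r.last_opened is not None and r.last_opened >= cutoff:
            result["keep"].append(label)
        else:
            result["archive"].append(label)
    return result


records = [
    UsageRecord("dataset", "revenue_crm_extract", date(2023, 1, 10)),
    UsageRecord("chart", "monthly_revenue", date(2024, 5, 2)),
    UsageRecord("saved_query", "ad_hoc_churn", None),
]
result = triage(records, today=date(2024, 6, 1))
print(result)
```

Only the "keep" list then gets mapped against the warehouse model. Everything in "archive" is a candidate for the archive folder, subject to a quick check with its owner.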

Can we use Superset embedded dashboards for customer-facing views on this warehouse?

Yes. Superset supports embedded dashboards through the embedded SDK, and Preset Cloud packages it as a managed feature with row-level security. One curated warehouse can drive both the internal team's Superset and the customer-facing dashboards in your product, with tenant filtering applied in the warehouse view rather than in every chart definition.
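For the embedded case, your backend requests a guest token from Superset before the SDK renders the dashboard, and that request can carry a row-level security clause. A minimal sketch of building the request body, assuming the `/api/v1/security/guest_token/` endpoint and its `user`, `resources` and `rls` fields as described in Superset's embedded-dashboard documentation (check them against the version you run). The `guest_token_payload` helper, the `tenant_id` column on the curated warehouse view and the placeholder UUID are hypothetical, and sending the authenticated request is left out.

```python
import json


def guest_token_payload(dashboard_uuid: str, tenant_id: int, username: str) -> dict:
    """Hypothetical helper: build the JSON body a backend would POST to
    Superset's guest-token endpoint for one tenant's embedded dashboard.

    The RLS clause filters on a `tenant_id` column that the curated
    warehouse view is assumed to expose."""
    if not isinstance(tenant_id, int) or isinstance(tenant_id, bool):
        # Only integers go into the clause, so tenant input cannot inject SQL.
        raise TypeError("tenant_id must be an int")
    return {
        "user": {"username": username, "first_name": "Embedded", "last_name": "Viewer"},
        "resources": [{"type": "dashboard", "id": dashboard_uuid}],
        "rls": [{"clause": f"tenant_id = {tenant_id}"}],
    }


payload = guest_token_payload(
    dashboard_uuid="00000000-0000-0000-0000-000000000000",  # placeholder embed UUID
    tenant_id=42,
    username="portal-tenant-42",
)
print(json.dumps(payload, indent=2))
```

Because the clause targets one column on one warehouse view, the tenant filter is defined once per token rather than repeated in every chart definition.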

GDPR-compliant

Data stays in the EU

You own the warehouse

A first deliverable live in four to six weeks.

We review your Apache Superset setup and the systems around it. Together we pick the first thing worth building.
