ClickHouse connector

Land business data in ClickHouse and run sub-second analytics on billions of rows.

Data Panda lifts data from your CRM, ERP, ecommerce, product and event systems into ClickHouse on a known schedule. Once it sits in one columnar warehouse, your dashboards, automations, AI workflows and internal apps query the same tables and get answers back in milliseconds instead of minutes.

About ClickHouse

The columnar OLAP database built for real-time analytics on huge tables.

ClickHouse started inside Yandex in 2009 as an internal analytics engine for Yandex Metrica, the company's web-analytics product. It was released as open-source software under the Apache 2.0 license in 2016, and ClickHouse Inc was incorporated in San Francisco in September 2021. The company now ships the open-source database alongside ClickHouse Cloud, a managed offering on AWS, GCP and Azure.

The architectural choice that defines ClickHouse: columnar storage with vectorized execution. Tables are stored column by column on disk, queries read only the columns they touch, and the engine processes data in batches that map cleanly onto modern CPU instructions. The MergeTree family of table engines layers a sparse primary-key index on top, with granules of 8,192 rows by default, which is what lets a single server scan billions of rows in a second or two. The flip side is that the design choices made on day one (the ORDER BY key, the partition expression, the compression codec) decide how the warehouse performs six months later, when the table is at 50 billion rows and the queries that used to return instantly now scan the whole thing. We land the data, model it for the queries you run in production, and pick the engine settings so ClickHouse stays in its sub-second sweet spot instead of degrading into a slow scan engine.
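As a concrete sketch, here is what those day-one choices look like in DDL; the table and column names are illustrative, not a fixed convention:

    CREATE TABLE events
    (
        event_time DateTime,
        tenant_id  UInt64,
        event_type LowCardinality(String),
        user_id    UInt64,
        payload    String CODEC(ZSTD(3))           -- heavier compression for rarely-read text
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_time)              -- coarse, low-cardinality partitions
    ORDER BY (tenant_id, event_type, event_time);  -- the sparse index follows this key

Every production query either rides that ORDER BY key or pays for a scan, which is why we design the key against the queries you actually run rather than against the source schema.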

What your ClickHouse data is for

What you get once ClickHouse is connected.

Sub-second dashboards on huge tables

BI tools query ClickHouse fact tables and return aggregations on billions of rows in milliseconds.

  • Metabase, Superset and Grafana read the same MergeTree tables
  • One revenue and customer definition across operational and event data
  • Time-series filters return instantly because the ORDER BY key matches the query
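For illustration, a query shaped like the events sketch above stays fast because the partition key prunes whole months and the sparse index seeks straight to one tenant's rows:

    SELECT toStartOfHour(event_time) AS hour,
           count()                   AS events
    FROM events
    WHERE tenant_id = 42
      AND event_type = 'page_view'
      AND event_time >= now() - INTERVAL 7 DAY
    GROUP BY hour
    ORDER BY hour;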

Streaming ingestion on a known cadence

Operational and event data lands in ClickHouse continuously or in scheduled batches, not per dashboard.

  • Kafka, Pulsar or batch loads into the same MergeTree tables
  • Materialized views precompute the heavy aggregations on insert (sketched after this list)
  • Failed loads surface upstream of the morning report run
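The usual shape of that pipeline, sketched against the illustrative events table from the section above; the broker, topic and view names are assumptions:

    -- the Kafka engine table is a consumer, not storage
    CREATE TABLE events_queue
    (
        event_time DateTime,
        tenant_id  UInt64,
        event_type String,
        user_id    UInt64
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list  = 'events',
             kafka_group_name  = 'clickhouse_events',
             kafka_format      = 'JSONEachRow';

    -- move each consumed batch into the MergeTree fact table
    CREATE MATERIALIZED VIEW events_ingest_mv TO events AS
    SELECT event_time, tenant_id, event_type, user_id
    FROM events_queue;

    -- precompute the heavy aggregation on insert; dashboards read this
    -- rollup with sum(events), since summing finishes at merge time
    CREATE TABLE events_hourly
    (
        hour       DateTime,
        tenant_id  UInt64,
        event_type LowCardinality(String),
        events     UInt64
    )
    ENGINE = SummingMergeTree
    ORDER BY (tenant_id, event_type, hour);

    CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly AS
    SELECT toStartOfHour(event_time) AS hour,
           tenant_id,
           event_type,
           count() AS events
    FROM events
    GROUP BY hour, tenant_id, event_type;

Dashboards then read events_hourly instead of re-aggregating the raw table on every refresh.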

Vector and aggregate workloads side by side

ClickHouse stores vectors next to your fact tables, so embedding and aggregate queries hit one engine.

  • Vector search via the built-in distance and ANN functions
  • Aggregations join with embedding lookups in one SQL statement
  • Result sets feed LLM prompts without a separate vector store
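A minimal sketch of the pattern, with an illustrative embeddings table and a toy four-dimensional query vector (real embeddings run to hundreds of dimensions):

    CREATE TABLE product_embeddings
    (
        product_id UInt64,
        embedding  Array(Float32)
    )
    ENGINE = MergeTree
    ORDER BY product_id;

    -- nearest neighbours by cosine distance; the same statement can
    -- join the result against any fact table in the warehouse
    WITH [0.1, 0.2, 0.3, 0.4]::Array(Float32) AS query_vec
    SELECT product_id,
           cosineDistance(embedding, query_vec) AS dist
    FROM product_embeddings
    ORDER BY dist
    LIMIT 10;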

Customer-facing analytics in apps

Internal tools and customer portals query ClickHouse directly and get answers in time for the next click.

  • Per-tenant slices served from one warehouse with row-level filters (policy sketch below)
  • HTTP and native protocols for app-side queries
  • Sub-second latency makes embedded analytics feel like part of the app
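Row-level filtering is typically a row policy on the shared fact table. A minimal sketch; the tenant id and role name are illustrative, and in practice the policies are generated per tenant:

    -- the app role for tenant 42 only ever sees tenant 42's rows
    CREATE ROW POLICY tenant_42_only ON events
        FOR SELECT USING tenant_id = 42 TO app_tenant_42;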

Use cases

Use cases we deliver with ClickHouse data.

A list of concrete reports, automations and AI features we have built on ClickHouse data. Pick the one that matches your situation.

  • Real-time dashboards: sub-second BI on billions of rows where the warehouse used to time out.
  • Event and clickstream: product and web events landed in MergeTree for funnels and retention.
  • Observability backend: logs, metrics and traces in one columnar store at warehouse compression.
  • Customer-facing analytics: per-tenant slices served into apps with row-level filtering.
  • MergeTree key design: ORDER BY and partition keys picked for the queries you run.
  • Materialized views: heavy aggregations computed on insert so dashboards read pre-built rollups.
  • Kafka or Pulsar ingest: streaming pipelines into ClickHouse without a separate ETL tier.
  • Cloud or self-hosted: ClickHouse Cloud on AWS, GCP, Azure or self-hosted on your own metal.
  • S3-backed cold storage: tiered storage so older partitions live on S3 instead of fast disk.
  • Vector search: embeddings stored next to fact tables for joint vector and SQL queries.
  • Cost arithmetic: pick Cloud or self-hosted on actual workload, not on the brochure.

Real business questions

Answers you will finally get.

Why does our ClickHouse query that used to take 200 ms now scan every part on disk?

Almost always because the partition expression has too many distinct values: every insert scatters rows across many partitions, the merge scheduler falls behind, and the query ends up touching thousands of small parts instead of pruning down to a few. A common version is partitioning by day on a table that should be partitioned by month, or by user id when the query never filters on user id. Reworking the partition expression and the ORDER BY on the heaviest tables usually puts the query back where it was.
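The partition key cannot be altered in place, so the fix is usually a rebuilt table plus a backfill; a sketch with illustrative names:

    -- was: PARTITION BY (toDate(event_time), user_id), one part set per day per user
    CREATE TABLE events_v2
    (
        event_time DateTime,
        user_id    UInt64,
        event_type LowCardinality(String)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_time)   -- months prune cleanly and merges keep up
    ORDER BY (user_id, event_time);     -- the high-cardinality filter lives here

    INSERT INTO events_v2 SELECT event_time, user_id, event_type FROM events;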

Should we run ClickHouse Cloud or self-host on our own servers?

ClickHouse Cloud separates storage and compute, scales idle resources to zero and removes the operational work of running the cluster yourself. Self-hosted is meaningfully cheaper at steady high load, especially when you already have hardware or AWS commitments, but you carry the upgrade, replication and backup work. We size both options against the actual query and ingest profile before making the call.

We use Snowflake or BigQuery for the rest of the business. Where does ClickHouse fit?

Snowflake and BigQuery are excellent for daily and weekly reporting on warehouse-grade data. ClickHouse fits when the workload is event-shaped, the table is in the billions of rows, and the dashboard or app needs to return in under a second. Many Belgian and Dutch stacks run Snowflake or BigQuery for finance and CRM analytics and put ClickHouse next to it for product events, observability or embedded customer-facing analytics.

Value for everyone in the organisation

Where each function gets value.

For finance leaders

The CFO gets a real-time view on cost-per-event, infrastructure spend per product line and unit economics on usage data that used to be stuck in logs. ClickHouse stores the event history at warehouse compression, and reports that took an overnight job now refresh in seconds.

For sales leaders

Sales sees product engagement, account usage and feature adoption on the same fact tables the CRM reads from. Account reviews stop relying on a sample export from last week because the warehouse can answer per-account questions in real time.

For operations

Ops and platform teams get logs, metrics and traces in one columnar store with warehouse-grade compression. Incident review reads the same data the dashboards do, and the observability bill stops being the loudest line on the cloud invoice.

Your existing tools

Your data lands in a warehouse. Your BI tools read from it.

You keep the reporting tool you already have. We connect it to the warehouse where your ClickHouse data lives.

  • Power BI (Microsoft)
  • Fabric (Microsoft)
  • Snowflake (data warehouse)
  • BigQuery (Google)
  • Tableau (visualisation)
  • Excel (sheets & pivots)

Three steps

From ClickHouse to answers in three steps.

01

Connect securely

OAuth authentication. Read-only by default. We sign a DPA and your admin keeps the keys.

02

Land in your warehouse

Data flows into your warehouse on your schedule. Near real time or nightly, your call. You own the data.

03

Reporting, automation, AI

We build the first dashboard, workflow or AI feature with you, then hand over the keys. Or we stay on for ongoing delivery.

Two ways to work with us

Pick the track that fits how you work.

Track 01

Self-serve

We set up the foundation. Your team builds on top.

  • ClickHouse connector configured and running
  • Warehouse set up in your cloud account
  • Clean access for your Power BI, Fabric or Tableau team
  • Documentation on what's in the data model
  • Sync monitoring so you're warned before reports break

Best fit: teams that already have a BI analyst or data engineer and want to own the build.

Track 02

Done for you

We build the whole thing, end to end.

  • Everything in Self-serve
  • Dashboards built to the questions your team actually asks
  • Automations between your systems
  • AI workflows scoped to real tasks your team runs
  • Custom apps where a dashboard does not cut it
  • Ongoing delivery at a pace that fits your team

Best fit: teams without in-house BI or dev capacity. You tell us what you need and we deliver it.

Before you book

Frequently asked questions.

Who owns the data?

You do. It lands in your warehouse, on your cloud account. We don't resell or aggregate it. If you stop working with us, the warehouse stays yours and keeps running.

How fresh is the data?

Near real time for most operational systems. For heavier sources we schedule hourly or nightly. You pick based on what the reports need.

Do I need a warehouse already?

No. If you don't have one, we help you pick one and set it up as part of the first delivery. Common starting points are Snowflake, Microsoft Fabric, or a small Postgres instance.

ClickHouse Cloud or self-hosted: how do we decide?

ClickHouse Cloud (managed on AWS, GCP or Azure) separates storage and compute and scales idle resources to zero, which fits spiky or growing workloads where you do not want to run the cluster yourself. Self-hosted is meaningfully cheaper at sustained high load, but you carry replication, backup and version upgrades. We size both against the actual ingest rate, query mix and team capacity before making the call.

Why are MergeTree primary key and partition choices so important?

ClickHouse uses a sparse primary-key index with granules of 8,192 rows by default, so the ORDER BY key decides which queries get sub-second latency and which scan the table. The partition expression decides whether ClickHouse can prune entire parts at query time and whether the merge scheduler keeps up. The official docs explicitly warn against partitioning by high-cardinality fields like client identifiers; those belong in the ORDER BY, not the PARTITION BY.
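Two quick checks make the diagnosis concrete, both plain ClickHouse SQL against the illustrative events table used earlier:

    -- a healthy table has tens of active parts per partition, not thousands
    SELECT partition, count() AS parts, sum(rows) AS rows
    FROM system.parts
    WHERE table = 'events' AND active
    GROUP BY partition
    ORDER BY partition;

    -- shows which partitions and index granules a representative query prunes
    EXPLAIN indexes = 1
    SELECT count()
    FROM events
    WHERE tenant_id = 42 AND event_time >= now() - INTERVAL 1 DAY;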

When should we pick ClickHouse over Snowflake or BigQuery?

ClickHouse wins on workloads where the table is in the billions of rows and the dashboard or app has to return in under a second, typically product events, clickstream, observability, ad-tech and embedded customer-facing analytics. Snowflake and BigQuery remain stronger for governed enterprise reporting on smaller, slower-changing data. Plenty of stacks run both: Snowflake or BigQuery for finance and CRM, ClickHouse next to it for the high-cardinality, low-latency workloads.

GDPR-compliant
Data stays in the EU
You own the warehouse

A first deliverable live in four to six weeks.

We review your ClickHouse setup and the systems around it. Together we pick the first thing worth building.