Trino connector

Run distributed SQL across your data lake and your operational systems without copying everything first.

Data Panda lands operational data in Iceberg, Delta or Hive tables on S3 and points Trino at it. From there, one SQL surface federates the lake, the warehouse and the source systems behind your business, so reporting, automation, AI and apps read the same numbers without an ETL hop in between.

About Trino

The distributed SQL engine that queries the lake where the data already lives.

Trino is the open-source distributed SQL query engine that grew out of Presto, the project Martin Traverso, David Phillips and Dain Sundstrom started inside Facebook in 2012 to run interactive analytics on a Hadoop warehouse the size of the company itself. The three founders left Facebook in 2018, kept building the engine outside as PrestoSQL, and renamed the project Trino in December 2020 after a trademark dispute with Facebook over the Presto name. The code stays Apache 2.0; the Trino Software Foundation governs it; Starburst, the company co-founded by the same Presto creators alongside Justin Borgman, ships the commercial managed version.

The architecture is coordinator and workers, with one SQL plan that fans out across as many machines as the cluster has. What makes Trino different from a warehouse is the connector layer: Iceberg, Delta Lake, Hudi, Hive, Postgres, MySQL, SQL Server, Snowflake, BigQuery, Cassandra, MongoDB, Kafka and roughly thirty more all sit behind the same SQL dialect, joinable in one query. That is why Netflix, LinkedIn, Goldman Sachs, Salesforce, Stripe, Shopify and Lyft built their interactive analytics on Trino: one engine reads Parquet on S3 next to Postgres rows next to Snowflake tables, without a copy step in between. Trino is not an OLTP database and not a replacement for Snowflake or BigQuery; it is the SQL layer that lets you query the lake and federate across systems without lifting the data out first.
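
A minimal sketch of what that federation looks like in one statement, assuming a cluster with an iceberg and a postgres catalog configured; every table name here is illustrative:

    -- Lake facts (Parquet on S3) joined to live operational rows,
    -- addressed as catalog.schema.table in a single Trino query.
    SELECT o.order_id, o.amount, c.segment
    FROM iceberg.sales.orders AS o
    JOIN postgres.public.customers AS c
      ON o.customer_id = c.id
    WHERE o.order_date >= DATE '2024-01-01';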

What your Trino data is for

What you get once Trino is connected.

BI on the lake, not on a copy

Power BI, Tableau and Metabase read curated Iceberg or Delta tables through Trino instead of waiting for a warehouse load.

  • JDBC and ODBC drivers expose Trino to every BI tool that speaks SQL
  • Iceberg and Delta tables queried in place on S3 with predicate pushdown
  • Cross-source joins against Postgres or Snowflake without an extra ETL
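
A sketch of the pushdown point, assuming a curated Iceberg table partitioned by order date (names illustrative):

    -- A typical BI-tool query; the date filter is pushed down so only
    -- the matching Iceberg partitions on S3 are read.
    SELECT order_date, sum(amount) AS revenue
    FROM iceberg.curated.orders
    WHERE order_date BETWEEN DATE '2025-01-01' AND DATE '2025-03-31'
    GROUP BY order_date
    ORDER BY order_date;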

Federation instead of point-to-point ETL

Trino joins operational databases, the warehouse and the lake in one query, so you stop copying data just to combine it.

  • One SQL statement across Postgres, Snowflake and S3 Parquet
  • Reverse ETL queries written once and re-run on the latest tables
  • Lakehouse pipelines that read raw, write curated, all inside Trino
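
A sketch of both patterns, with illustrative catalog and table names:

    -- Lakehouse pipeline: read raw, write curated, all inside Trino.
    CREATE TABLE iceberg.curated.daily_revenue AS
    SELECT order_date, sum(amount) AS revenue
    FROM iceberg.raw.orders
    GROUP BY order_date;

    -- Reverse ETL: push a lake-computed result back into Postgres.
    INSERT INTO postgres.public.account_scores
    SELECT account_id, count(*) AS events_30d
    FROM iceberg.curated.usage_events
    WHERE event_date >= current_date - INTERVAL '30' DAY
    GROUP BY account_id;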

AI on warehouse-grade lake data

LLM and ML pipelines pull governed Trino result sets instead of stitching CSVs together.

  • RAG context queries hit the same Iceberg tables BI reads from
  • Embeddings stored in Postgres or Pinecone joined to lake facts in one query
  • Notebook and agent code talk to one SQL endpoint instead of five connectors
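
A sketch of the join, assuming the vector similarity search itself runs in Postgres (for example with pgvector) and returns document ids; all names are illustrative:

    -- Enrich retrieved documents with governed lake facts in one query.
    SELECT d.doc_id, d.title, f.arr, f.plan
    FROM postgres.rag.documents AS d
    JOIN iceberg.curated.account_facts AS f
      ON d.account_id = f.account_id
    WHERE d.doc_id IN (17, 42, 99);  -- ids from the vector search step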

Internal apps on a federated SQL endpoint

Custom dashboards, customer portals and Streamlit apps query Trino once and reach every backing store behind it.

  • A single connection string covers lake, warehouse and operational tables
  • Per-tenant filters pushed down into Iceberg partition pruning
  • Schemas evolve in the source without breaking the app's SQL
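
A sketch of the per-tenant pattern, assuming tenant_id is a partition column of an Iceberg events table (names illustrative):

    -- The app binds tenant_id per request; the filter prunes Iceberg
    -- partitions in metadata before any S3 object is opened.
    SELECT event_type, count(*) AS events
    FROM iceberg.curated.app_events
    WHERE tenant_id = ?
    GROUP BY event_type;
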
Use cases

Use cases we deliver with Trino data.

A list of concrete reports, automations and AI features we have built on Trino data. Pick the one that matches your situation.

  • Lakehouse SQL: Iceberg, Delta and Hive tables on S3 queried in place at warehouse-class speed.
  • Federated joins: One query that joins Postgres, Snowflake and Parquet on S3 without exporting first.
  • Cold-storage analytics: Years of historic data on object storage stays queryable without restoring to a warehouse.
  • Reverse ETL via SQL: Write SQL against the lake, push the result back into the operational system that needs it.
  • ETL replacement: Federate over sources instead of building yet another extract job for a one-off question.
  • Ad-hoc data exploration: Analysts write SQL against any source the cluster knows about, no ticket to data engineering.
  • BI on Iceberg: Power BI, Tableau and Metabase pointed at a Trino endpoint that fronts the lake.
  • Schema migration: Old MySQL or SQL Server queries re-run unchanged against the new Iceberg layout.
  • Data product API: One SQL endpoint behind internal APIs that have to read from many sources.
  • Cost arithmetic: Self-managed Trino, Starburst Galaxy or AWS Athena chosen on workload, not on the brochure.
  • AI retrieval layer: Agents and RAG pipelines call one Trino SQL endpoint instead of five connectors.

Real business questions

Answers you will finally get.

Is Trino a replacement for Snowflake or BigQuery?

No. Trino is a query engine, not a managed warehouse. It runs SQL across whatever storage the connectors point at: Iceberg or Delta on S3, Postgres, MySQL, Snowflake itself, BigQuery, Cassandra, Mongo. Snowflake and BigQuery still win when you want a managed warehouse with built-in storage, governance and concurrency for many BI users. Plenty of stacks now run both: the warehouse for the curated reporting layer, Trino for federation across the lake and the operational systems that nobody wants to copy in.

Why would we run Trino instead of Athena?

Athena is AWS's managed query service and its current engine is built on Trino, so for S3-only workloads inside one AWS account it is often the simpler choice. Self-managed Trino or Starburst becomes interesting when you need federation across systems Athena does not know (a Snowflake account, a Postgres replica, a MongoDB cluster), when you want to pin compute and avoid per-query billing, or when you run on Azure or GCP. The point is workload, not religion: a small AWS-only team starts with Athena; federated, cross-cloud setups end up on Trino or Starburst Galaxy.

We have an existing data lake on S3. What does Trino change?

It turns the lake into something analysts can query with SQL instead of Spark or notebooks. Once Iceberg or Delta tables sit on S3, a Trino cluster gives BI tools a JDBC endpoint that reads them in place with partition pruning and predicate pushdown. The same cluster can join those lake tables with Postgres or the warehouse in one query, so the lake stops being a write-only archive and starts being the analytical surface.
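
Concretely, once the catalogs are configured, the lake shows up as ordinary schemas and tables; a short sketch with illustrative names:

    SHOW CATALOGS;                       -- iceberg, postgres, snowflake, ...
    SHOW TABLES FROM iceberg.curated;    -- lake tables as a SQL schema

    SELECT count(*)
    FROM iceberg.curated.events
    WHERE event_date = current_date - INTERVAL '1' DAY;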

Value for everyone in the organisation

Where each function gets value.

For finance leaders

Finance teams keep multi-year close history on cheap object storage and still query it from the same SQL endpoint as the live ledger. A reconciliation that needs three years of journal entries no longer waits on a warehouse restore; Trino reads the Iceberg tables directly.

For sales leaders

Sales operations gets one SQL surface that joins CRM, billing, support and product usage without a nightly ELT into a separate warehouse. Account reviews stop blocking on a data engineering ticket because the data already sits where Trino can read it.

For operations

Data and platform leads cap warehouse cost by routing exploratory and federation workloads through Trino on the lake instead of the production warehouse. Sensitive operational systems stay where they are; Trino reads them in place with role-based credentials per catalog.

Your existing tools

Your data lands in a warehouse. Your BI tools read from it.

You keep the reporting tool you already have. We connect it to the warehouse where your Trino data lives.

  • Power BI (Microsoft)
  • Fabric (Microsoft)
  • Snowflake (data warehouse)
  • BigQuery (Google)
  • Tableau (visualisation)
  • Excel (sheets & pivots)

Three steps

From Trino to answers in three steps.

01

Connect securely

OAuth authentication. Read-only by default. We sign a DPA and your admin keeps the keys.

02

Land in your warehouse

Data flows into your warehouse on your schedule. Near real time or nightly, your call. You own the data.

03

Reporting, automation, AI

We build the first dashboard, workflow or AI feature with you, then hand over the keys. Or we stay on for ongoing delivery.

Two ways to work with us

Pick the track that fits how you work.

Track 01

Self-serve

We set up the foundation. Your team builds on top.

  • Trino connector configured and running
  • Warehouse set up in your cloud account
  • Clean access for your Power BI, Fabric or Tableau team
  • Documentation on what's in the data model
  • Sync monitoring so you're warned before reports break

Best fit Teams that already have a BI analyst or data engineer and want to own the build.

Track 02

Done for you

We build the whole thing, end to end.

  • Everything in Self-serve
  • Dashboards built to the questions your team actually asks
  • Automations between your systems
  • AI workflows scoped to real tasks your team runs
  • Custom apps where a dashboard does not cut it
  • Ongoing delivery at a pace that fits your team

Best fit Teams without in-house BI or dev capacity. You tell us what you need and we deliver it.

Before you book

Frequently asked questions.

Who owns the data?

You do. It lands in your warehouse, on your cloud account. We don't resell or aggregate it. If you stop working with us, the warehouse stays yours and keeps running.

How fresh is the data?

Near real time for most operational systems. For heavier sources we schedule hourly or nightly. You pick based on what the reports need.

Do I need a warehouse already?

No. If you don't have one, we help you pick one and set it up as part of the first delivery. Common starting points are Snowflake, Microsoft Fabric or a small Postgres instance.

What is the difference between Trino and Presto?

Trino is the project that the original Presto founders Martin Traverso, David Phillips and Dain Sundstrom kept building after they left Facebook in 2018. They first called it PrestoSQL and renamed it Trino in December 2020 after a trademark dispute with Facebook over the Presto name. The Apache 2.0 codebase, the contributors and the community moved across; PrestoDB, the Linux Foundation project, is the original repository that stayed in Facebook's orbit. When teams say 'Presto' today they usually mean Trino.

Should we run Trino ourselves or pay for Starburst?

Self-managed Trino is fine for a steady, well-understood workload where one team owns the cluster lifecycle. Starburst Galaxy is the managed cloud version from the company the Trino creators co-founded: it removes cluster operations in exchange for a per-credit bill and adds connectors, governance and a data catalog the open-source build does not ship. We size the choice on how much patience the team has for cluster ops versus how much budget there is for managed compute.

Does it matter whether we land data in Iceberg or Delta for Trino?

Both work and Trino has a first-class connector for each. Iceberg is the format the Trino project itself leans into hardest, with strong support for partition evolution, hidden partitioning and time travel. Delta Lake works through the Delta connector and is the natural pick for stacks that already use Databricks. The decision is rarely Trino-driven; it follows where the rest of the data platform lives.
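
For illustration, one Iceberg feature Trino exposes directly is time travel over table snapshots (table name and timestamp illustrative):

    -- Query the table as it looked at a past point in time.
    SELECT count(*)
    FROM iceberg.curated.orders
    FOR TIMESTAMP AS OF TIMESTAMP '2025-01-01 00:00:00 UTC';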

GDPR-compliant
Data stays in the EU
You own the warehouse

A first deliverable live in four to six weeks.

We review your Trino setup and the systems around it. Together we pick the first thing worth building.