AWS S3 connector

Land your business data in Amazon S3, then build the lake, the warehouse and the AI workloads on top.

Data Panda lifts data from your CRM, ERP, ecommerce, finance and product systems into S3 on a known schedule. Once it sits in one bucket structure, Athena, Redshift, EMR, Snowflake and your AI tooling all read the same files instead of each one keeping its own copy.

About AWS S3

Object storage at exabyte scale, built and run by AWS.

Amazon S3 is the object storage service that AWS launched in 2006 and has run continuously since. It holds objects inside buckets, addressed by a key, and the design target is simple: store any amount of data, retrieve it from anywhere on the internet, pay for what you use. AWS publishes a durability target of eleven nines (99.999999999%) and a default availability target of 99.99% on S3 Standard, with data replicated across multiple devices in multiple availability zones inside a region.
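
That core surface really is small. As a sketch, here is the minimal boto3 round trip; the bucket and key below are placeholders, not a layout we prescribe:

```python
import boto3

# The whole core surface is objects in buckets, addressed by a key.
# Bucket name and key are placeholders for illustration only.
s3 = boto3.client("s3")

# PUT: land one file under a dated key.
with open("accounts.parquet", "rb") as f:
    s3.put_object(
        Bucket="example-lake-bucket",
        Key="raw/crm/accounts/dt=2024-06-01/accounts.parquet",
        Body=f,
    )

# GET: read the same object back from anywhere with the right IAM access.
obj = s3.get_object(
    Bucket="example-lake-bucket",
    Key="raw/crm/accounts/dt=2024-06-01/accounts.parquet",
)
data = obj["Body"].read()
```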

The service today carries hundreds of exabytes of customer data and handles more than 200 million requests per second on average, according to the AWS S3 product page. Around the core PUT and GET surface sits a stack of features that matter for analytics: storage classes that range from S3 Standard for hot data through Intelligent-Tiering, Standard-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval and Glacier Deep Archive for colder tiers; Express One Zone for single-digit-millisecond latency; lifecycle rules to move files between classes automatically; versioning and Object Lock for recovery and WORM compliance; replication across regions and accounts; and IAM, bucket policies, Block Public Access and server-side encryption for governance. S3 Tables, the managed Apache Iceberg surface AWS added more recently, lets Athena, Redshift, EMR, Snowflake, Spark, Trino and DuckDB read the same lakehouse tables through the Iceberg REST Catalog without each engine keeping its own copy.
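
The shared-catalog idea is easiest to see from client code. The sketch below follows the connection pattern in AWS's published PyIceberg examples for S3 Tables; the region, account id, table-bucket ARN and table name are placeholders, and the exact endpoint and properties should be verified against current AWS docs:

```python
from pyiceberg.catalog import load_catalog

# Connect to the S3 Tables Iceberg REST endpoint with SigV4 signing.
# All names below are placeholders, not a real account.
catalog = load_catalog(
    "s3tables",
    **{
        "type": "rest",
        "uri": "https://s3tables.eu-west-1.amazonaws.com/iceberg",
        "warehouse": "arn:aws:s3tables:eu-west-1:111122223333:bucket/example-table-bucket",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": "eu-west-1",
    },
)

# Any engine speaking the Iceberg REST protocol sees the same table;
# here we read a sample through PyIceberg (needs pyarrow/pandas installed).
table = catalog.load_table("analytics.orders")
print(table.scan(limit=10).to_pandas())
```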

What your AWS S3 data is for

What you get once AWS S3 is connected.

One lake, every report

BI tools and SQL engines read curated S3 prefixes instead of stitching across operational systems.

  • Athena, Redshift Spectrum and external warehouses all read the same Parquet or Iceberg tables
  • Revenue, margin and customer master defined once in the curated zone
  • Finance pack and sales board agree before the meeting starts

ELT on a known cadence

Data lands in S3 on a schedule that matches the business, not the loudest dashboard.

  • Operational systems unloaded once per cycle, not per dashboard
  • Lifecycle rules move cold partitions to Glacier classes to keep storage cost flat
  • Failed loads surface upstream of the morning report run

AI workloads on lake-grade data

Bedrock, SageMaker and your own model code train and infer on the same files BI reads.

  • Training sets pulled from curated S3 prefixes, not ad-hoc CSV exports
  • Bedrock Knowledge Bases index documents straight from a bucket
  • Vector and embedding stores stay close to the source files in S3

Apps and downstream systems on top

Internal apps, customer portals and partner exchanges read the same S3 lake.

  • Snowflake, Databricks and Redshift external tables query S3 directly
  • S3 Tables expose Iceberg datasets to any compatible engine via the REST catalog
  • Cross-account replication shares prefixes with subsidiaries without copy jobs

Use cases

Use cases we deliver with AWS S3 data.

A list of concrete reports, automations and AI features we have built on AWS S3 data. Pick the one that matches your situation.

Curated S3 data lake: raw, staged and curated zones with one definition of revenue, customer and product.
Off the OLTP: move analyst queries off the live ERP onto Parquet snapshots in S3.
Athena on warehouse data: serverless SQL across the lake without standing up a warehouse cluster.
S3 Tables with Iceberg: managed Iceberg tables shared across Athena, Redshift, EMR and Snowflake.
Bedrock Knowledge Bases: RAG over PDFs and contracts indexed straight from a curated bucket.
SageMaker training sets: model training pulls from versioned S3 prefixes instead of CSV exports.
Lifecycle and Glacier tiering: cold partitions slide to Glacier classes so storage cost stays flat.
Cross-account data sharing: replicate prefixes to partner or subsidiary accounts without ETL exports.
Compliance archive: Object Lock plus Glacier Deep Archive for WORM and long-term retention.
Backup landing zone: database snapshots and application backups in one durable bucket layout.
EU-region residency: buckets in eu-west or eu-central for BE/NL data-residency requirements.

Real business questions

Answers you will finally get.

We already use S3 for backups. Can the same account become our analytics lake?

Yes, and it is the path most BE/NL teams already on AWS take. The pattern is to set up dedicated buckets or prefixes for the lake (raw, staged, curated), keep them separate from the backup buckets via IAM and lifecycle rules, and load operational data into the raw zone on a schedule. Backups stay where they are; analytics gets its own zoned layout that BI and AI tools can rely on.
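
As an illustration of that zoned layout, here is a hypothetical prefix convention and one raw-zone load. The zone names and key pattern are ours for the sketch, not a standard:

```python
import boto3

s3 = boto3.client("s3")

# One lake bucket, three zones; the convention below is illustrative.
ZONES = {
    "raw":     "raw/{source}/{table}/dt={load_date}/",
    "staged":  "staged/{table}/dt={load_date}/",
    "curated": "curated/{domain}/{table}/",
}

# Land one CRM unload in the raw zone, keyed by load date.
prefix = ZONES["raw"].format(source="crm", table="accounts", load_date="2024-06-01")
s3.upload_file("accounts.parquet", "example-lake-bucket", prefix + "accounts.parquet")
```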

Should we land data as Parquet files or use S3 Tables with Iceberg?

Parquet in a partitioned layout still works for most reporting needs, especially when only Athena and one or two engines read the lake. S3 Tables make sense once multiple engines (Athena, Redshift, Snowflake, Spark) need to write to the same tables, when you want managed compaction and snapshot retention, or when you want the Iceberg REST Catalog as the shared interface. We pick per workload, not per fashion.

How do we keep S3 storage cost from growing forever as we add raw data?

Lifecycle rules and the right storage classes do most of the work. Hot partitions stay on S3 Standard, warm history moves to Standard-IA or Intelligent-Tiering, cold archive lands in Glacier Flexible Retrieval or Glacier Deep Archive depending on how often you need it back. Combined with versioning expiry on the raw zone, the bill follows business value rather than calendar time.
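
A minimal boto3 sketch of such a policy, assuming a raw/ prefix; the day thresholds are placeholders you would tune to your own access pattern:

```python
import boto3

s3 = boto3.client("s3")

# Illustrative lifecycle policy: warm raw history to Intelligent-Tiering,
# cold history to Deep Archive, and old object versions expired.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-lake-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-zone",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 90, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```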

Value for everyone in the organisation

Where each function gets value.

For finance leaders

The CFO gets reporting that ties to the books because the underlying numbers come from one curated S3 zone. Revenue, margin and AR carry one definition, sourced from the same lake the sales board reads, so the close stops being three people reconciling exports.

For sales leaders

Sales leaders see pipeline, forecast and quota next to invoiced revenue and product usage on lake-grade data. The same numbers travel to the QBR pack, the standup and the steering committee without copy-paste from a spreadsheet.

For operations

Operations and data leads track S3 storage growth, request cost and lifecycle transitions in one view. The bill becomes predictable, and the lake stops growing sideways with team-specific copies of the same source files.

Your existing tools

Your data lands in a warehouse. Your BI tools read from it.

You keep the reporting tool you already have. We connect it to the warehouse where your AWS S3 data lives.

  • Power BI (Microsoft)
  • Fabric (Microsoft)
  • Snowflake (data warehouse)
  • BigQuery (Google)
  • Tableau (visualisation)
  • Excel (sheets & pivots)

Three steps

From AWS S3 to answers in three steps.

01

Connect securely

Scoped IAM access, read-only by default. We sign a DPA and your admin keeps the keys.

02

Land in your warehouse

Data flows into your warehouse on your schedule. Near real time or nightly, your call. You own the data.

03

Reporting, automation, AI

We build the first dashboard, workflow or AI feature with you, then hand over the keys. Or we stay on for ongoing delivery.

Two ways to work with us

Pick the track that fits how you work.

Track 01

Self-serve

We set up the foundation. Your team builds on top.

  • AWS S3 connector configured and running
  • Warehouse set up in your cloud account
  • Clean access for your Power BI, Fabric or Tableau team
  • Documentation on what's in the data model
  • Sync monitoring so you're warned before reports break

Best fit: teams that already have a BI analyst or data engineer and want to own the build.

Track 02

Done for you

We build the whole thing, end to end.

  • Everything in Self-serve
  • Dashboards built to the questions your team actually asks
  • Automations between your systems
  • AI workflows scoped to real tasks your team runs
  • Custom apps where a dashboard does not cut it
  • Ongoing delivery at a pace that fits your team

Best fit: teams without in-house BI or dev capacity. You tell us what you need and we deliver it.

Before you book

Frequently asked questions.

Who owns the data?

You do. It lands in your warehouse, on your cloud account. We don't resell or aggregate it. If you stop working with us, the warehouse stays yours and keeps running.

How fresh is the data?

Near real time for most operational systems. For heavier sources we schedule hourly or nightly. You pick based on what the reports need.

Do I need a warehouse already?

No. If you don't have one, we help you pick one and set it up as part of the first delivery. Common starting points are Snowflake, Microsoft Fabric, or a small Postgres instance to start.

Can we keep our S3 lake fully inside the EU?

Yes. AWS lets you pin buckets to a specific region, and objects in a region do not leave it unless you explicitly replicate them out. For BE/NL teams that means eu-west-1 (Ireland), eu-west-3 (Paris) or eu-central-1 (Frankfurt) for the lake, with Block Public Access on by default and replication scoped to other EU regions if you need geographic redundancy. Data-residency clauses in procurement contracts read cleanly against this setup.
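
In boto3 terms, the setup is two calls; the bucket name below is a placeholder:

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Pin the bucket to an EU region at creation time. Objects stay in
# that region unless you explicitly replicate them out.
s3.create_bucket(
    Bucket="example-eu-lake-bucket",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Keep Block Public Access fully on for the lake bucket.
s3.put_public_access_block(
    Bucket="example-eu-lake-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```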

Do we need S3 Tables, or is plain S3 with Parquet enough?

Plain S3 with partitioned Parquet is enough for most reporting and for lakes read mostly by Athena. S3 Tables earn their place when several engines (Athena, Redshift, Snowflake, Spark, Trino) need to read and write the same tables with ACID guarantees, or when you want AWS to handle Iceberg compaction and snapshot retention instead of running it yourself. We pick per workload after we see the read pattern.
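
As a sketch of the plain-Parquet path: any engine that reads Parquet can query the curated prefix directly. Here it is from DuckDB's Python client, with placeholder paths and the credential setup reduced to a comment:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
# In real use, also configure S3 credentials, e.g. via DuckDB's
# s3_access_key_id / s3_secret_access_key settings or a credential chain.
con.execute("SET s3_region = 'eu-west-1'")

# Hive-style dt= partitions in the key become a queryable column.
con.sql("""
    SELECT dt, sum(amount) AS revenue
    FROM read_parquet(
        's3://example-lake-bucket/curated/sales/orders/*/*.parquet',
        hive_partitioning = true
    )
    GROUP BY dt
    ORDER BY dt
""").show()
```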

How do you keep S3 cost under control as we keep adding raw data?

Lifecycle rules per prefix, the right storage class per access pattern, and versioning expiry on the raw zone. Hot partitions stay on Standard, warm history goes to Intelligent-Tiering or Standard-IA, cold archive lands in Glacier Flexible Retrieval or Glacier Deep Archive. We also watch query-side cost from Athena and EMR, because scanning whole prefixes instead of pruned partitions is what drives most surprise bills, not storage itself.
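
The partition point in one hedged sketch: the filter on the partition column is what keeps an Athena query from scanning the whole prefix. Database, table and output location below are placeholders:

```python
import boto3

athena = boto3.client("athena", region_name="eu-west-1")

# Filtering on the partition column (dt) prunes the scan to one day
# instead of the full curated prefix.
athena.start_query_execution(
    QueryString="""
        SELECT customer_id, sum(amount) AS revenue
        FROM curated.orders
        WHERE dt = '2024-06-01'   -- partition column: prunes the scan
        GROUP BY customer_id
    """,
    QueryExecutionContext={"Database": "curated"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
```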

GDPR-compliant
Data stays in the EU
You own the warehouse

A first deliverable live in four to six weeks.

We review your AWS S3 setup and the systems around it. Together we pick the first thing worth building.