ETL and ELT stand for Extract, Transform, Load and Extract, Load, Transform. They are two ways of moving data from source systems into a central analytics environment. The order of transformation and loading decides which pattern you use and which tools fit.
ETL and ELT are two patterns for moving data from source systems to a central analytics environment. Both consist of the same three steps, but in a different order.
ETL stands for Extract, Transform, Load. You extract data from the sources, transform it on a staging server, and load the end result into the data warehouse. Transformation happens before the data lands in the warehouse.
ELT stands for Extract, Load, Transform. You extract data from the sources, load it into the warehouse or lakehouse as-is, and transform it inside the target environment. Transformation happens after the data has landed.
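The order difference can be sketched in a few lines. This is a minimal illustration, not any specific tool's API: the function names and the in-memory dictionary standing in for a warehouse are invented for the example.

```python
def extract():
    # Pretend these rows come from a source system.
    return [{"customer": "a", "amount": 10}, {"customer": "b", "amount": -5}]

def transform(rows):
    # Business rule: drop negative amounts.
    return [r for r in rows if r["amount"] >= 0]

def etl(warehouse):
    # ETL: transform on the way in; the warehouse never sees raw data.
    warehouse["sales"] = transform(extract())

def elt(warehouse):
    # ELT: land raw data first, transform inside the target afterwards.
    warehouse["raw_sales"] = extract()
    warehouse["sales"] = transform(warehouse["raw_sales"])

wh_etl, wh_elt = {}, {}
etl(wh_etl)
elt(wh_elt)
print("raw kept under ETL:", "raw_sales" in wh_etl)  # False
print("raw kept under ELT:", "raw_sales" in wh_elt)  # True
```

Both end up with the same clean `sales` table; only ELT still has the raw rows available for later reuse.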
It looks like a detail, but the difference shapes the tools you choose, performance, governance, and how quickly a team can ship new reports.
ETL is the older of the two. When warehouses ran on expensive on-premises servers, loading all raw data into them was unaffordable. A separate ETL server handled transformations so the warehouse only ever saw clean, aggregated data.
ELT became popular when cloud warehouses like Snowflake, BigQuery, and Microsoft Fabric made storage and compute cheap enough. Suddenly you could land terabytes of raw data without worrying and decide what to do with it later. That brings a few big advantages:
Reprocessability
When logic changes, you rewrite the transformation. You don't have to re-extract from the source.
Single source of truth
Raw and transformed data live in the same place, under the same security, queryable with the same tools.
Flexibility for data science
ML teams often need access to raw data. With ETL, that data was discarded before loading; with ELT it sits in the warehouse ready to use.
SQL as the transformation language
In ELT you transform inside the database or lakehouse. Tools like dbt make that highly productive, and SQL skills are much more widely available than ETL-specific languages.
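The transform step of ELT happens inside the database. As a sketch, the snippet below uses SQLite as a stand-in warehouse; in practice this SQL would live in a dbt model running on Snowflake, BigQuery, Databricks, or Fabric, and the table and column names here are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Load step: raw data lands in the warehouse as-is.
con.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL, status TEXT)")
con.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("a", 100.0, "paid"), ("a", 40.0, "cancelled"), ("b", 60.0, "paid")],
)

# Transform step: SQL inside the database; the raw table stays untouched.
con.execute("""
    CREATE VIEW revenue_per_customer AS
    SELECT customer, SUM(amount) AS revenue
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY customer
""")
print(con.execute("SELECT * FROM revenue_per_customer ORDER BY customer").fetchall())
# [('a', 100.0), ('b', 60.0)]
```

When the business logic changes, you rewrite the view and rerun it over the raw table; nothing has to be re-extracted from the source.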
ELT also has downsides. You sometimes load data you never use, which costs money. And if governance is weak, analysts each build their own transformations, each with a slightly different customer view. There are also situations where classic ETL remains the better choice:
Sensitive data that must not persist in raw form
Medical, legal, or personal data that needs masking or pseudonymisation before landing in an analytics environment.
Limited target environment
When the warehouse isn't powerful enough to handle heavy transformations (typical for older on-premises setups).
Data minimisation compliance
GDPR or sector rules may require that you only store strictly necessary fields in your analytics environment.
Legacy systems and mainframes
When the source is a mainframe with slow exports, you sometimes want to transform during extraction to cut total turnaround time.
ELT, on the other hand, is the natural starting point in most modern scenarios:

Modern cloud platforms
Anyone building a new data platform on Microsoft Fabric, Snowflake, BigQuery, or Databricks today starts from ELT almost by default.
Data science and AI
Models need raw historical data to learn patterns. ELT keeps it available without a second pipeline.
Fast iteration
Where business definitions change frequently, editing a SQL transformation is faster than rewriting an ETL job.
Self-service analytics
Analysts want to build their own gold tables on top of a silver layer. With ELT and a tool like dbt that pattern is built in.
The line between ETL and ELT tools is blurring. Many modern tools support both patterns.
Microsoft Fabric Data Factory
Visual pipelines and dataflows for both ingestion and transformations. Hundreds of connectors to cloud and on-premises sources. A strong option in a Microsoft ecosystem.
Azure Data Factory
The standalone version outside Fabric, still common in existing Azure projects. Integrates with SSIS for migration scenarios from on-premises SQL Server.
dbt (data build tool)
The most popular ELT tool for transformations inside the warehouse or lakehouse. You write models in SQL with version control, tests, and documentation. Runs on Snowflake, BigQuery, Databricks, Fabric, and more.
Power Query
Inside Power BI and Excel, and also available in Fabric Dataflows. Ideal for business users who want to clean and transform their own data without code.
Fivetran, Airbyte, Matillion
Specialists in extraction and loading. Often paired with dbt for the transformation layer.
Apache Airflow, Dagster, Prefect
Orchestration tools that schedule, monitor, and retry ETL or ELT pipelines. Code-first, ideal for complex dependencies.
SSIS (SQL Server Integration Services)
Classic ETL tool in Microsoft environments. Still common in existing landscapes, rarely the first choice for new projects.
Whichever pattern and tools you choose, a few engineering pitfalls come up in every pipeline:

Schema drift
Source systems quietly change their schemas (new fields, renames, type changes). Without detection, pipelines only break on the next run. Automatic schema detection and alerting belong in every pipeline.
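A drift check can be as simple as comparing the columns a source delivers today with the columns the pipeline expects. A minimal sketch, where the expected schema is a hand-written assumption rather than anything a real source publishes:

```python
EXPECTED = {"id": "int", "name": "str", "email": "str"}

def detect_drift(expected, observed):
    # Report columns that appeared, disappeared, or changed type.
    added = sorted(set(observed) - set(expected))
    removed = sorted(set(expected) - set(observed))
    retyped = sorted(
        c for c in set(expected) & set(observed) if expected[c] != observed[c]
    )
    return {"added": added, "removed": removed, "retyped": retyped}

# The source quietly renamed 'name' to 'full_name' and changed 'id' to text.
observed = {"id": "str", "full_name": "str", "email": "str"}
print(detect_drift(EXPECTED, observed))
# {'added': ['full_name'], 'removed': ['name'], 'retyped': ['id']}
```

In a real pipeline the observed schema would come from the source's metadata and any non-empty result would trigger an alert before the next run.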
Idempotency
A pipeline must be safe to rerun without producing duplicates or partial results. That takes discipline around staging, upserts, and logging.
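The usual building block is an upsert keyed on a business key, so loading the same batch twice changes nothing. A sketch using SQLite's `ON CONFLICT` clause as a stand-in for the MERGE/upsert of a real warehouse; the table and data are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

def load(rows):
    # Upsert on the primary key: rerunning the same batch is a no-op.
    con.executemany(
        "INSERT INTO customers VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        rows,
    )

batch = [(1, "Ada"), (2, "Grace")]
load(batch)
load(batch)  # safe rerun: no duplicates
print(con.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```

A plain `INSERT` would have left four rows after the rerun; the upsert leaves two.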
Backfill
When logic changes, you often need to reprocess history. Build pipelines with backfill in mind from day one or you will be surprised later.
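If each run is parameterised by a date, backfilling history is just calling the same run for every day in a range. A sketch under that assumption; `process_day` is a hypothetical stand-in for a real pipeline run:

```python
from datetime import date, timedelta

def process_day(run_date, results):
    # Re-derive that day's partition from raw data with the current logic.
    results[run_date.isoformat()] = "rebuilt with current logic"

def backfill(start, end, results):
    # Walk the date range day by day; each day is an independent rerun.
    day = start
    while day <= end:
        process_day(day, results)
        day += timedelta(days=1)

results = {}
backfill(date(2024, 1, 1), date(2024, 1, 3), results)
print(sorted(results))  # ['2024-01-01', '2024-01-02', '2024-01-03']
```

This only works if the daily run is idempotent; the two concerns reinforce each other.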
No data lineage
Without data lineage, nobody knows which source feeds which column in which report. Every change becomes a spelunking expedition. Modern tools like dbt and Fabric generate lineage automatically.
Manual exports that linger
CSV dumps by email, Excel files on a shared drive. They outlive their intended lifespan by years and become the silent centre of fragile reporting. Remove them as soon as the official pipeline is in place.