Data lineage shows the full journey data takes inside an organisation, from the original source to the final report, with meaning and context attached. It is what makes people trust the numbers they see on a dashboard.
Data lineage is the map of the journey data takes inside an organisation. You can see which source it came from, the steps it went through along the way, and where it finally landed. That journey can run from an operational system all the way to a dashboard or a report.
Think of it as a route description. Departure and arrival both matter, but so do all the stops in between. Without that overview it gets very hard to understand why the numbers say what they say.
In most organisations, data grows organically. First there is an accounting package, then a CRM, later a pile of Excel files, scripts and dashboards on top. Every new piece adds complexity, and at some point nobody is sure which source is the right one or why a number suddenly shifted last week. That is where data lineage comes in.
Data lineage builds trust in data. Users understand where the numbers come from. Discussions get sharper, and decisions land on firmer ground.
On top of that, data lineage matters for:
Troubleshooting and root-cause analysis
Impact analysis when something is about to change
Audits and regulatory reporting
Knowledge sharing inside teams
Without lineage, your data depends entirely on the people who built it. With lineage, that knowledge belongs to the team.
Data lineage follows your data through the layers it passes:
Source systems like ERP, CRM or external files
Processing through ETL or ELT pipelines
Storage in a data warehouse or database
Consumption in reports, dashboards and analyses
That flow can be captured at different levels of detail, from a high-level diagram down to individual columns. The right level of detail depends on the goal and the audience you are documenting it for.
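As a rough illustration of the layers above, lineage can be held as a small graph and walked from a report back to its sources. This is a minimal sketch, not a reference to any particular tool. Every node name (`dashboard.revenue`, `dwh.fact_sales`, `etl.load_sales`, `erp.invoices`, `crm.accounts`) and the `trace_upstream` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key is fed by the nodes listed as its value.
UPSTREAM = {
    "dashboard.revenue": ["dwh.fact_sales"],          # consumption <- storage
    "dwh.fact_sales": ["etl.load_sales"],             # storage <- processing
    "etl.load_sales": ["erp.invoices", "crm.accounts"],  # processing <- sources
}


def trace_upstream(node, upstream=UPSTREAM):
    """Return every node that feeds `node`, nearest first (breadth-first)."""
    seen = set()
    order = []
    queue = deque(upstream.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(upstream.get(current, []))
    return order


if __name__ == "__main__":
    for step in trace_upstream("dashboard.revenue"):
        print(step)
```

A table-level graph like this is the high-level end of the scale. The same structure works at column level if the node names include the column.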
Technical data lineage describes how data physically flows through systems. It focuses on tables, columns, views and code. The question it answers is: how is this data moved and transformed? It typically covers:
Source tables and fields
Transformations in SQL or ETL tools
Relationships between layers in the data warehouse
Dependencies between datasets and reports
Technical lineage is mainly useful for data engineers and BI developers. They lean on it during troubleshooting, impact analysis and day-to-day maintenance.
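Impact analysis is the same idea in the other direction: start from a column and find everything that depends on it. The sketch below assumes lineage is available as a list of column-to-column edges. The column names and the `impacted_by` helper are made up for the example.

```python
# Hypothetical column-level edges: (source column, target column).
EDGES = [
    ("erp.invoices.amount", "dwh.fact_sales.net_amount"),
    ("erp.invoices.vat", "dwh.fact_sales.vat_amount"),
    ("dwh.fact_sales.net_amount", "report.revenue.total"),
    ("dwh.fact_sales.net_amount", "report.margin.margin_pct"),
]


def impacted_by(column, edges=EDGES):
    """Return the set of columns that depend, directly or indirectly, on `column`."""
    impacted = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for source, target in edges:
            if source == current and target not in impacted:
                impacted.add(target)
                stack.append(target)
    return impacted


if __name__ == "__main__":
    # Which columns need a second look if erp.invoices.amount changes?
    for name in sorted(impacted_by("erp.invoices.amount")):
        print(name)
```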
Technical lineage is often built up automatically by tooling. That works well, but still needs human review. Complex transformations cannot always be parsed correctly by a scanner.
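To make the limits of scanning concrete, here is a deliberately naive scanner built on a regular expression. It is an illustration only: real lineage tools use proper SQL parsers, and the `scan_sources` function and the table names are assumptions for this example. It finds the tables in a plain query, but sees nothing when the table name is filled in at runtime.

```python
import re

# Matches the identifier that follows FROM or JOIN. Deliberately naive.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def scan_sources(sql):
    """Return the sorted table names this simple scanner can see in `sql`."""
    return sorted(set(TABLE_PATTERN.findall(sql)))


# A plain query: both source tables are visible in the text.
simple_sql = """
    SELECT i.amount, a.segment
    FROM erp.invoices AS i
    JOIN crm.accounts AS a ON a.id = i.account_id
"""

# A template whose table name is only known at runtime: nothing to match.
dynamic_sql = "SELECT * FROM {schema}.{table}"

if __name__ == "__main__":
    print(scan_sources(simple_sql))
    print(scan_sources(dynamic_sql))
```

The second query is the kind of case that still needs a person to review the result.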
Functional data lineage describes data from a business angle. It focuses on meaning, definitions and use. The question it answers is: what does this number actually represent? It typically covers:
Definitions of KPIs and metrics
Business rules and filters
Exceptions and agreed conventions
How the numbers feed into decisions
Functional lineage is aimed at business users, management and data stewards. It raises understanding and pushes everyone to use the same numbers in the same way.
Functional lineage is usually captured by hand, through documentation, data catalogs and conversations with the people who own the metric. Automation only goes so far. Alignment between teams is what really keeps it accurate.
Technical and functional lineage complement each other. One shows how data flows, the other shows what data means. Without the technical layer you lack control. Without the functional layer you lack context.
A good approach connects both. A business definition links straight back to a technical source, and one coherent story emerges from the two.
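One possible way to connect the two is to keep the business definition and its technical sources in the same record. The sketch below assumes a simple in-code structure. The `MetricDefinition` class, its fields and the sample values are hypothetical, not the layout of any specific catalog.

```python
from dataclasses import dataclass, field


@dataclass
class MetricDefinition:
    """A business definition with links back to its technical sources."""

    name: str
    meaning: str                                          # functional: what it represents
    owner: str                                            # who answers questions about it
    business_rules: list = field(default_factory=list)    # functional: filters, exceptions
    source_columns: list = field(default_factory=list)    # technical: where it comes from

    @property
    def is_traceable(self):
        """True when at least one technical source is linked."""
        return bool(self.source_columns)

    def describe(self):
        rules = "; ".join(self.business_rules) or "none recorded"
        sources = ", ".join(self.source_columns) or "not linked yet"
        return (
            f"{self.name}: {self.meaning} "
            f"(owner: {self.owner}; rules: {rules}; sources: {sources})"
        )


# Illustrative values only.
net_revenue = MetricDefinition(
    name="net_revenue",
    meaning="Invoiced amount excluding VAT",
    owner="finance",
    business_rules=["credit notes are subtracted"],
    source_columns=["dwh.fact_sales.net_amount"],
)

if __name__ == "__main__":
    print(net_revenue.describe())
```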
Several types of tools live in this space:
Automated lineage tools focus on automatic detection of data flows by scanning code and metadata. They are strong on detail, weaker on meaning.
Data catalogs combine metadata, business definitions and lineage in one place. They are friendlier for business users and they support governance work directly.
Open-source tools are flexible and budget-friendly, but they ask more from your team in technical knowledge and ongoing maintenance.
Manual documentation such as diagrams, wikis and shared documents still pulls its weight, especially for smaller setups or as a starting point before you invest in tooling.
In practice, a mix of the above is usually the most realistic path.
Start from a clear purpose
Begin small and focus on the data that really matters
Combine technical and functional lineage
Document at the right level of detail
Use consistent terminology across the board
Automate where it actually helps
Keep everything current
Simplicity and discipline beat completeness every time.
Wire data lineage into existing processes. Update it whenever a new report goes live or when something changes. That way it becomes routine rather than a one-off project.
Every dataset and every KPI needs a clear owner. Without ownership, lineage goes stale fast.
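An ownership rule like this is easy to check automatically. The sketch below assumes the catalog can be exported as a list of dictionaries with a `name` and an optional `owner` key. That format, the entries and the `missing_owner` helper are assumptions for illustration.

```python
# Hypothetical catalog export: one dictionary per dataset or KPI.
CATALOG = [
    {"name": "dashboard.revenue", "owner": "finance"},
    {"name": "dwh.fact_sales", "owner": ""},   # owner field present but empty
    {"name": "report.margin"},                 # owner field missing
]


def missing_owner(entries):
    """Return the names of entries that have no usable owner."""
    return [
        entry["name"]
        for entry in entries
        if not (entry.get("owner") or "").strip()
    ]


if __name__ == "__main__":
    for name in missing_owner(CATALOG):
        print(f"No owner recorded for {name}")
```

Run as part of a release checklist or a scheduled job, a check like this flags gaps before the lineage goes stale.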
You do not need to document everything. Focus on the data that drives decisions or that gets shared with people outside the team.
Tools show you the structure. Conversations create understanding. You need both.
Use lineage actively when questions or change requests come in. The parts that get used are the parts that survive.
A Belgian SME runs a revenue dashboard. The definition of revenue has shifted over time. Without lineage, the discussions go in circles every quarter.
With data lineage in place it is clear:
Which source the number is built on
Which transformations it has been through
What revenue actually means in this context
Changes happen under control, and trust in the dashboard grows.
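A shifting definition can be kept under control by recording each version with the date it took effect. The sketch below is one way to do that. The dates and wordings are invented for illustration and do not come from the SME in the example, and `definition_on` is a hypothetical helper.

```python
from datetime import date

# Invented history of one metric's definition, oldest first: (valid from, wording).
REVENUE_DEFINITIONS = [
    (date(2022, 1, 1), "Invoiced amount including VAT"),
    (date(2023, 7, 1), "Invoiced amount excluding VAT"),
    (date(2024, 1, 1), "Invoiced amount excluding VAT and credit notes"),
]


def definition_on(day, history=REVENUE_DEFINITIONS):
    """Return the definition that applied on `day`, or None if none existed yet."""
    current = None
    for valid_from, wording in history:  # history is ordered oldest first
        if valid_from <= day:
            current = wording
    return current


if __name__ == "__main__":
    print(definition_on(date(2023, 12, 31)))
```

When two quarters disagree, looking up the definition that applied in each one is often enough to end the discussion.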
Data lineage does not have to be perfect. It has to work. Better simple and supported by the team than complex and forgotten in a folder. Data lineage is not a document. It is a habit.