Tracing the source and flow of data is an arduous task. As your organization accumulates information systems, it also accumulates data entry points and transformation rules for ever-moving data. Additionally, data integration tools, extract-transform-load (ETL) tools, procedural code and even APIs and business intelligence (BI) reports aggregate and transform data constantly. As a result, it’s difficult to manually compile and understand the complicated web of data formed among the systems within your organization and present it in a simple visual flow. Automated data lineage can provide IT, data governance teams and business users with current visibility and context of organizational data to work more efficiently, make sounder decisions, and better leverage and protect the data at their disposal.
Data lineage does more than show where data originates, how it is transformed and how it moves through your organization. It can also bring together technical and business attributes and governance, spotlight sensitive data and other data classifications, and deliver data quality visibility, including helping users quickly conduct root cause analysis of data quality issues. In the absence of automated impact analysis capabilities, or in conjunction with them, data lineage can also be extremely useful for scoping and assessing the impact of potential data management, data intelligence and data platform migration efforts.
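To make root cause analysis and impact assessment concrete, here is a minimal sketch that treats lineage as a directed graph and walks it in either direction. The dataset names, the `EDGES` list and the `reachable` helper are all hypothetical and chosen for illustration only; a real lineage tool would harvest these edges automatically.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means data flows source -> target.
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "warehouse.customer_orders"),
    ("staging.orders", "warehouse.customer_orders"),
    ("warehouse.customer_orders", "bi.revenue_report"),
]

def reachable(start, edges, direction="downstream"):
    """Return every node reachable from start, following edges forward
    (downstream) or backward (upstream) with a breadth-first search."""
    graph = defaultdict(set)
    for src, dst in edges:
        if direction == "downstream":
            graph[src].add(dst)
        else:
            graph[dst].add(src)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Impact analysis: what could break if staging.orders changes?
impacted = reachable("staging.orders", EDGES, "downstream")

# Root cause analysis: which upstream objects could explain a bad report?
suspects = reachable("bi.revenue_report", EDGES, "upstream")
```

Walking downstream answers "what does this change affect?", while walking upstream narrows the places where a data quality issue could have been introduced.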
Data lineage includes both business lineage and technical lineage.
Business lineage provides a data-source-level overview that helps data governance teams and business users understand the flow of data between different data sources within an organization’s data landscape.
Technical lineage focuses on the organization’s data flow at the table and column levels, the underlying transformation layers and rules, temporary tables/files, and other objects of interest for technical users to understand an organization’s data journey.
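The relationship between the two views can be sketched in a few lines: if technical lineage is stored as column-level edges, a business-level view can be derived by collapsing each column to the data source it belongs to. The column paths, the `source.table.column` naming convention and the helper functions below are assumptions made for illustration, not a description of any particular product.

```python
# Hypothetical technical lineage: column-level edges written as
# "source.table.column" strings (an assumed naming convention).
COLUMN_EDGES = [
    ("crm.customers.id", "warehouse.dim_customer.customer_id"),
    ("crm.customers.email", "warehouse.dim_customer.email"),
    ("erp.orders.total", "warehouse.fact_sales.amount"),
    ("warehouse.fact_sales.amount", "bi.revenue_report.revenue"),
]

def data_source(column_path):
    """Return the data source part of a 'source.table.column' path."""
    return column_path.split(".", 1)[0]

def business_lineage(column_edges):
    """Roll column-level edges up to distinct source-to-source flows."""
    flows = set()
    for src, dst in column_edges:
        s, d = data_source(src), data_source(dst)
        if s != d:  # ignore movement inside a single data source
            flows.add((s, d))
    return sorted(flows)

def drill_down(column_edges, src_system, dst_system):
    """Return the column-level edges behind one source-to-source flow."""
    return [
        (s, d)
        for s, d in column_edges
        if data_source(s) == src_system and data_source(d) == dst_system
    ]

overview = business_lineage(COLUMN_EDGES)
detail = drill_down(COLUMN_EDGES, "crm", "warehouse")
```

Here `overview` is the coarse view a business user might start from, and `detail` lists the column-level edges that sit behind a single flow in that view.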
Both business and technical lineage views are critical for organizations to understand the value of their data and easily assess the impact of changes. The ability to drill down from business lineage to technical lineage, or vice versa, provides added flexibility for data analysts to quickly get the right perspective of data flow.
Not all automated data lineage is delivered in the same way. Referred lineage may pull lineage together based on element or attribute names and composition. More-detailed data lineage is based on code at the element level and is more trustworthy.
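A small sketch shows why lineage assembled from names alone is less trustworthy than lineage read from code. The example below links columns only when their normalized names match; the column names and the `infer_links_by_name` helper are hypothetical, and the matching rule is a deliberately simple stand-in for whatever heuristics a given tool applies.

```python
import re

def normalize(name):
    """Lowercase a name and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def column_name(path):
    """Return the last segment of a 'source.table.column' path."""
    return path.rsplit(".", 1)[-1]

def infer_links_by_name(source_columns, target_columns):
    """Link source and target columns whose normalized names are equal."""
    by_name = {}
    for col in target_columns:
        by_name.setdefault(normalize(column_name(col)), []).append(col)
    links = []
    for col in source_columns:
        for match in by_name.get(normalize(column_name(col)), []):
            links.append((col, match))
    return links

# Hypothetical columns: suppose email_hash is derived from email in ETL code.
SOURCE = ["crm.customers.Customer_ID", "crm.customers.email"]
TARGET = ["warehouse.dim_customer.customerid", "warehouse.dim_customer.email_hash"]

links = infer_links_by_name(SOURCE, TARGET)
```

Name matching recovers the `Customer_ID` link but misses the assumed `email` to `email_hash` transformation, because that relationship would exist only in the transformation code. Lineage derived from the code itself would capture it.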
Data lineage is quickly evolving to become more intelligent, using AI to capture patterns in difficult-to-parse code.