Cleanse data to resolve quality issues
Overview
This standard is about cleansing data to resolve quality issues.
Data engineers prepare data for analytical or operational uses. They are typically responsible for designing and building data pipelines to bring together information from different source systems. They integrate, consolidate, cleanse and structure data to make it easily accessible and in a usable format. Processed data can then be used by business executives, data analysts and other end users to inform organisational processes and decision making.
Cleansing data involves profiling datasets, then organising, tidying and cleansing the data to resolve quality issues. It also includes documenting quality issues and the cleansing activities applied to resolve them.
This standard is for those who need to cleanse data to resolve quality issues as part of their duties.
Performance criteria
You must be able to:
- Review the organisation's data architecture and data types to inform data cleansing practice
- Organise data to arrange it into accessible structures for cleansing
- Profile datasets to identify data quality issues and cleansing needs
- Develop a data cleansing strategy to maintain high data integrity
- Investigate and resolve data quality issues in line with organisational procedures
- Apply data cleansing tools to datasets to filter out unwanted data
- Write and execute tests to validate data quality
- Automate data cleansing processes to improve accuracy and efficiency
- Document data quality metrics, issues and resolutions in line with organisational procedures
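As an illustration only, the criteria above (profile, cleanse, test, document) could be sketched in Python as follows. This is a minimal sketch, not a prescribed method: the sample records, the field names (`id`, `email`, `age`), the age range of 0 to 120 and the function names are all hypothetical assumptions, and real organisational procedures and tools will differ.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"id": "1", "email": "ann@example.com ", "age": "34"},
    {"id": "2", "email": "", "age": "41"},
    {"id": "1", "email": "ann@example.com ", "age": "34"},
    {"id": "3", "email": "BOB@EXAMPLE.COM", "age": "n/a"},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def profile(records):
    """Count rows, missing values per field, and exact duplicate rows."""
    missing = Counter()
    for row in records:
        for field, value in row.items():
            if is_missing(value):
                missing[field] += 1
    row_counts = Counter(tuple(sorted(row.items())) for row in records)
    return {
        "rows": len(records),
        "missing": dict(missing),
        "duplicate_rows": sum(n - 1 for n in row_counts.values()),
    }


def cleanse(records):
    """Tidy each field, drop exact duplicates, and log each removal."""
    seen, cleaned, log = set(), [], []
    for index, row in enumerate(records):
        age_text = row["age"].strip()
        tidy = {
            "id": row["id"].strip(),
            # Blank emails become None so they are clearly marked as missing.
            "email": row["email"].strip().lower() or None,
            # Ages that are not whole numbers become None rather than guessed.
            "age": int(age_text) if age_text.isdigit() else None,
        }
        key = (tidy["id"], tidy["email"], tidy["age"])
        if key in seen:
            log.append(f"row {index}: removed exact duplicate")
            continue
        seen.add(key)
        cleaned.append(tidy)
    return cleaned, log


def validate(records):
    """Simple data quality tests; the age range is an assumed business rule."""
    ids = [row["id"] for row in records]
    assert len(ids) == len(set(ids)), "duplicate ids remain"
    for row in records:
        assert row["age"] is None or 0 <= row["age"] <= 120, "age out of range"


before = profile(RECORDS)          # profile to identify issues
cleaned, log = cleanse(RECORDS)    # resolve issues, keeping a record of actions
validate(cleaned)                  # test the cleansed data
after = profile(cleaned)           # re-profile to document the outcome
```

Comparing `before` and `after`, together with `log`, gives a simple record of the issues found and the actions taken. Wrapping the steps in functions is one way they could later be scheduled or automated.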
Knowledge and understanding
You need to know and understand:
- How data is structured, where it is located and how it flows in organisational processes
- The organisation's data architecture, models and data types used
- Why it is important to cleanse data
- The range of common data quality issues that can arise in data, including duplicate data, missing values, null data and outliers, and how to check for them
- The steps involved in profiling datasets to identify quality issues
- How to cleanse datasets to resolve data quality issues
- The industry standard tools and techniques used to cleanse data and how to apply them
- How to measure and report data quality metrics
- The supplementary performance issues that can occur in connection with data quality and how to resolve them
- The main steps involved in tidying and cleansing data and how to apply them
- The tools and techniques used to monitor data quality within an organisation
- Industry best practice strategies used to improve data quality
- How to document data quality metrics, issues and resolutions
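By way of example, checks for some of the common issues listed above (duplicate data, missing or null values, and outliers) and a simple quality metric could be sketched as below. This is a hedged sketch under stated assumptions: the function names and sample values are hypothetical, completeness is only one of several possible metrics, and the interquartile range rule with a multiplier of 1.5 is a common convention swapped in for illustration, not a method required by this standard.

```python
import statistics


def completeness(values):
    """Share of values that are present (not None and not blank)."""
    if not values:
        return 1.0
    present = [v for v in values if v is not None and str(v).strip() != ""]
    return len(present) / len(values)


def duplicate_values(values):
    """Non-missing values that occur more than once, in first-seen order."""
    seen, dupes = set(), []
    for value in values:
        if value is None:
            continue
        if value in seen and value not in dupes:
            dupes.append(value)
        seen.add(value)
    return dupes


def iqr_outliers(numbers, k=1.5):
    """Numbers outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed default."""
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in numbers if x < low or x > high]


# Hypothetical sample data for illustration only.
customer_ids = ["C1", "C2", "C2", "C3", None]
order_values = [10, 12, 11, 13, 12, 11, 95]

report = {
    "id_completeness": completeness(customer_ids),
    "duplicate_ids": duplicate_values(customer_ids),
    "order_value_outliers": iqr_outliers(order_values),
}
```

A flagged outlier is a prompt for investigation rather than proof of an error, so a report like this would normally feed into the documented resolution process instead of triggering automatic deletion.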