Carry out data manipulation

URN: TECDT80842
Business Sectors (Suites): IT(Data Science)
Developed by: e-skills
Approved on: 2023

Overview

This standard is about carrying out data manipulation.

Data engineers prepare data for analytical or operational uses. They are typically responsible for designing and building data pipelines to bring together information from different source systems. They integrate, consolidate, cleanse and structure data to make it easily accessible and in a usable format. Processed data can then be used by business executives, data analysts and other end users to inform organisational processes and decision making.

Data Extraction, Transformation and Loading (ETL) involves identifying data sources, importing data, loading data, converting data, merging and consolidating data, and processing data. This also includes storing prepared data ready for analysis or other organisational processes.
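The ETL steps described above can be illustrated with a minimal sketch in Python using only the standard library. It assumes a small inline CSV extract standing in for a real source system and an in-memory SQLite database standing in for a staging area. The table name stg_customers, the column names and the cleansing rules are hypothetical examples, not requirements of this standard.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: in practice this would come from a file,
# API or database export rather than an inline string.
RAW_CSV = """customer_id,name,signup_date,spend
1, Alice ,2023-01-05,120.50
2,Bob,2023-02-11,
3, carol,2023-03-20,75
"""


def extract(text):
    """Extract: read CSV rows into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and title-case names, convert types,
    and fill a missing spend value with 0.0 (a hypothetical rule)."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "name": row["name"].strip().title(),
            "signup_date": row["signup_date"],
            "spend": float(row["spend"]) if row["spend"].strip() else 0.0,
        })
    return cleaned


def load(rows, conn):
    """Load: write transformed rows into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT, spend REAL)"
    )
    conn.executemany(
        "INSERT INTO stg_customers VALUES (:customer_id, :name, :signup_date, :spend)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    for record in conn.execute("SELECT * FROM stg_customers ORDER BY customer_id"):
        print(record)
```

Keeping extract, transform and load as separate functions makes each stage testable on its own; a production pipeline would typically add logging, error handling and incremental loading.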

This standard is for those who need to carry out data manipulation as part of their duties.


Performance criteria

You must be able to:

  1. Identify the main data types used within an organisation to support data understanding and handling
  2. Review and agree dataset requirements with stakeholders to plan data preparation tasks
  3. Identify target data sources to determine accessibility constraints
  4. Implement security measures for data manipulation to maintain data resilience in line with organisational standards
  5. Apply available data pipelines to assist in providing data flows
  6. Extract, transform and load data for manipulation in line with organisational requirements
  7. Combine and manipulate data from various structured and unstructured sources to produce consolidated datasets in line with requirements
  8. Anonymise data in line with organisational and legal requirements for data access, handling and sharing
  9. Convert data to defined structures and file formats in line with organisational requirements
  10. Export and store datasets into staged data environments to make data available to end users
  11. Develop code to automate data extraction and manipulation
  12. Document source-to-target mappings to show data lineage
  13. Document data manipulation activities and dataset features in line with organisational procedures
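Performance criteria 8 and 12 can be sketched together in Python. The example assumes a keyed hash (HMAC-SHA256) is an acceptable way to replace direct identifiers in your organisation. This is pseudonymisation rather than full anonymisation: anyone holding the key can re-link tokens to known values, so on its own it may not meet legal requirements for anonymised data. The key, the crm.customers source fields and the mapping rules are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical secret key; in practice it would be held in a secrets
# manager, never hard-coded or stored alongside the data.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymise(value, key=PSEUDONYM_KEY):
    """Replace an identifier with a keyed SHA-256 digest (HMAC).

    The same input always maps to the same token, so records can still be
    joined, but the original value cannot be recovered without the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical source-to-target mapping: each entry records where a target
# column comes from and which rule produced it, documenting data lineage.
MAPPING = [
    {"source": "crm.customers.email", "target": "stg_customers.customer_key",
     "rule": "lower-case, then HMAC-SHA256 pseudonymisation"},
    {"source": "crm.customers.postcode", "target": "stg_customers.postcode_district",
     "rule": "truncate to outward code"},
    {"source": "crm.customers.spend", "target": "stg_customers.spend",
     "rule": "copy as-is"},
]


def apply_mapping(record):
    """Produce a target record from a source record using the rules in MAPPING."""
    return {
        "customer_key": pseudonymise(record["email"].strip().lower()),
        "postcode_district": record["postcode"].split()[0],
        "spend": record["spend"],
    }


if __name__ == "__main__":
    source = {"email": "Alice@Example.com", "postcode": "SW1A 1AA", "spend": 120.5}
    print(apply_mapping(source))
    # Lineage record to store and share alongside the dataset.
    print(json.dumps(MAPPING, indent=2))
```

Storing the mapping as data (here a list that can be exported as JSON) means the lineage record can be versioned and shared alongside the dataset it describes.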

Knowledge and Understanding

You need to know and understand:

  1. How to access and extract data securely from organisational data sources
  2. The need to document data lineage when using and sharing data
  3. The role of data ownership and associated responsibilities in sourcing and accessing data
  4. The main file formats for storing and sharing data
  5. Industry standard tools used for handling, sharing and managing data
  6. Why data manipulation is important
  7. That data manipulation helps to make it easier to understand the dataset and to break it into manageable chunks
  8. Industry standard data processing languages and how to use them
  9. How to access and load the dataset to perform manipulation
  10. The different terms used to refer to data manipulation, including preparing, transforming and wrangling data
  11. How to join and merge multiple datasets from various sources using common keys to combine them into a single dataset
  12. The industry standard processes that are used to manipulate data
  13. Organisational policies and national regulations associated with data management, data protection, and storing and sharing data
  14. The requirement for the effective, safe use and security of data within organisations
  15. The difference between wide and long data formats and how to apply them for structuring datasets
  16. How to format datasets to produce the final structure required
  17. How to provide documentation associated with data manipulation activities
  18. How to design, write and iterate code from prototype to production-ready for data manipulation and staging solutions
  19. How to work with large or complex datasets
  20. The importance of ethics in relation to data engineering, including organisational codes of practice
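Knowledge items 11 and 15 (joining datasets on a common key, and wide versus long formats) can be illustrated with a short Python sketch using only the standard library. The datasets and the customer_id key are hypothetical, and left_join assumes the key is unique in the right-hand dataset. Libraries such as pandas provide equivalent, more complete operations (merge, melt and pivot).

```python
from collections import defaultdict

# Hypothetical datasets from two source systems sharing the key "customer_id".
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 3, "name": "Carol"},
]
# Wide format: one row per customer, one column per quarter.
spend_wide = [
    {"customer_id": 1, "q1": 100.0, "q2": 20.5},
    {"customer_id": 2, "q1": 0.0, "q2": 40.0},
]


def left_join(left, right, key):
    """Keep every left row; add the right row's columns where the key matches.
    Assumes each key value appears at most once in `right`."""
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key], {})
        joined.append({**row, **{k: v for k, v in match.items() if k != key}})
    return joined


def wide_to_long(rows, id_col, value_cols, var_name="quarter", value_name="spend"):
    """Melt value columns into one (id, variable, value) row each.
    Columns absent from a row are skipped rather than emitted as None."""
    return [
        {id_col: row[id_col], var_name: col, value_name: row[col]}
        for row in rows
        for col in value_cols
        if col in row
    ]


def long_to_wide(rows, id_col, var_name="quarter", value_name="spend"):
    """Pivot long rows back to one row per id value."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[id_col]][row[var_name]] = row[value_name]
    return [{id_col: key, **values} for key, values in wide.items()]


if __name__ == "__main__":
    joined = left_join(customers, spend_wide, "customer_id")
    print(joined)  # Carol has no spend record, so her row keeps only customer_id and name
    long_rows = wide_to_long(spend_wide, "customer_id", ["q1", "q2"])
    print(long_rows)
    print(long_to_wide(long_rows, "customer_id"))
```

In the wide form each quarter is a column, which suits reports and spreadsheets; the long form has one row per customer and quarter, which is usually easier to filter, aggregate and load into analytical tools.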

Version Number

1

Indicative Review Date

2026

Validity

Current

Status

Original

Originating Organisation

ODAG Consultants Ltd.

Original URN

TECDT80842

Relevant Occupations

Information and Communication Technology Professionals

SOC Code

2134

Keywords

data engineering, data manipulation, data design, data processing, data cleansing