Develop and implement machine learning algorithms
Overview
Performance criteria
You must be able to:
- prepare datasets from multiple databases and other sources to input into machine learning models
- capture, organise and prioritise requirements to describe organisational needs
- evaluate datasets to identify quality issues, then determine and document an approach to addressing them
- translate business and technical requirements into machine learning problems to plan and develop solutions
- clean data that is noisy, incomplete or has established data quality issues, using approved tools and techniques
- select and develop data sets, algorithms and modelling techniques required to solve organisational data problems
- create analytical models to produce machine learning solutions
- evaluate and validate machine learning models to ensure no bias is introduced
- apply best-practice techniques for output model testing and tuning to assess accuracy, fit, validity and robustness
- design and implement dashboard and automated reporting systems to deliver updates on model performance
- develop strategies for model improvement as well as improvements to data and retraining
- create and disseminate reports, presentations and other documentation that describe and tell the story of model development, to confirm stakeholder approval for handover to implementation
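
As an illustration of the dataset evaluation and data cleaning criteria above, the following is a minimal sketch only, not an approved tool or technique. It assumes records held as a list of dictionaries; the field names (`age`, `income`), the helper names (`quality_report`, `clean`) and the choice of median imputation are all hypothetical and chosen purely for the example.

```python
from statistics import median

def quality_report(rows, fields):
    """Count missing values per field and exact duplicate rows."""
    missing = {f: sum(1 for r in rows if r.get(f) is None) for f in fields}
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(r.get(f) for f in fields)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"missing": missing, "duplicates": duplicates}

def clean(rows, fields, numeric_fields):
    """Drop exact duplicates, then fill missing numeric values with the field median."""
    unique, seen = [], set()
    for r in rows:
        key = tuple(r.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))  # copy so the input rows are not modified
    for f in numeric_fields:
        observed = [r[f] for r in unique if r.get(f) is not None]
        if not observed:
            continue  # nothing to impute from; leave the gap as-is
        fill = median(observed)
        for r in unique:
            if r.get(f) is None:
                r[f] = fill
    return unique

# Hypothetical records with one missing age, one missing income and one duplicate.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 34, "income": 52000},
    {"age": 45, "income": None},
]
report = quality_report(rows, ["age", "income"])
cleaned = clean(rows, ["age", "income"], ["age", "income"])
```

The report step corresponds to identifying and documenting quality issues before any change is made; the cleaning step is kept separate so that the chosen approach (here, dropping duplicates and median imputation) can be reviewed and replaced with whatever the organisation has approved.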
Knowledge and Understanding
You need to know and understand:
- the stages of the machine learning lifecycle and how to apply them
- the characteristics of different machine learning methods and models, including supervised learning, unsupervised learning, text mining, reinforcement learning, ensemble learning, predictive modelling, classification models, regression models and clustering models
- a wide range of statistical methods and best-practice modelling techniques and how to apply them
- the required data cleaning techniques used to improve data quality
- the dataset preparation activities that are required in the machine learning process including data collection, formatting, reduction, decomposition and rescaling
- how to select and apply machine learning algorithms for classification, regression and clustering using existing libraries
- the required machine learning procedures for text data
- the steps involved in machine learning output model validation and how to apply them
- the variables and features that impact model performance, and how to use them to test and validate output model performance
- the factors that impact model validation, such as the size of the data set and how it is segmented
- the differences between structured and unstructured data
- the required training and testing steps for data sets to produce accurate models
- how to evaluate machine learning model performance
- the tools, systems and procedures for developing machine learning models
- the techniques for identifying and reducing bias in datasets and how to apply them
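
The rescaling, training and testing, and performance evaluation items above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not a prescribed procedure: it uses only the Python standard library rather than an existing machine learning library, a nearest-centroid classifier stands in for whichever algorithm is actually selected, and the toy data, labels and function names are all hypothetical.

```python
import random

def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed and hold out a test segment."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(X) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def fit_min_max(X):
    """Learn per-feature minimum and maximum from the training data only."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(X, lo, hi):
    """Rescale each feature to the range learned by fit_min_max."""
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in X]

def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) for each class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, X):
    """Assign each row to the class whose centroid is nearest."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist2(row, centroids[c])) for row in X]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: two well-separated classes, two features on different scales.
X = [[1.0, 200.0], [1.2, 220.0], [0.9, 210.0], [1.1, 190.0],
     [3.0, 800.0], [3.2, 820.0], [2.9, 790.0], [3.1, 810.0]]
y = ["a"] * 4 + ["b"] * 4

X_train, y_train, X_test, y_test = train_test_split(X, y, test_fraction=0.25, seed=0)
lo, hi = fit_min_max(X_train)
model = fit_centroids(apply_min_max(X_train, lo, hi), y_train)
y_pred = predict(model, apply_min_max(X_test, lo, hi))
score = accuracy(y_test, y_pred)
```

The scaling parameters are learned from the training segment and then applied to the test segment, so the held-out data does not influence the model. With a data set this small the accuracy figure says little, which reflects the point above that data set size and segmentation affect how far a validation result can be trusted.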