Intro to Machine Learning

Data Formats

Labels

Data that is grouped into predefined buckets.

Quantity

Numeric values that can be used in calculations.

Unstructured

Data that is not necessarily a quantity and does not fit into predefined buckets.

Data Types

Features

Data that could be a contributing factor to the Label or Quantity data you’re trying to predict.

Outcome

The real values of the data you are trying to predict.

Prediction

An estimate of the Outcome data, which allows Outcome data to be used before the real data is received.

Machine Learning Development Process

Sample Data

Randomly split data into Training data and Test data.
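As a minimal sketch of this step, the split can be done with Python's standard library alone. The function name `split_train_test`, the 20% test fraction, and the fixed seed are illustrative assumptions, not values prescribed here.

```python
import random

def split_train_test(records, test_fraction=0.2, seed=42):
    """Randomly split records into (training, test) lists."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

records = list(range(10))  # stand-in for ten data records
training_data, test_data = split_train_test(records)
print(len(training_data), len(test_data))
```

Every record lands in exactly one of the two sets, so the Test data stays unseen during training.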

Choose Features

Identify the Features to be used within the Model.

Train Model

Feed the Training data, with its Features and Outcome data, into the chosen Algorithm to produce a Trained Model.

Test Model

Feed the Test data through the Trained Model to receive Predictions.
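The Train Model and Test Model steps can be sketched with a deliberately simple algorithm. The example below assumes a nearest-centroid classifier, chosen only because it fits in a few lines; the function names and the height/weight data are hypothetical.

```python
from collections import defaultdict
import math

def train_model(training_rows):
    """Compute the average Feature vector (centroid) for each Label."""
    grouped = defaultdict(list)
    for features, label in training_rows:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(trained_model, features):
    """Return the Label whose centroid is closest to the given Features."""
    return min(
        trained_model,
        key=lambda label: math.dist(trained_model[label], features),
    )

# Hypothetical data: Features are (height_cm, weight_kg), Outcome is a size Label.
training_rows = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]
test_rows = [((152.0, 52.0), "small"), ((182.0, 88.0), "large")]

trained_model = train_model(training_rows)
predictions = [predict(trained_model, features) for features, _ in test_rows]
print(predictions)
```

Note that the Outcome values in `test_rows` are never shown to the Trained Model. They are held back for the evaluation step.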

Evaluate Result

Compare the Outcome values in the Test data to the Predictions from the Trained Model.
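One simple way to make this comparison, sketched here as an assumption rather than the only option, is accuracy: the fraction of Predictions that match the real Outcome. The spam/ham values are made up for illustration.

```python
def accuracy(outcomes, predictions):
    """Fraction of Predictions that match the real Outcome values."""
    if len(outcomes) != len(predictions):
        raise ValueError("outcomes and predictions must have the same length")
    if not outcomes:
        raise ValueError("cannot score an empty Test set")
    matches = sum(
        1 for actual, predicted in zip(outcomes, predictions) if actual == predicted
    )
    return matches / len(outcomes)

# Hypothetical Test data Outcomes and the Predictions made for them.
test_outcomes = ["spam", "ham", "spam", "ham", "ham"]
test_predictions = ["spam", "ham", "ham", "ham", "ham"]
print(accuracy(test_outcomes, test_predictions))
```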

Adjust Model as Needed

Evaluate ways to improve Prediction accuracy by adjusting the Features and/or Algorithms used by the Model.

Industrialize Model

Industrialize the Model by automating the feeding of new data to generate an updated Trained Model.

Implement Model

Execute the Trained Model within the business, making use of the Prediction data.

Model Types

Classification Model

Classification tries to determine which bucket a record belongs to, based on the Features that are selected. Evaluating the Model produces a confusion matrix, which gives insight into its accuracy and precision.

A confusion matrix is a table that summarizes the performance of a classification algorithm by counting how often each actual label was predicted as each possible label.
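To make the table concrete, here is a small sketch that builds a confusion matrix by counting (actual, predicted) pairs and then derives accuracy and precision from it. The cat/dog labels and the function name are assumptions for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual = ["cat", "cat", "dog", "dog", "dog", "cat"]
predicted = ["cat", "dog", "dog", "dog", "cat", "cat"]
labels = ["cat", "dog"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)

# Accuracy: correct predictions (the diagonal) over all predictions.
total = sum(sum(row) for row in matrix)
overall_accuracy = sum(matrix[i][i] for i in range(len(labels))) / total

# Precision for "cat": correct "cat" predictions over all "cat" predictions.
cat_precision = matrix[0][0] / sum(row[0] for row in matrix)
```

The diagonal holds the correct predictions, and every off-diagonal cell shows one specific kind of mistake.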

Regression Model

Regression tries to predict a specific quantity. The Model returns a regression line, and comparing that line against the real data gives insight into the accuracy and precision of the Model.

A linear regression line is straight, while logistic and nonlinear regression models use a curve. Regression lets you estimate how a dependent variable changes as the independent variable(s) change.
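For the straight-line case, a minimal sketch of an ordinary least-squares fit with one independent variable is shown below. The spend and units figures are invented for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (independent) and units sold (dependent).
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(spend, units)
predicted_units = slope * 5.0 + intercept  # Prediction for an unseen spend of 5.0
print(slope, intercept, predicted_units)
```

The slope is the estimated change in the dependent variable for each one-unit change in the independent variable.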

Challenges when Scaling ML beyond Ad-hoc Experimentation

Scalability

It is hard to scale models to other countries, markets, channels, and retailers.

Parallelization (process data in parallel) can be a challenge depending on how models are built.

Governance

Alignment from Senior leadership on data science governance.

MLOps & Data Science activities and teams.

Duplication of assets.

Reliability

Trust from end users in model outputs to replace human decision making.

Incident and issue resolution in a timely fashion.

Over time, data scientists spend more of their time supporting models that are already in production, instead of innovating new models or improving existing ones.

Resiliency

Reproducibility of code, models, and datasets.

Library versioning and security compatibility.
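One possible way to support reproducibility, sketched here under assumptions, is to record a small manifest for each training run: the random seed, the Python version, and a hash of the dataset. The manifest fields and function names are hypothetical, and a real setup would likely also record the versions of the libraries in use.

```python
import hashlib
import json
import platform
import random

def dataset_fingerprint(rows):
    """Hash the dataset so a later run can confirm it used identical data."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def run_manifest(rows, seed):
    """Collect the details needed to repeat a training run."""
    return {
        "python_version": platform.python_version(),
        "seed": seed,
        "dataset_sha256": dataset_fingerprint(rows),
    }

# Hypothetical dataset with one Feature and one Outcome per record.
rows = [{"feature": 1.0, "outcome": "a"}, {"feature": 2.0, "outcome": "b"}]
manifest = run_manifest(rows, seed=42)

# Re-seeding with the recorded seed reproduces the same random draws.
random.seed(manifest["seed"])
first_run = [random.random() for _ in range(3)]
random.seed(manifest["seed"])
second_run = [random.random() for _ in range(3)]
print(first_run == second_run)
```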

Accuracy

Maintaining and monitoring model accuracy in an automated and robust fashion.

Ease of integrating model enhancements to improve accuracy.
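Automated accuracy monitoring could look something like the sketch below, which tracks accuracy over the most recent predictions and flags when it drops below a threshold. The class name, window size, and threshold are illustrative assumptions.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag drops."""

    def __init__(self, window_size=100, alert_threshold=0.8):
        # Each entry is True where the Prediction matched the real Outcome.
        self.results = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, prediction, outcome):
        self.results.append(prediction == outcome)

    def accuracy(self):
        if not self.results:
            return None  # nothing recorded yet
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        current = self.accuracy()
        return current is not None and current < self.alert_threshold

monitor = AccuracyMonitor(window_size=4, alert_threshold=0.75)
for prediction, outcome in [("a", "a"), ("b", "b"), ("a", "b"), ("b", "a")]:
    monitor.record(prediction, outcome)
print(monitor.accuracy(), monitor.needs_attention())
```

Because the window has a fixed size, old results fall away and the figure reflects how the Model is performing now rather than over its whole lifetime.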

Solutions for ML beyond Ad-hoc Experimentation

MLOps

MLOps is a set of principles for operationalizing machine learning. It helps to address these challenges by building a framework of auditability, monitoring, and responsiveness. It also empowers data scientists through automation, scalability, and reproducibility of Machine Learning (ML) solutions, allowing them to focus on higher-value activities such as innovating new ML models and running new experiments.
