Data Formats
Labels
Groups data into predefined buckets.
Quantity
Numeric values that can be used in calculations.
Unstructured
Data that is not necessarily a quantity and does not fall into predefined buckets, such as free text.
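As a minimal sketch of the three formats, the record below holds one field of each kind. The class name, field names, and values are invented for illustration and do not come from a real dataset.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    # Label: one of a fixed set of predefined buckets.
    segment: str
    # Quantity: a numeric value that supports arithmetic.
    monthly_spend: float
    # Unstructured: free text with no predefined buckets.
    support_note: str


# Hypothetical set of allowed buckets for the Label field.
SEGMENTS = {"retail", "wholesale", "online"}

records = [
    CustomerRecord("retail", 120.50, "Asked about delivery times."),
    CustomerRecord("online", 80.00, "Happy with the last order."),
]

# Labels can be checked against the known buckets.
all_labels_valid = all(r.segment in SEGMENTS for r in records)

# Quantities can be used directly in calculations.
average_spend = sum(r.monthly_spend for r in records) / len(records)

print(all_labels_valid, average_spend)
```

The unstructured field usually needs extra processing before a Model can use it, which is why it is kept separate from the other two here.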
Data Types
Features
Data that could be a contributing factor to the Label or Quantity data you’re trying to predict.
Outcome
The real values of the data you’re trying to predict.
Prediction
An estimate of the Outcome data, which lets you act on the Outcome before the real values are received.
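The sketch below shows how Features, Outcome, and Prediction relate for a single new record. The data is invented, and the `predict` function is a deliberately naive placeholder (an assumption for illustration), not a recommended model.

```python
# Historical rows: the Features are known and the real Outcome has been observed.
history = [
    {"features": {"store_size": 100, "region": "north"}, "outcome": 20.0},
    {"features": {"store_size": 200, "region": "south"}, "outcome": 40.0},
    {"features": {"store_size": 300, "region": "north"}, "outcome": 60.0},
]

# A new row: the Features are known, but the real Outcome has not arrived yet.
new_row = {"features": {"store_size": 250, "region": "south"}, "outcome": None}


def predict(features):
    """Placeholder model: average Outcome per unit of store size, scaled up."""
    total_outcome = sum(row["outcome"] for row in history)
    total_size = sum(row["features"]["store_size"] for row in history)
    return features["store_size"] * total_outcome / total_size


# The Prediction stands in for the Outcome until the real value is received.
prediction = predict(new_row["features"])
print(prediction)
```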
Machine Learning Development Process
Sample Data
Randomly split data into Training data and Test data.
Choose Features
Identify the Features to be used within the Model.
Train Model
Feed the chosen Algorithm the Training data, including the Features and the Outcome data, to produce a Trained Model.
Test Model
Feed the Test data through the Trained Model to receive Predictions.
Evaluate Result
Compare the Outcome data in the Test data to the Predictions from the Trained Model.
Adjust Model as Needed
Evaluate ways to improve Prediction accuracy by adjusting the Features and/or Algorithms used by the Model.
Industrialize Model
Industrialize the Model by automating the feeding of new data to generate an updated Trained Model.
Implement Model
Run the Trained Model within the business and make use of the Prediction data.
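The first five steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the toy dataset is invented, the 80/20 split ratio is an arbitrary choice, and the nearest-centroid classifier stands in for whichever Algorithm a real project would choose. The later steps (adjusting, industrializing, implementing) depend on the business setting and are not shown.

```python
import random

# Toy dataset (invented): two Features and a Label Outcome per row.
rows = [
    {"height": 1.0 + 0.1 * i, "weight": 2.0 + 0.1 * i, "outcome": "small"}
    for i in range(10)
] + [
    {"height": 5.0 + 0.1 * i, "weight": 6.0 + 0.1 * i, "outcome": "large"}
    for i in range(10)
]

# 1. Sample Data: randomly split into Training data and Test data.
rng = random.Random(42)  # fixed seed so the split is repeatable
shuffled = rows[:]
rng.shuffle(shuffled)
split = int(len(shuffled) * 0.8)
training, test = shuffled[:split], shuffled[split:]

# 2. Choose Features.
features = ["height", "weight"]


# 3. Train Model: compute one centroid (mean Feature vector) per Outcome label.
def train(data, features):
    sums, counts = {}, {}
    for row in data:
        label = row["outcome"]
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, name in enumerate(features):
            acc[j] += row[name]
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, row, features):
    # Return the label whose centroid is closest (squared Euclidean distance).
    def distance(centroid):
        return sum((row[name] - c) ** 2 for name, c in zip(features, centroid))

    return min(model, key=lambda label: distance(model[label]))


model = train(training, features)

# 4. Test Model: feed the Test data through the Trained Model.
predictions = [predict(model, row, features) for row in test]

# 5. Evaluate Result: compare the Predictions to the real Outcome data.
correct = sum(p == row["outcome"] for p, row in zip(predictions, test))
accuracy = correct / len(test)
print(f"accuracy on {len(test)} test rows: {accuracy:.2f}")
```

Keeping the Test data out of the training step is the point of the split: the accuracy is then measured on rows the Model has never seen.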
Model Types
Classification Model
Classification tries to determine which bucket a record belongs to, based on the selected Features. Evaluating the Model produces a confusion matrix, which gives insight into its accuracy and precision.
A confusion matrix is a table that summarizes the performance of a classification algorithm by comparing the predicted buckets with the actual ones.
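A confusion matrix can be built by counting each pair of actual and predicted buckets. The sketch below does this for two buckets; the labels and the sample values are invented for illustration.

```python
from collections import Counter

# Invented example: real Outcomes and the Model's Predictions for eight records.
actual = ["spam", "spam", "ham", "ham", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam", "ham", "ham"]

# Each (actual, predicted) pair is one cell of the confusion matrix.
matrix = Counter(zip(actual, predicted))

tp = matrix[("spam", "spam")]  # spam correctly predicted as spam
fn = matrix[("spam", "ham")]   # spam missed and predicted as ham
fp = matrix[("ham", "spam")]   # ham wrongly predicted as spam
tn = matrix[("ham", "ham")]    # ham correctly predicted as ham

# Accuracy: share of all records predicted correctly.
accuracy = (tp + tn) / len(actual)
# Precision: share of "spam" Predictions that really were spam.
precision = tp / (tp + fp)

print("            pred spam  pred ham")
print(f"actual spam {tp:9d} {fn:9d}")
print(f"actual ham  {fp:9d} {tn:9d}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```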
Regression Model
Regression tries to predict a specific quantity. The Model produces a regression line, and how closely the data follows that line gives insight into the accuracy and precision of the Model.
A linear regression line is straight, while logistic and nonlinear regression models use a curve. Regression lets you estimate how a dependent variable changes as the independent variable(s) change.
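For the straight-line case, the regression line can be fitted with ordinary least squares. The sketch below is a minimal version for a single independent variable; the data points are invented and lie exactly on a line to keep the arithmetic easy to follow.

```python
# Invented data: x is the independent variable, y the dependent variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)

# The slope estimates how much y changes when x increases by one unit.
predicted_y = slope * 6.0 + intercept
print(slope, intercept, predicted_y)
```

With real data the points scatter around the line, and the size of that scatter is what indicates how much to trust the Predictions.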
Challenges when Scaling ML beyond Ad-hoc Experimentation
Scalability
Models are hard to scale to other countries, markets, channels, and retailers.
Parallelization (processing data in parallel) can be a challenge, depending on how the models are built.
Governance
Alignment from senior leadership on data science governance.
MLOps & Data Science activities and teams.
Duplication of assets.
Reliability
End users need to trust model outputs before those outputs can replace human decision-making.
Incidents and issues need to be resolved in a timely fashion.
Over time, data scientists spend more and more of their time supporting models in production, instead of building new models or improving existing ones.
Resiliency
Reproducibility of code, models, and datasets.
Library versioning and security compatibility.
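One common way to approach reproducibility is to fix random seeds and to record what a run depended on. The sketch below is an illustration only: the `manifest` layout and the function names are assumptions, and a real setup would also record the versions of every library in use.

```python
import hashlib
import json
import platform
import random


def split_dataset(rows, seed):
    """Shuffle with a fixed seed so the same split is produced on every run."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def dataset_fingerprint(rows):
    """Hash the dataset so a later run can confirm it used the same data."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


dataset = list(range(10))  # stand-in for a real dataset

# Hypothetical manifest stored alongside the Trained Model.
manifest = {
    "seed": 42,
    "python_version": platform.python_version(),
    "dataset_sha256": dataset_fingerprint(dataset),
}

first_run = split_dataset(dataset, manifest["seed"])
second_run = split_dataset(dataset, manifest["seed"])
print(first_run == second_run)
print(json.dumps(manifest, indent=2))
```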
Accuracy
Maintaining and monitoring model accuracy in an automated and robust fashion.
Ease of integrating model enhancements to improve accuracy.
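Automated accuracy monitoring can be as simple as comparing recent Predictions with the Outcome data once it arrives. The sketch below tracks accuracy over a sliding window and flags the Model when it drops below a threshold; the class name, window size, and threshold are assumptions chosen for illustration.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent Predictions (hypothetical helper)."""

    def __init__(self, window_size, threshold):
        # A deque with maxlen drops the oldest result automatically.
        self.results = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, prediction, outcome):
        # Store whether the Prediction matched the real Outcome.
        self.results.append(prediction == outcome)

    def accuracy(self):
        if not self.results:
            return None  # nothing observed yet
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        current = self.accuracy()
        return current is not None and current < self.threshold


monitor = AccuracyMonitor(window_size=4, threshold=0.75)

# Invented stream of (Prediction, real Outcome) pairs.
stream = [("a", "a"), ("b", "b"), ("a", "b"), ("a", "b"), ("b", "a")]
for prediction, outcome in stream:
    monitor.record(prediction, outcome)

print(monitor.accuracy(), monitor.needs_attention())
```

In practice the flag would feed an alert or a retraining job rather than a print statement.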
Solutions for ML beyond Ad-hoc Experimentation
MLOps
MLOps, a set of principles for operationalizing machine learning, helps to address these challenges by building a framework of auditability, monitoring, and responsiveness. It also empowers data scientists through automation, scalability, and reproducibility of machine learning solutions, which allows them to focus on higher-value activities: innovating new Machine Learning (ML) models and running new experiments.