Background
In 2014, a group of Google researchers published a paper titled Machine Learning: The High-Interest Credit Card of Technical Debt. It pointed out a growing problem that many companies might have been ignoring.
Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. — D. Sculley, Gary Holt, et al.
The researchers put it another way in a follow-up presentation: launching a rocket is easy, but the ongoing operations afterward are hard. Back then, the concept of DevOps was still coming into its own, but these engineers and researchers realized that deploying a machine learning model involves many more complications than deploying code.
This is when the popularity of machine learning platforms began to rise. Eventually, many of these platforms adopted the term MLOps to explain the service they were providing.
That raises the question: what is MLOps?
What Is MLOps?
Machine Learning Operations, or MLOps, streamlines the management, logistics, and deployment of machine learning models, bridging the gap between operations teams and machine learning researchers.
From a naive perspective, it is just DevOps applied to the field of machine learning. In practice, however, MLOps has to manage a good deal more than DevOps usually does.
Like DevOps, MLOps manages automated deployment, configuration, monitoring, resource management and testing and debugging.
A possible machine learning pipeline could look like the image below.
Unlike DevOps, MLOps may also need to consider data verification, model analysis and re-verification, metadata management, feature engineering, and the ML code itself.
But at the end of the day, the goal of MLOps is very similar: to create a continuous development pipeline for machine learning models. Such a pipeline lets data scientists and machine learning engineers quickly deploy, test, and monitor their models to ensure they continue to behave as expected.
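As a rough illustration of that idea, here is a toy sketch in plain Python. Every function name here is invented for illustration, not taken from any MLOps tool; the point is only the shape of the pipeline: validate data, train, evaluate, and gate deployment on a quality threshold.

```python
def validate_data(rows):
    """Fail fast if the incoming data is empty or malformed."""
    if not rows or any(len(r) != 2 for r in rows):
        raise ValueError("data failed validation")
    return rows

def train_model(rows):
    """'Train' a trivial model: always predict the mean of the targets."""
    mean = sum(y for _, y in rows) / len(rows)
    return lambda x: mean

def evaluate(model, rows):
    """Mean absolute error on held-out rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

def run_pipeline(train_rows, test_rows, max_error):
    """Deploy only if the model clears the quality gate."""
    model = train_model(validate_data(train_rows))
    error = evaluate(model, test_rows)
    deployed = error <= max_error
    return model, error, deployed

model, error, deployed = run_pipeline(
    train_rows=[(1, 2.0), (2, 2.0), (3, 2.0)],
    test_rows=[(4, 2.0)],
    max_error=0.5,
)
print(deployed)  # the gate passes because the held-out error is 0.0
```

A real platform replaces each of these stubs with heavy machinery (schema checks, training clusters, model registries), but the deploy-only-if-good gate at the end is the part that makes the pipeline continuous rather than a one-off launch.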
Are Machine Learning and MLOps Worth It?
Before going on, we need to ask a very important question.
Is it worth it for your company to invest in machine learning and MLOps?
Because if the answer is no, then you might as well stop reading this article.
So let’s take a look at some key facts and figures to help answer this question.
Right now, about 15% of organizations are already advanced ML users, and about 65% of companies plan to adopt machine learning because it helps businesses with decision-making.
But here is what we believe is the most convincing figure.
McKinsey, in 2020, found that companies that successfully implemented AI and machine learning could attribute about 20% of their earnings to AI.
The point is that, when implemented correctly, machine learning can add a lot of value to your bottom line, whether through price optimization, resource management, customer service analytics, or some other improved process.
Our team has helped on projects that have driven similar results, ranging from fraud detection to price optimization.
However, to get to a place where your machine learning models are increasing your profits and decreasing your costs, you need successful deployment and monitoring of those models.
This isn’t always the case and is summed up nicely in the tweet below.
This is where MLOps can help.
MLOps Market And Funding
It is predicted that the MLOps market will exceed $4 billion in just a few years.
Tools are still being developed and span a wide variety of solutions. Some focus purely on model deployment and version management; others cover the entire pipeline. The sheer complexity of the machine learning model life cycle leaves many gaps for third-party tools to step in and simplify.
And it is paying off.
Companies are raising millions of dollars in funding in the machine learning platform space. DataRobot may be the machine learning platform start-up with the most capital raised, at nearly $1 billion; its most recent round, a $270 million pre-IPO round led by Altimeter Capital, came with a $2.7 billion valuation.
Here are a few other companies vying for position in the MLOps space:
- H2O.ai: $151 million raised in total
- Algorithmia: $38 million total
- Pachyderm: $28 million total
- Arrikto (Kubeflow): $15 million total
- WhyLabs.ai: $4 million total
Below is a visual representation.
All of this funding flowing in has made the MLOps space very popular.
MLOps Tools
MLOps tools range from open-source projects to start-ups with nearly a billion dollars in funding. This shows in the tools below: some, like DataRobot, try to take on nearly the entire enterprise MLOps pipeline, while others, like Feast.dev, focus on specific portions of it.
So let’s talk about a few of those tools.
MLflow
Photo from MLflow.
With tools such as MLflow, data professionals can now automate sophisticated model tracking with ease. MLflow debuted at the 2018 Spark + AI Summit, open-sourced by Databricks, and is now a Linux Foundation project. MLflow allows data scientists to automate model development. Through MLflow, the optimal model can be selected with greater ease using a tracking server. Parameters, attributes, and performance metrics can all be logged to this server and then used to quickly query for models that fit particular criteria. Airflow and MLflow are quickly becoming industry staples for automating the implementation, integration, and development of machine learning models.
Although MLflow is a powerful tool for sorting through logged models, it does little to answer the question of what models should be made. This is a more difficult question because, depending on your model, training may take a sizable amount of resources, hyperparameters may be unintuitive, or both. Even these problems can, in part, be automated away.
Pachyderm
Photo from Pachyderm.
Managing your data pipelines, models, and data sets is a complex process with a lot of moving parts. Pachyderm aims to simplify that process and make it both traceable and reproducible.
Pachyderm is a data science platform that combines end-to-end pipelines with data lineage on Kubernetes. The platform works at enterprise scale, providing a foundation for any project. The process starts with data versioning combined with data pipelining, which yields data lineage, and ends with deploying machine learning models.
It not only tracks your data revisions but also the associated transformations. Furthermore, Pachyderm clarifies the transformation dependencies as well as data lineage. It delivers version control for data using data pipelines that keep all your data up to date.
Kubeflow
Photo from Kubeflow.
Kubeflow is a machine learning platform that manages deployments of ML workflows on Kubernetes. The best part of Kubeflow is that it offers a scalable and portable solution.
This platform works best for data scientists who wish to build and experiment with their data pipelines. Kubeflow is also great for deploying machine learning systems to different environments in order to carry out testing, development, and production-level service.
Kubeflow was started by Google as an open-source platform for running TensorFlow. So it began as a way to run TensorFlow jobs via Kubernetes but has since expanded to become a multi-cloud, multi-architecture framework that runs entire ML pipelines. With Kubeflow, data scientists don't need to learn new platforms or concepts to deploy their applications or deal with networking, certificates, and so on; deploying an application can be as straightforward as launching a familiar tool like TensorBoard.
DataRobot
Photo from DataRobot.
DataRobot is a useful AI automation tool that allows data scientists to automate the end-to-end process of building, deploying, and maintaining AI at scale. The framework is powered by open-source algorithms and is available both in the cloud and on-premise. DataRobot lets users build AI applications easily and quickly in just ten steps, and the platform includes enablement models that focus on delivering value.
DataRobot works not only for data scientists but also for non-technical people who wish to maintain AI without having to learn the traditional methods of data science. So, instead of spending loads of time developing or testing machine learning models, data scientists can automate the process with DataRobot.
The best part of this platform is its ubiquitous nature: you can access DataRobot from anywhere, on any device, in whatever way suits your business needs.
Algorithmia
Photo from TechLeer.
Lastly, one of the most popular MLOps tools is Algorithmia. This platform productionizes machine learning models across a diverse set of IT architectures. The service lets applications make use of community-contributed machine learning models, and Algorithmia also offers access to advanced development of algorithmic intelligence.
Currently, this platform has over 60,000 developers with 4,500 algorithms.
Founded in 2014 by two Washington-based developers, Algorithmia currently employs 70 people and is growing rapidly.
The platform not only allows you to deploy models from any framework or language but also connects to most data sources. It is available on both cloud and on-premises infrastructures. Algorithmia enables users to continuously manage their machine learning life cycles with testing, security, and governance.
The main goal is to achieve a frictionless route to deployment, serving, and management of machine learning models.
WhyLabs.ai
source: WhyLabs.ai
WhyLabs is an AI observability platform built to enable every enterprise, no matter how large or small, to run AI with certainty.
The team is made up of AI practitioners who have helped build ML platforms like SageMaker, as well as ML experts who know the problems facing ML deployments.
The WhyLabs platform is specifically built for data science workflows, incorporating methods and features pioneered from analogous best practices in DevOps. It is also easy to install, deploy, and operate.
They focus on areas such as eliminating manual troubleshooting, logging and profiling data, tracking the overall model life cycle, and connecting your model's performance to product KPIs so your team can actually tie results to that performance.
Feast.dev
source: feast.dev
Feast was developed to address the challenges faced while productionizing data for machine learning. In particular, it targets problems like inconsistent feature definitions across teams, getting features into production, and inconsistencies between features at training time and serving time.
Feast does this with features like a registry: a common catalog for exploring, developing, collaborating on, and publishing new feature definitions within and across teams, and the central interface for all interactions with the feature store. Feast also offers data ingestion to manage consistent copies for future training and testing, a feature retrieval interface, and monitoring for your model.
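To give a sense of what those shared feature definitions look like, here is a declarative configuration sketch assuming a recent Feast release; the entity, view name, field, and parquet path are all hypothetical, and the exact class names have shifted between Feast versions.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

# The entity is the join key that ties feature rows to your records.
driver = Entity(name="driver", join_keys=["driver_id"])

# An offline source of historical feature values (path is hypothetical).
source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

# A feature view groups related features behind one registry entry,
# which both training and serving code then retrieve by name.
driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[Field(name="avg_trip_distance", dtype=Float32)],
    source=source,
)
```

Because both the training pipeline and the online service fetch `driver_hourly_stats` from the same registry entry, the training/serving inconsistency described above is addressed by construction rather than by convention.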
You can read more in their follow-up blog post, Bridging ML Models and Data, where they discuss the impact Feast has had at companies like Gojek.
Optuna
source: Optuna.org
Optuna is a hyperparameter optimization (HPO) framework for machine learning, written in Python, that has seen its first major version release. It aims to make HPO more accessible and scalable for experienced and new practitioners alike. The GitHub project has grown to accommodate a community of developers by providing state-of-the-art algorithms, its user base continues to increase, and the API is now considered stable and ready for many real-world applications.
H2O.ai
source: h2o.ai
H2O is an open-source, in-memory, distributed ML and predictive analytics platform that lets you build and productionize ML models. The goal of H2O.ai is to make ML accessible to more business users by extracting insights from your company's data with limited need for machine learning expertise in the deployment or tuning of models. They do this by offering a wide variety of products.
They offer products like Deep Water, Sparkling Water, and Steam, to name a few. These products do everything from integrating with TensorFlow, MXNet, and Caffe for GPU-based deep learning workloads to Spark integration that lets customers use their existing Spark ecosystem with H2O's ML.
End Of MLOps Part 1.
Due to email length limits, we will continue the rest in a follow-up post, where we will discuss why MLOps is important.