MLOps: How to Get Started?
Are you struggling with managing the machine learning model lifecycle? Does every update to a model feel like a new project? MLOps (machine learning operations) is a collaborative approach that helps manage machine learning in production.
MLOps borrows from the DevOps playbook, which has drastically changed how we develop software over the past 15 years. The MLOps practice aims to reduce the time spent on routine operations such as data gathering, model retraining, and model evaluation, so that expert resources can focus on further development.
Source: State of ML 2020, 330 respondents
So why is MLOps relevant right now? Production models are becoming more and more common, and MLOps is rising in popularity with them. In May 2020, we surveyed 330 data scientists, machine learning engineers, and managers across a broad range of companies and asked what they would be focusing on for the next three months. Although 20% of respondents said they were still focusing more on the experimentation and learning phase, half said they were focusing on developing models for production use, and over 40% said they would be deploying models to production.
Step 1: Recognize the stakeholders.
The size and scope of real-world machine learning projects have surely surprised most, if not all, of us. What seems like a straightforward task of gathering some data, training a model, and then using it for profit ends up becoming a deep rabbit hole that spans from business and operations to IT. A single project covers data storage, data security, access control, resource management, high availability, integrations with existing business applications, testing, retraining, and more. Many machine learning projects end up being some of the biggest multidisciplinary and cross-organizational development efforts their companies have ever faced.
To properly implement an MLOps process, you’ll have to recognize the key people and roles in your organization. We’ve talked to countless organizations in the past four years, and while each case is unique, a combination of these roles tends to contribute to machine learning projects:
- Data scientists (duh!)
- Data engineers
- Machine learning engineers
- DevOps engineers
- Business owners
These roles do not necessarily map to one person each; a single person can, and in smaller organizations often has to, cover several of them. Still, the list paints a picture of who you’ll need to identify to gather your specific requirements and make use of your organization’s resources. For example, you must talk to business owners to understand the regulatory requirements, and to IT to get access granted and cloud machines provisioned.
Step 2: Invest in infrastructure.
MLOps is all about operational infrastructure that makes developing machine learning more efficient. There are plenty of proprietary and open-source products that solve parts of the machine learning lifecycle. When comparing platforms in the MLOps space, you’ll often run into apples-to-oranges comparisons. For example, comparing Kubeflow and Valohai is tricky because the former is an extendable, open-source solution requiring weeks to adopt, and the latter is a managed, proprietary solution.
To make it easier to decide which infrastructure solutions to adopt, consider the following aspects:
- Reproducibility: Will the solution make it easier to retain knowledge about data science work?
For example, experiment tracking and data version control will significantly increase your team’s ability to reproduce previous work.
- Efficiency: Will the solution save us time or money?
For example, automated machine orchestration will reduce the risk of paying for costly GPU instances while they sit idle, and pipeline capabilities will remove manual work from routine operations.
- Integrability: Will the solution integrate with our existing systems and processes?
For example, a feature store will make it easier for data engineers and data scientists to work together around data and to integrate with ML platforms.
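To make the reproducibility point concrete, here is a minimal sketch of what experiment tracking boils down to: storing, for every training run, a fingerprint of the data, the code version, the parameters, and the resulting metrics. This is not the API of any particular product; the function names, the record fields, and the JSON-lines log format are all assumptions made for illustration.

```python
import hashlib
import json
import time


def fingerprint(data: bytes) -> str:
    """Return a short, stable SHA-256 fingerprint of the given bytes."""
    return hashlib.sha256(data).hexdigest()[:12]


def make_run_record(dataset: bytes, params: dict, metrics: dict, code_version: str) -> dict:
    """Bundle what is needed to reproduce or audit one training run.

    All field names here are hypothetical, chosen for this example.
    """
    return {
        "timestamp": time.time(),
        "data_fingerprint": fingerprint(dataset),
        "code_version": code_version,  # assumed to be e.g. a Git commit hash
        "params": params,
        "metrics": metrics,
    }


def append_run(log_path: str, record: dict) -> None:
    """Append one record as a JSON line so the log stays easy to diff and search."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

With records like these, "which data and parameters produced the model in production?" becomes a lookup rather than an investigation. Dedicated tools add storage, UIs, and collaboration on top of the same idea.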
There are many approaches to ML infrastructure that can work, whether it’s coupling specialized systems or using a single multipurpose platform. However, infrastructure work should start as early as possible to avoid situations where your team has models in production, but how the models were produced isn’t well documented and each new release is becoming increasingly difficult to do.
Step 3: Automate, automate, automate.
When moving from POC to production, there is a significant change in mindset you’ll have to make. The ML model is no longer the product; the pipeline is.
While machine learning projects often start with one huge notebook and some local data, it’s essential to begin splitting the problem solving into more manageable components; components that can be tied together but tested and developed separately.
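As a rough illustration of splitting a notebook into components, the sketch below expresses each stage as a small function that takes and returns a plain dictionary, so stages can be unit-tested on their own and then chained. The stage names, the row format, and the toy threshold "model" are assumptions for this example, not a recommended modeling approach.

```python
from typing import Callable, Dict, List

# A step takes the pipeline state and returns an updated copy of it.
Step = Callable[[Dict], Dict]


def clean(state: Dict) -> Dict:
    """Drop rows that have no label."""
    rows = [r for r in state["rows"] if r.get("label") is not None]
    return {**state, "rows": rows}


def featurize(state: Dict) -> Dict:
    """Turn each row into a (feature, label) pair."""
    pairs = [(float(r["value"]), int(r["label"])) for r in state["rows"]]
    return {**state, "pairs": pairs}


def train(state: Dict) -> Dict:
    """Toy 'model': a threshold halfway between the two class means.

    Assumes both classes are present in the data.
    """
    pos = [x for x, y in state["pairs"] if y == 1]
    neg = [x for x, y in state["pairs"] if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return {**state, "threshold": threshold}


def run_pipeline(steps: List[Step], state: Dict) -> Dict:
    """Run the steps in order, passing the state from one to the next."""
    for step in steps:
        state = step(state)
    return state


if __name__ == "__main__":
    raw = {"rows": [
        {"value": 1, "label": 0},
        {"value": 3, "label": 1},
        {"value": 9, "label": None},  # removed by clean()
    ]}
    print(run_pipeline([clean, featurize, train], raw)["threshold"])
```

Because every step has the same shape, a step can be swapped, tested, or rerun without touching the others, which is the property that makes later automation possible.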
This ability to split the problem solving into reproducible, predefined, and executable components forces the team to adhere to a shared process. A shared process, in turn, creates a well-defined language between the data scientists and the engineers, and eventually leads to an automated setup that is the ML equivalent of continuous integration (CI): a product capable of auto-updating itself.
An end-to-end, automated machine learning pipeline ensures that every change – in either data or code – is (or can be) deployed to production without it turning into a special project.
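One simple way to picture "every change in data or code" as a trigger is to fingerprint both and rerun the pipeline whenever the fingerprint differs from the one recorded at the last release. The sketch below assumes the data and code are available as bytes; `fingerprint` and `needs_retraining` are hypothetical helper names, and a real setup would typically rely on a CI system or a pipeline orchestrator for this.

```python
import hashlib
from typing import Optional


def fingerprint(*parts: bytes) -> str:
    """Hash several byte strings into one fingerprint."""
    h = hashlib.sha256()
    for part in parts:
        # Length prefix keeps (b"ab", b"c") distinct from (b"a", b"bc").
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def needs_retraining(data: bytes, code: bytes, last_fingerprint: Optional[str]) -> bool:
    """Return True if nothing was released yet or if data or code has changed."""
    return last_fingerprint is None or fingerprint(data, code) != last_fingerprint
```

A scheduled job or a commit hook could call `needs_retraining` and, when it returns True, run the pipeline and store the new fingerprint alongside the released model.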
The result of MLOps should be a supercharged release cycle of machine learning capabilities. It’s paramount to understand that MLOps is a combination of people and technologies: people have to be willing to automate and document their work, and tools have to make it easy.
For more practical MLOps advice, sign up for our upcoming eBook and receive a preview edition today.