Pros and Cons of Open-Source and Managed MLOps Platforms
If you are reading this article, you are probably trying to find the best MLOps platform for your projects. Hurray, we have a dedicated article comparing the most popular MLOps and machine learning platforms on the market on a variety of scales!
This article focuses specifically on how managed MLOps platforms compare to the open-source ones. So, without further ado, let’s see the benefits and shortcomings of both types.
Considering the Open-Source MLOps Platforms
Open-source solutions might seem like an obvious go-to option for decision-makers with a technical background. After all, building and customizing things is what they are best at and what puts food on their table. There is a slew of alternatives in the open-source MLOps space, but most commonly, the discussion revolves around the following three:
- Kubeflow by Google: The open-source end-to-end MLOps platform built on Kubernetes
- MLflow by Databricks: Open source tool for experiment tracking and model versioning
- Metaflow by Netflix: A newer entrant focusing on scalable machine learning pipelines
However, staying critical is crucial. Often there’s no need to reinvent the wheel, so it pays to carefully assess the positive and negative sides of going with an open-source MLOps solution. Here are some of them.
Pros of open-source MLOps platforms
Lots of free extensions
First of all, the strength of many open-source MLOps tools lies in a wide variety of open-source extensions that cover missing functionality. For example, KFServing (now KServe) can act as an extension to Kubeflow for serverless inference. Likewise, one can use Katib for hyperparameter optimization.
Flexibility in customization
Moreover, when building your own open-source-based MLOps platform, you don’t really have any limits or constraints on the type and amount of customization you can do. For example, let’s say you want to implement Kubeflow, but you would rather leave some features out. You can simply fork the repository and make the modifications that best match your needs.
Last but not least, the power of open-source solutions lies in their communities. In this regard, MLflow and Kubeflow are great examples, as their communities are keen on delivering case studies, tutorials, how-tos, and other valuable materials that help with a wide variety of issues.
Cons of open-source MLOps platforms
While open-source MLOps solutions provide a lot of flexibility in implementation, this is the very reason why adopting these tools is so painfully slow. Kubeflow users can confirm that setting it up in your environment and adapting it to your projects is a long, bumpy road. As one can figure out from the name, Kubeflow runs on Kubernetes, and for those with no experience with this tech, implementation can become a hassle.
On the other hand, MLflow might seem like an easier option in terms of setup. However, it’s not an end-to-end MLOps platform but an experiment tracking and model versioning tool, so the rest of the platform still has to be built around it. As the CTO and Co-Founder of Levity, Thilo Huellmann, has pinpointed: “Even with all the ready-made pieces we could use to build our solution; it just becomes an unreasonable budget and resourcing request to build and maintain our own custom MLOps solution”.
If it looks like setting up an open-source MLOps solution is problematic and costly, you should also remember that once your system is up and running, you need to take care of the maintenance. It is manageable for bigger companies where machine learning solutions are in the care of several teams supported by the dedicated operations team. However, for smaller companies without these kinds of resources, backend issues are huge roadblocks getting in the way of the whole data science team.
“That said, from an ease-of-use perspective, Kubeflow doesn’t feel mature enough, particularly for such a complex system. Moreover, it assumes a lot of competency with Kubernetes and/or containers, which frankly is great if you have that and disappointing if you don’t — not every data science team will. Kubeflow is a tool for a grin-and-bear-it intermediate or truly advanced team of ML engineers.”
Byron Allen
Considering the Managed MLOps Platforms
As the MLOps market has grown, many established companies and startups have started offering their own managed MLOps solutions. Today you can find anything from point solutions focused on a single aspect of MLOps, like Seldon for model deployment, to end-to-end managed MLOps, like Valohai and SageMaker. Naturally, teams also combine managed and open-source tools to suit their specific needs.
In our view, an end-to-end MLOps platform should cover at least three areas: tracking and versioning, pipeline automation, and model deployment.
While we are proud to say that our platform offers all of the abovementioned, other great players offer these features too, including ClearML, cnvrg.io, and Iguazio. In addition, cloud vendors also have their own managed MLOps offerings, including AWS SageMaker and Azure ML.
Let’s see in detail the benefits and drawbacks of the managed MLOps platforms.
Pros of managed MLOps platforms
With the managed MLOps solution, you can cross out adoption issues from your list of problems to deal with right away. We understand that depending on your company or organization the decision to purchase a tool might be tricky to make. However, it is a fast and painless solution compared to the time, effort, and funds you will need to invest into building your custom platform.
To solidify this argument, Valohai offers our clients a two-week proof-of-concept with no commitment required. Our experts will set up the platform in your environment and walk you through your first project. By contrast, opting for an open-source MLOps platform means that it will be weeks before you can even get to implementing your ML projects.
Not much we can add here. With the open-source MLOps platforms, it is on you to implement new features or choose and install an addon. While you have the freedom to choose, you also have to deal with dependency management.
One may argue that customers lack control over a managed MLOps platform’s roadmap. However, competition in the MLOps market pushes every platform to improve quickly.
When opting for a managed MLOps platform, you are also opting for a partner in the form of the platform developer. At Valohai, we not only communicate with our customers on a daily basis to share our insights and best practices but also help them with issues unrelated to our product and implement the features they request.
Cons of managed MLOps platforms
Selecting a vendor
The biggest challenge with managed MLOps platforms is the selection and purchase stage. Technologists and data scientists are not experts in procurement, so the decision-making process often gets stalled.
Firstly, the regular practice among technical decision-makers is to build things themselves rather than ask for approval of a rather expensive purchase. Moreover, since the MLOps market is relatively young, the only way to understand which platform is the best fit is through actual hands-on experience. To address this issue, Valohai offers a commitment-free POC.
While your managed MLOps platform vendor can be your strongest partner and the best support, it can become your biggest weakness if your vision of the features differs. For example, your vendor might not provide the features that you need while your competitors get these features through different platforms.
There is absolutely no guarantee that choosing an open-source MLOps solution will spare you from this sort of problem. However, the risk is higher with the managed platforms, and switching to another vendor becomes more difficult as time passes and you accumulate the cooperation history with your service provider.
To assess the depth of your lock-in with a specific vendor, try answering these questions:
- Are all file formats used nonproprietary?
- Does my ML code stay relatively unchanged?
- Does the platform integrate through APIs?
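As a rough illustration, the three questions above could be recorded as a simple yes/no checklist when you evaluate a vendor. The sketch below is only an assumption about how one might do this; the names `LockInAnswers` and `lock_in_concerns` are hypothetical and do not come from any platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockInAnswers:
    """Yes/no answers to the three lock-in questions (hypothetical structure)."""
    nonproprietary_formats: bool   # Are all file formats used nonproprietary?
    code_stays_unchanged: bool     # Does my ML code stay relatively unchanged?
    integrates_through_apis: bool  # Does the platform integrate through APIs?


def lock_in_concerns(answers: LockInAnswers) -> list[str]:
    """Return one concern for every question that was answered 'no'."""
    concerns = []
    if not answers.nonproprietary_formats:
        concerns.append("data or models are stored in proprietary formats")
    if not answers.code_stays_unchanged:
        concerns.append("ML code must be rewritten for the platform")
    if not answers.integrates_through_apis:
        concerns.append("no API-based integration with other tools")
    return concerns


if __name__ == "__main__":
    # Example: open formats and APIs, but the ML code needs platform-specific changes.
    example = LockInAnswers(
        nonproprietary_formats=True,
        code_stays_unchanged=False,
        integrates_through_apis=True,
    )
    for concern in lock_in_concerns(example):
        print("Lock-in concern:", concern)
```

An empty list means all three answers were "yes"; every "no" is a point to raise with the vendor before you commit.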
Each side of the coin has its pros and cons, but most often, the choice between managed and self-managed open-source comes down to which resource is most limited for you. A managed MLOps platform minimizes the need for engineering resources but requires a certain amount of investment, while open-source MLOps platforms are the other way around.
Companies with large engineering teams committed to supporting data science, such as Spotify, can make the most of open-source platforms like Kubeflow. If, however, that doesn’t seem realistic for you because you don’t have those engineering resources and aren’t looking to hire, then managed MLOps seems like the better option. For example, our client Levity skipped hiring ML engineers altogether because the Valohai platform supports the data science team.
As you can see, there is no shortcut to deciding whether you should go with the managed MLOps platform or a custom open-source solution. We hope that we were able to give you some things to consider when making your decision. If you need more food for thought, be sure to download one of our comparison whitepapers below.