Kubeflow as Your Machine Learning Infrastructure

By now you've surely heard about Kubeflow, the machine learning platform that came out of Google. Kubeflow essentially connects TensorFlow's ML model building with Kubernetes' scalable infrastructure (thus the name Kube and Flow) so that you can concentrate on building your predictive model logic without having to worry about the underlying infrastructure. At least in theory.
In this blog post, we'll look at what Kubeflow consists of and how you would go about setting up your own Kubeflow pipelines. We assume you have basic knowledge of TensorFlow and Kubernetes, although in an ideal world you'd be up and running without knowing about the latter.
What is Kubeflow?
Kubeflow consists of four main components that you'll see when you open the admin console:
JupyterHub: allows spawning notebook servers for interactive development.
TFJobs: allows monitoring your running Kubernetes training jobs (there are other, more or less maintained, job types too).
Katib: hyperparameter tuning tools (Study, StudyJob).
Pipelines: acyclic graphs of containerized operations written in Python, passing outputs to inputs as strings (see the sketch after this list).
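To make that last point concrete, here is a minimal sketch of what a pipeline definition can look like with the Kubeflow Pipelines Python SDK (`kfp`). The image names, scripts and file paths are made-up placeholders; the point is that each step is a container, and that the output of one step is passed to the next as a plain string.

```python
import kfp
from kfp import dsl


@dsl.pipeline(
    name='example-training-pipeline',
    description='Two containerized steps chained by passing a string output to an input.'
)
def train_pipeline(learning_rate='0.01'):
    # Step 1: preprocess data inside a container image we have built and pushed ourselves
    # (the image name and file paths are hypothetical).
    preprocess = dsl.ContainerOp(
        name='preprocess',
        image='gcr.io/my-project/preprocess:latest',
        command=['python', 'preprocess.py'],
        arguments=['--output', '/tmp/dataset.txt'],
        file_outputs={'dataset': '/tmp/dataset.txt'},
    )

    # Step 2: train, receiving the previous step's output as a string argument.
    dsl.ContainerOp(
        name='train',
        image='gcr.io/my-project/train:latest',
        command=['python', 'train.py'],
        arguments=[
            '--dataset', preprocess.outputs['dataset'],
            '--learning-rate', learning_rate,
        ],
    )


if __name__ == '__main__':
    # Compile the pipeline into a package that can be uploaded through the Pipelines UI.
    kfp.compiler.Compiler().compile(train_pipeline, 'train_pipeline.tar.gz')
```

The compiled archive can then be uploaded in the Pipelines UI, or submitted programmatically as sketched later in this post.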
To get started with all of this, you need at least a basic understanding of how to configure scaling Kubernetes clusters. You'll also need to install ksonnet and learn how to use it before you can use Kubeflow, which might put off a few people. [Update, Feb 10th 2019: ksonnet was discontinued a few days ago.]
Most software engineers know how to build and set up their own Docker images, so this shouldn't be a problem, but for data scientists without a software engineering background it's one more thing to learn. Setting everything up requires some research, but isn't too hard for an experienced software developer.
In summary, Kubeflow is a hassle to set up the first time, so prototyping with it takes more than a little effort. But once you have it up and running, experimenting is a whole lot faster. Ideally you'd have somebody on your infrastructure team set it up so that you can start playing around with it.
How do you use Kubeflow as a data scientist?
In practice, you'd use Kubeflow like this: you write your code (in JupyterHub or elsewhere, using TensorFlow; other frameworks are also supported, but TF is clearly the first-class citizen), wrap it all up inside a Docker container with the required dependencies, and hand all of that over to Kubeflow. To deploy your code to Kubernetes, you build your local project into a Docker image and push it to Container Registry so that it's available to the cluster.
Kubeflow will then launch your GCP instances (other cloud providers will most probably follow shortly, but some Kubeflow components, such as Pipelines, are only available on GCP as of today), fetch your data through TensorFlow's native APIs and give you your results. Easy peasy, as long as you know your way around Docker and have a Kubeflow farm set up for you somewhere. Our current customers don't need to worry about Docker images or Kubernetes clusters, as all of that is provided as a service. However, Kubeflow may well become the backend powering Valohai in the future.
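As a rough sketch of that hand-off, assuming you have a Kubeflow Pipelines endpoint available and have pushed your image to Container Registry, submitting the compiled pipeline from earlier could look something like this with the `kfp` client (the host URL, experiment name and parameter value are placeholders):

```python
import kfp

# Point the client at your Kubeflow Pipelines API endpoint
# (the URL is a placeholder for your own deployment).
client = kfp.Client(host='https://<your-kubeflow-host>/pipeline')

# Group runs under an experiment so they are easy to find in the UI.
experiment = client.create_experiment(name='kubeflow-evaluation')

# Submit the pipeline package compiled earlier; Kubeflow then schedules
# each step as a pod on the Kubernetes cluster.
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='train-pipeline-run',
    pipeline_package_path='train_pipeline.tar.gz',
    params={'learning_rate': '0.001'},
)
print('Started run:', run.id)
```

From there on, you can follow each step and its logs in the Pipelines UI.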
What about Version Control?
Kubeflow's main focus is orchestration, and with Kubernetes in the background it shines at it. But at least for our current customers, machine orchestration isn't everything. With Kubeflow, the metadata about your run isn't stored centrally anywhere, like it is with Valohai today. You should always store the Docker image you built, so that you can dig into it later to find out which version of the code was run and in which environment. Job parameters are stored as ksonnet component parameters in local `params.libsonnet` files, which you need to version manually.
We didn't find any best practices for version controlling your input or output data, so you'll have to figure that out on your own, but as Kubeflow gets more traction, best practices are bound to emerge. The good news is that everything is Dockerized, so as long as you store the container image, you'll have the code and libraries in one place.
What are the main benefits of Kubeflow?
The main benefits of running on Kubeflow revolve around Kubernetes and its scalability. Once you have everything up, running your training at scale is a breeze. The Katib hyperparameter tuning component is also really cool!
Going forward with Kubeflow?
We at Valohai are seriously evaluating Kubeflow as our backend for the future. Once it matures a bit more, it would let us remove one piece of the orchestration puzzle and concentrate on version control and a nice UI & API for everything.
At the time of writing, however, our customers would lose several features (automatic input and output management, support for all major cloud providers, and zero-setup infrastructure, to name a few), so today Kubeflow on Valohai is more of a technical PoC. We also aim to abstract all of that away in the long run. We think that data scientists shouldn't have to worry about what runs their code in the background, much less about setting up an environment. Data scientists should be able to just write their code, bring in their data and BOOM, get their results. Iteration speed in experimentation is everything in data science.
Going forward, we see great potential in Kubeflow and are big fans of the project. If you'd like to try out Kubeflow yourself, head over to https://github.com/kubeflow/kubeflow or sign up for our Kubeflow beta to be among the first to run Valohai on top of Kubeflow!

Comparison Whitepaper
Valohai vs. Kubeflow
Managed or self-managed MLOps: which one is right for you?