It's time for an update on what's been happening under the hood of the Valohai platform. We'd like to highlight three major features we've added in the past two months: support for Kubernetes, support for spot instances, and the Valohai Python utility library. For all additional tweaks and improvements, check out our patch notes.
Valohai has supported a Kubernetes cluster as a deployment target for years. In addition to serving the deployed models, we have now extended our Kubernetes support to cover the entire data science pipeline.
Valohai has had Kubernetes-like auto-scaling built in from day one, but if you already have a Kubernetes cluster set up and see the value in owning the infrastructure configuration yourself, you can now use your own cluster to execute all Valohai workloads.
For the data scientist, choosing a Kubernetes cluster over Valohai's native scaling changes very little in daily work. Valohai workloads are all Docker-based to begin with, so all you need to do is choose a Kubernetes cluster as the target environment for a job, and everything else works the same. Valohai will version all your inputs, parameters, logs, and outputs just like before. You can even freely mix and match Kubernetes and other execution environments in your pipeline if it suits your needs!
Valohai Python Utility Library (a.k.a. valohai-utils)
One of Valohai's core design principles is to be an unopinionated, technology-agnostic platform. Valohai always communicates at the operating system level, because reading and writing files on disk, parsing command-line arguments, and printing to stdout are things that practically every programming language and framework can do.
That said, the same integration code was getting reinvented in slightly different variations for every client we have. Most of our clients use Python, so we decided to make life a bit easier for them and released a utility library. It is now easier to define your parameters, read input data from disk, write output data to disk, and log your metrics. Updating the Valohai configuration YAML automatically based on your source code and defining an entire pipeline in Python are also supported.
To demonstrate the use of the utility library, here is a simple image resizer preprocessing step with and without the utility. The utility allows a much shorter, cleaner, and higher-level coding style.
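To make the operating-system-level contract concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not code from the Valohai documentation: the function name `run`, the `--suffix` parameter, and the `files_processed` metric are hypothetical, and the step simply copies files as a stand-in for real processing. It also assumes that metrics are reported as one JSON object per line on stdout.

```python
import argparse
import json
from pathlib import Path


def run(argv, inputs_dir, outputs_dir):
    """Copy every input file to the outputs directory and report a metric."""
    # Parameters arrive as ordinary command-line flags.
    parser = argparse.ArgumentParser()
    parser.add_argument("--suffix", default=".out")  # hypothetical parameter
    args = parser.parse_args(argv)

    outputs_dir.mkdir(parents=True, exist_ok=True)
    processed = 0
    for src in sorted(inputs_dir.rglob("*")):
        if not src.is_file():
            continue
        # Outputs are plain files written under the outputs directory.
        dst = outputs_dir / (src.name + args.suffix)
        dst.write_bytes(src.read_bytes())
        processed += 1

    # Assumption: metrics are printed to stdout as one JSON object per line.
    print(json.dumps({"files_processed": processed}))
    return processed
```

Inside a job, you would call something like `run(sys.argv[1:], Path("/valohai/inputs"), Path("/valohai/outputs"))`. Those two directory paths are our assumption about the worker's default layout, so check the documentation for your installation. Nothing in the sketch imports a Valohai package, which is exactly why the same pattern can be reproduced in any language.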
Example without valohai-utils:
Example with valohai-utils:
One of the most requested features from our customers has been spot instance support.
Cloud providers sell their unused compute capacity at steep discounts. These machines are called spot instances. They operate on a free market of supply and demand: you set the maximum price you are willing to pay and get a machine whenever one is available at that price. Spot instances are perfect if you are flexible about when your workloads run and can tolerate the small chance of them getting interrupted.
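The max-price mechanism described above can be sketched as a toy model. This is a simplification for illustration only: the function names and the price series are made up, and real cloud providers apply their own pricing and interruption rules.

```python
def spot_availability(market_prices, max_price):
    """Return, per time slot, whether a spot instance would be running."""
    # The instance runs only while the market price is at or below our limit.
    return [price <= max_price for price in market_prices]


def count_interruptions(running):
    """Count the transitions from running to not running."""
    return sum(1 for prev, cur in zip(running, running[1:]) if prev and not cur)


def total_cost(market_prices, max_price):
    """Sum the market price over the slots in which the instance runs."""
    return sum(price for price in market_prices if price <= max_price)


# Hypothetical hourly market prices and a hypothetical price limit.
prices = [0.30, 0.25, 0.45, 0.28, 0.50, 0.20]
limit = 0.35
running = spot_availability(prices, limit)
interruptions = count_interruptions(running)
cost = total_cost(prices, limit)
```

With these made-up numbers the instance runs in four of the six hours and is interrupted twice, which is the trade-off in a nutshell: a lower bill in exchange for less predictable scheduling.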
In addition to the default on-demand instances, which are guaranteed to start right away and not get interrupted, it is now possible to request a spot instance environment for your Valohai workloads.