Machine Orchestration, Version Control and Pipeline Management for Deep Learning
Train your models in the cloud or on your own server farm with the click of a button, a call to an API, or a one-liner on the command line. Use just the right amount of processing power to get results while saving time and money.
Valohai supports massive scale concurrency on top of AWS, Microsoft Azure, Google Cloud Platform and on-premises hardware (e.g. OpenStack). Just click a button and launch your code within a Docker container running on your hardware of choice.
Meet regulatory compliance requirements without any added work. Valohai automatically keeps track of all your experiments, so you can always answer how a model was trained: from data and parameters to statistics and algorithms.
Don’t worry about environments, configurations or shutting down servers when your training is done. Concentrate on trials & mastering your models!
We believe that version control is the only way to achieve reproducibility, regulatory compliance, an audit trail and quick results.
Select a deployed model and trace back to its hyperparameters, training data, script version, associated cost, sibling models and the team members involved in training it. Do it today or 10 years from now.
Valohai integrates with any runtime you have and runs any machine learning code you write. Unlike other deep learning tools, Valohai doesn’t tie you down to one vendor (not even itself, as even the configuration format is open source).
Run your TensorFlow, Keras, CNTK, Caffe, Darknet, DL4J, PyTorch, MXNet or anything from bash scripts to C code in your Docker wrapper of choice. Store your training data and labels in an Azure Blob, an AWS S3 bucket or your own FTP server. Access your code in any public or private Git repository and run it on your cloud or on-premises hardware of choice.
Everything in Valohai is built API first, meaning that you can easily integrate your ML pipeline into your existing software pipeline, e.g. through Jenkins or any other continuous integration platform.
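As a sketch of what API-first integration looks like, the snippet below builds an HTTP request that a CI job could send to launch a training execution. The endpoint path, payload fields and token format are assumptions for illustration; check the Valohai API documentation for the real contract.

```python
import json
import urllib.request

# Hypothetical endpoint -- consult the Valohai API docs for the actual path.
API_URL = "https://app.valohai.com/api/v0/executions/"


def build_execution_request(project_id: str, step: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would launch an execution.

    The payload shape ("project", "step") is illustrative, not the
    documented schema.
    """
    payload = json.dumps({"project": project_id, "step": step}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Token {token}",  # placeholder token scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A Jenkins post-build step could construct and send a request like this:
req = build_execution_request("my-project-id", "train-model", "MY_API_TOKEN")
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would then trigger the run from any CI system that can execute a Python script.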
When building deep learning models at scale, you want to use industry best practices from leaders such as Uber, Netflix, Airbnb and Facebook.
Valohai brings the same tools to your fingertips that these powerhouses use to manage their internal machine learning pipelines (Uber's Michelangelo, Airbnb's Bighead, Facebook's ML platform, etc.).
Valohai’s streamlined machine learning pipeline ensures that steps integrate together, regardless of who wrote them or which language or framework was used. Generate images with Unity, transform them with custom C code, train with TensorFlow in Python, and deploy to a Kubernetes cluster. Everything works!
Get visual feedback on everything from a single model’s performance to the convergence of several parallel hyperparameter sweeps. See how your parameter sweeps are progressing and compare competing models by accuracy, depth or any custom parameter. Instead of manually launching models and keeping track of CSV files, you’ll see everything in real time as your training runs progress. You can also output custom metrics to stdout and see them graphically in the Valohai web interface.
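One common convention for this kind of real-time metric collection is printing one JSON object per line to stdout, which the platform then parses and plots. The exact format Valohai expects is an assumption here; the official docs are authoritative.

```python
import json


def log_metrics(**metrics) -> str:
    """Emit one JSON object per line to stdout.

    Platforms like Valohai can pick up such lines as execution metadata
    and chart them live (the precise convention is assumed, not quoted
    from the docs).
    """
    line = json.dumps(metrics)
    print(line)
    return line


# Inside a training loop you might emit one line per epoch:
for epoch in range(3):
    log_metrics(epoch=epoch, accuracy=0.80 + 0.05 * epoch)
```

Because the metrics travel over plain stdout, this works from any language or framework, not just Python.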
Valohai lets you scale vertically and horizontally to do distributed learning and parallel hyperparameter sweeps at the speed of light (in an ethernet cable). Run your model in parallel on a hundred GPUs, or tell Valohai to sweep through different hyperparameters in parallel on tens of TPUs to find the best model for your data. Valohai is built for big data and immense models, and it scales with you as you grow from data exploration to production.
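Conceptually, a parallel hyperparameter sweep expands a parameter space into independent configurations, each of which can run as its own execution. The sketch below shows a plain grid expansion; the parameter names are made up for illustration, and Valohai's own sweep configuration is defined declaratively rather than in Python.

```python
from itertools import product


def grid_sweep(space: dict) -> list[dict]:
    """Expand a parameter space into one configuration per combination.

    Each resulting dict could be submitted as a separate parallel
    execution on its own GPU or TPU.
    """
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


# Hypothetical search space: 3 learning rates x 2 batch sizes = 6 runs.
space = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64]}
configs = grid_sweep(space)
```

Launching the six configurations concurrently instead of sequentially is what turns a day-long search into minutes on enough hardware.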