We explore topics of data science, machine learning and MLOps.
Managing AI Products: Feasibility, Desirability and Viability
Henrik Skogström / January 17, 2022
Product management is as massive a topic as machine learning so let's start with a fundamental question. When is it worthwhile to develop an AI product? A helpful tool most PMs have seen for this is the Sweet Spot for Innovation that IDEO popularized.
Running Weights & Biases Experiments on Valohai Pipelines
Eikku Koponen / January 14, 2022
Sometimes it is hard to combine the world of experimenting and the more dev-oriented world of data science with robust pipelines and modular work. This example combines Weights and Biases experiments with Valohai's production pipelines.
Git for Data Science: What every data scientist should know about Git
Juha Kiili / January 04, 2022
Git is a tool most software developers have used daily for a decade, and with data scientists becoming an integral part of R&D teams, Git is an everyday tool for them as well. We've listed a few helpful tips on using Git for your ML work and avoiding the common pitfalls.
Data-Centric AI and How to Adopt This Approach
Eikku Koponen and Jean-Emmanuel Wattier / December 20, 2021
The data you have is, if not the most valuable asset you’ve got when creating AI systems, at least close to it. So in practice, what can you do to embrace a more data-centric approach to AI? We have prepared some simple steps for you to keep in mind and implement.
Observability in Production: Monitoring Data Drift with WhyLabs and Valohai
Eikku Koponen / December 02, 2021
Observability is the collection of statistics, performance data, and metrics from every part of your ML system. Metadata, if you will. We will dig into how you can easily get started with observability and detect data drift using whylogs while executing your pipeline on Valohai.
A Comprehensive Comparison Between Kubeflow and Argo
Henrik Skogström / December 01, 2021
Finding the most suitable platform to build ML workflows may be a challenge. Some are looking toward specific tools built for ML/MLOps, such as Kubeflow, while others are looking at more general-purpose orchestrators such as Argo.
A Comprehensive Comparison Between Kubeflow and Metaflow
Henrik Skogström / November 25, 2021
Creating a pipeline to automate ML workflows is necessary to save time and improve efficiency. There are two popular open-source tools for ML pipeline orchestration: Kubeflow and Metaflow. In this article, we will compare the differences and similarities between these two platforms.
Product Update: Human Validation and Confusion Matrices
Juha Kiili / November 24, 2021
We’ve recently introduced two features that make building trusted and validated models easier: human validation steps and confusion matrices.
A Comprehensive Comparison Between Kubeflow and SageMaker
Henrik Skogström / November 16, 2021
Kubeflow and SageMaker have emerged as the two most popular end-to-end MLOps platforms. Kubeflow is the first entrant on the open-source side, and SageMaker has a robust ecosystem through AWS. In this article, we will compare the differences and similarities between these two platforms.
A Comprehensive Comparison Between Kubeflow and Databricks
Henrik Skogström / November 09, 2021
Databricks is a unified data analytics platform, while Kubeflow is an MLOps platform. In this article, we will look at how they are comparable and how they are very different.
From Notebook to Production: How to Bridge the Gap between Data Science and Engineering?
Eikku Koponen / November 08, 2021
When it comes to the production phase, actually providing the model to end-users and integrating it into the (existing) tools, data scientists often pass the baton to software engineers. That handover is often quite rocky. Here are a few tips on how to bridge the gap between data science and engineering.
A Comprehensive Comparison Between Kubeflow and Airflow
Henrik Skogström / November 02, 2021
Kubeflow and Airflow can both be used to orchestrate ML workflows. Airflow is the tool of choice for most engineers but this article will show what else MLOps platforms like Kubeflow can offer on top of DAGs.
An End-to-End Pipeline with Hugging Face transformers
Eikku Koponen / November 01, 2021
This article shows an example of a pipeline that uses Hugging Face transformers (DistilBERT) to predict the shark species based on injury descriptions. With Valohai, you can easily tie together typical data science workflows into repeatable pipelines.
Machine learning lifecycle doesn’t end with the model
Henrik Skogström / October 28, 2021
Let me preface this article by saying there isn’t a single accepted definition of a machine learning lifecycle. Most articles about the machine learning lifecycle tend to focus only on a small portion of the actual lifecycle: the Experimentation loop.
Product Update: Debugging and Metadata
Juha Kiili / October 11, 2021
For the October product update, we chose to highlight a new feature, Remote Access Debugger, and some major improvements that we've shipped to the Metadata View.
No-code AI and MLOps: No-code AI is only no-code for the end user
Otso Rasimus / October 04, 2021
No-code is only no-code for the end user, and that is also true for no-code AI. These platforms rely on the ingenuity of developers to abstract away the technical parts. MLOps is vital to deliver the product reliably and without risk.
DLOps: MLOps for Deep Learning
Henrik Skogström / September 03, 2021
DLOps, deep learning operations, is an evolution of MLOps, looking to answer the unique operational challenges that deep learning sets. A skeptic may look at it as unnecessarily muddying the waters with a new buzzword.
Product Update: Spark as a First-Class Citizen
Juha Kiili / August 31, 2021
Support for Spark has been one of the most requested features as Spark has become almost ubiquitous for data scientists and engineers working with structured data. We’ve heard the calls and Valohai now supports Spark natively.
Three ways to install Valohai
Henrik Skogström / August 26, 2021
One of the unique aspects of Valohai is that despite being a proprietary platform it can run in fully private, even airgapped, environments. Why is this important? Machine learning often revolves around data that is sensitive and thus data security is a fundamental requirement.
What is the AIIA blueprint and how does Valohai fit into it?
Henrik Skogström / August 24, 2021
The AIIA blueprint is an excellent starting resource for teams looking to implement their stack for machine learning development. The initiative draws inspiration from other popularized tech stacks.
A Comprehensive Comparison Between Kubeflow and MLflow
Henrik Skogström / August 11, 2021
As a data scientist or a machine learning engineer, you have probably heard about Kubeflow and MLflow. They are often compared against each other despite being quite different.
Product Update: Datum Improvements
Juha Kiili / July 07, 2021
Datum is a version-controlled file inside the Valohai platform. Every datum is immutable by design. We have introduced three new improvements for more flexibility over datums.
Data Augmentation Helps Improve Model Accuracy
Henrik Skogström / July 05, 2021
Putting together a suitable dataset for training a model can be one of the biggest challenges. Data augmentation is an approach where you start with an existing dataset and expand it to have more variety.
Model Interpretability in a Nutshell
Eero Laaksonen / June 23, 2021
Let’s start by defining interpretability in the context of machine learning and AI. In simple terms, it means how easily a human can interpret how the model arrived at a decision.
The Best Machine Learning Podcasts
Henrik Skogström / June 15, 2021
Summer is here and hopefully, for most of us, it means time to decompress. But if you are like me and learning is relaxing, podcasts are a great way to enjoy the summer weather while learning.
Building a YOLOv3 pipeline with Valohai and Superb AI
Juha Kiili / June 08, 2021
This article shows an example of a pipeline that integrates Valohai and Superb AI to train a computer vision model using pre-trained weights and transfer learning. For the model, we are using YOLOv3, which is built for real-time object detection.
Product Update: Kubernetes, Spot Instances & Python Utility Library
Juha Kiili / May 31, 2021
It's time for an update on what's been happening under the hood of the Valohai platform. We'd like to highlight three major features we've added in the past two months: Support for Kubernetes and Spot instances and the Valohai Python utility library.
Building a solution based on Machine Learning
Xavier Moles Lopez / May 26, 2021
Why a Machine Learning model is not a product if there is no MLOps. Our approach to implementing training as a reproducible process, and how this process intertwines with our CI/CD pipeline.
The Three Roles in a Machine Learning Team (and Two Technologies to Connect Them)
Henrik Skogström / May 04, 2021
It's becoming more important to think about the competencies of a team rather than expecting every individual to be an expert at everything related to machine learning.
Superb Meets Valohai: An End-to-End Solution for Developing Computer Vision Applications
Superb AI and Valohai / March 31, 2021
Computer vision is one of the most disruptive technologies of the recent decade. Developing computer vision systems requires massive upfront investments. Or it used to, before Superb met Valohai.
MLOps for AI Consultancies
Henrik Skogström / March 25, 2021
How can MLOps make consultant-client relationships more productive? Starting with machine learning is a massive, strategic undertaking, and many are turning to consultancies and contractors to take the first steps with AI.
If You Missed the MLOps Webinar
Henrik Skogström / March 18, 2021
On March 16th, we held a webinar to follow up on our MLOps eBook. Together with our co-authors, we wanted to tackle the goal we set for MLOps in the eBook: “The goal of MLOps is to reduce technical friction to get the model from an idea into production in the shortest possible time to market with as little risk as possible.”
Product Update: End-to-End Automation
Juha Kiili / February 03, 2021
In the past few months, we've rolled out three new features that highlight end-to-end automation on our platform: Deployment nodes in pipelines, Pipeline scheduler & Model monitoring.
What Is The Difference Between DevOps And MLOps?
Henrik Skogström / December 21, 2020
If you are involved with production machine learning in any way, understanding MLOps is essential. For people with software development experience, the easiest way to understand MLOps is to draw a parallel between it and DevOps.
How We Trained 277M Models for the Black-Box Optimization Challenge
Juha Kiili / December 15, 2020
Valohai MLOps platform provided the infrastructure for the Black-Box Optimization Challenge for the NeurIPS 2020 conference. The competition was organized together with Twitter, Facebook, SigOpt, ChaLearn, and 4paradigm.
The Bus Factor in Machine Learning development
Henrik Skogström / December 10, 2020
The bus factor is a common term in software engineering describing the risk of a key contributor disappearing unexpectedly from a project – because they get hit by a bus. In machine learning the bus factor is magnified significantly.
When Should a Machine Learning Model Be Retrained?
Henrik Skogström / November 30, 2020
Should a machine learning model be retrained each time new observations are available (or otherwise very frequently)? The answer is “it depends”, but this article looks at two components to consider: the use case and the costs.
The Easiest Way to Become a Valohai User
Henrik Skogström / November 24, 2020
Buying an MLOps platform is tricky and for that reason we’ve introduced a model where teams can sign up for a two-week proof-of-concept project to test out our platform with their environment and projects.
When Is a Machine Learning Model Good Enough for Production, and How to Stress About It Only Once?
Henrik Skogström / November 17, 2020
As you start incorporating machine learning models into your end-user applications, the question comes up: “When is the model good enough to deploy?” There simply is no single right answer.
A Machine Learning Pipeline Creates a Shared Language
Eero Laaksonen / November 04, 2020
Modern tooling and shared work methods (CI/CD, version control, microservices) have enabled companies to scale their throughput in software development exponentially. A machine learning pipeline brings similar scale to machine learning.
The MLOps Stack
Henrik Skogström / October 26, 2020
To make it easier to consider what tools your organization could use to adopt MLOps, we’ve made a simple template that breaks down a machine learning workflow into components.
Risk Management in Machine Learning
Henrik Skogström / October 21, 2020
Machine learning and artificial intelligence allow businesses to gain new insights and improve their business processes. However, they expose companies to additional risks because humans do not explicitly program the algorithms. Let's look at some of these risks and how data scientists and compliance officers can help mitigate them.
5 Signs You Might Be in Need of an MLOps Platform
Henrik Skogström / October 09, 2020
Using an MLOps platform lets you manage everything about machine learning in production, so that each new update doesn’t feel like an entirely new project and dovetails easily with the last.
Why MLOps Is Vital To Your Development Team
Henrik Skogström / September 23, 2020
To make an analogy to a more traditional industry, machine learning is shipping goods while MLOps is containerization. And much like containerization of global shipping, MLOps is equal parts process and infrastructure.
Introducing Minihai – Easiest way to run notebooks remotely.
Henrik Skogström / September 15, 2020
Imagine this: you are working on a notebook that takes ages to run, and it bogs down your computer so much that it's hard even to multitask. We've seen this countless times, which is why we are introducing Minihai.
Why Are ML Engineers Becoming So Sought After?
Henrik Skogström / September 07, 2020
For a long time, most machine learning initiatives have been stuck in a persistent state of proofs-of-concept. However, in the past year, we’ve seen a rapid acceleration of machine learning models getting real-world use. Consequently, machine learning engineers are increasingly sought after – nearly catching up to data scientists in posted jobs.
Valohai Joins Forces with Twitter and Facebook
Joanna Purosto / July 28, 2020
Valohai, the MLOps platform company, is collaborating with Twitter and Facebook to launch a competition for the annual Neural Information Processing Systems (NeurIPS) conference to advance the optimization of machine learning models towards more accurate AI solutions. The goal is to find better optimization algorithms for machine learning.
Why Levity adopted Valohai instead of hiring their first MLOps engineer
Drazen Dodik / July 15, 2020
Levity decided to adopt Valohai to manage their machine learning infrastructure & model serving through Kubernetes instead of hiring an MLOps engineer.
What did I Learn about CI/CD for Machine Learning
Ari Bajo / June 09, 2020
Most software development teams have adopted continuous integration and delivery (CI/CD) to iterate faster. However, a machine learning model depends not only on the code but also the data and hyperparameters. Releasing a new machine learning model in production is more complex than traditional software development.
Bayesian Hyperparameter Optimization with Valohai
Magdalena Stenius / April 01, 2020
Grid search and random search are the most well-known methods in hyperparameter tuning. They are also both first-class citizens inside the Valohai platform. You define your search space, hit go, and Valohai will start all your machines. It does a search over the designated area of parameters you’ve defined. It is all automatic and doesn’t make you launch or shut down machines by hand. Also, you won't accidentally leave machines running and costing you money. But we’ve been missing one central method for hyperparameter tuning: Bayesian optimization. Not anymore!
Classifying 4M Reddit posts in 4k subreddits: an end-to-end machine learning pipeline
Ari Bajo / March 31, 2020
Finding the right subreddit to submit your post can be tricky, especially for people new to Reddit. There are thousands of active subreddits with overlapping content. If it is no easy task for a human, I didn’t expect it to be easier for a machine. Currently, redditors can ask for suitable subreddits in a special subreddit: r/findareddit.
Machine Learning and Remote Work
Eero Laaksonen / March 13, 2020
A lot of companies and teams are going fully remote for the first time due to the Coronavirus. We at Valohai are big believers in remote work. Having worked as a distributed team for a good 4 years, we would like to share some of our thoughts on remote work in Machine Learning. A lot of major pain points we have seen revolve around tooling.
Using DVC to version control your ML experiment data
Fredrik Rönnlund / February 20, 2020
In this blog post we will explore how you can use DVC for your data version control and how you can automate your data version control with and without DVC inside the Valohai platform.
Machine Learning in the cloud vs on-premises
Fredrik Rönnlund / February 05, 2020
It’s a running joke among developers that the cloud is just a word for somebody else’s computer. But the fact remains that by leveraging the cloud you can reap benefits that you couldn’t achieve with your on-premises server farm.
Three ways to categorize machine learning platforms
Fredrik Rönnlund / January 30, 2020
Machine learning (ML) platforms take many forms and usually solve only one or a few parts of the ML problem space. So how do you make sense of the different platforms that all call themselves ML platforms?
Production Machine Learning Pipeline for Text Classification with fastText
Ari Bajo / January 28, 2020
When doing machine learning in production, the choice of the model is just one of the many important criteria. Equally important are the definition of the problem, gathering high-quality data and the architecture of the machine learning pipeline.
Identify relevant text from complex documents
Joanna Purosto / January 13, 2020
Selko.io builds solutions for multi-disciplinary project teams working in large companies. These teams work according to project documents that usually have several hundreds of pages. Finding the relevant sections for each team member is a real burden in the project-based working environment.
Building vs. Buying ML infrastructure at Selko.io
Selko.io / December 09, 2019
This article is the story of us at Selko.io, productionizing our machine learning workflows. We'll describe Selko's route from starting the company to developing our first ML models. We'll also walk through how we built a fully working machine learning solution combining our UI, backend, and orchestration layer for machine learning tasks. And of course, how we went from a homegrown ML orchestration platform to Valohai. To give you some context, let's first dive into the history of the company.
Human Touch to AI
Kannan Sundar / November 26, 2019
One of the key challenges for a Data Science team is the search for an accurately labelled dataset for solving the given problem. While it is easy to build a basic model that is reasonably accurate for a demo to the business, going beyond it towards a production worthy solution needs gold standard ground truth data.
Scaling Apache Airflow for Machine Learning Workflows
Ari Bajo / November 19, 2019
Apache Airflow is a popular platform to create, schedule and monitor workflows in Python. It has more than 15k stars on GitHub and it’s used by data engineers at companies like Twitter, Airbnb and Spotify.
Exploring NLP concepts using Apache OpenNLP
Mani Sarkar / November 15, 2019
After looking at a lot of Java/JVM based NLP libraries listed on Awesome AI/ML/DL I decided to pick the Apache OpenNLP library. One of the reasons is that another developer (who had a look at it previously) recommended it. Besides, it’s an Apache project, and Apache has been a great supporter of F/OSS Java projects for the last two decades or so. It also goes without saying that Apache OpenNLP is released under the Apache 2.0 license.
Machine learning is a zero-sum game
Fredrik Rönnlund / November 12, 2019
Only the companies that invest into machine learning today will exist 10 years from now. The ones that look to the sidelines will be eaten by their competition.
Continuous Integration in Automotive Machine Learning Development
Joanna Purosto / November 04, 2019
Continuous Integration (CI) in software development is the process of testing that a change in one place doesn’t break something else. Continuous Delivery (CD), on the other hand, is an extension to CI where every change in the code is also deployed. Both are and have been core parts of the advancement of Extreme Programming, i.e. rapid small-batch development. This, in turn, has been the main contributor to advancements in rapid software development.
Updates for Valohai Powered Notebooks
Juha Kiili / October 31, 2019
Valohai is the enterprise-grade machine learning platform for data scientists that build custom models by hand. In addition to writing code with classic IDEs like PyCharm or VSCode, we also have native support for data scientists preferring to use Jupyter notebooks.
NLP with DL4J in Java, all from the command-line
Mani Sarkar / October 28, 2019
We are all aware of Machine Learning tools and cloud services that work via the browser and give us an interface we can use to perform our day-to-day data analysis, model training, evaluation, and other tasks with varying degrees of efficiency.
Building a data catalog for machine learning
Fredrik Rönnlund / October 14, 2019
They say data is the new gold. But without a data catalog, your data is just scattered around like random nuggets of gold in a desert full of rocks, pebbles and sand. Data catalogs help you keep track of the data you have but also, in the case of machine learning models, what data has affected which model. Data brings meaning to machine learning because unlike software, machine learning models are 90% data and 10% code.
Announcing Valohai Pipelines
Aarni Koskela / October 02, 2019
One of the more exciting things we have under development (or, should we say, in the pipeline) right now is our Pipeline system. Since our mission is to enable CI/CD style development for AI and machine learning, there's a logical next step up from just (well, "just" might be the understatement of the year here) running your code in a repeatable manner with Valohai.
Self-Driving with Valohai
Juha Kiili / September 10, 2019
One of the hottest areas of application for deep learning is undoubtedly self-driving cars. We’ll go through the problem space, discuss its intricacies and build a self-driving solution utilizing the Unity game engine, training a neural network on top of the Valohai platform. Regardless of the technologies used, you’ll get an understanding of the basics as well as the code to tweak for yourself.
Automatic Data Provenance for Your ML Pipeline
Fredrik Rönnlund / September 06, 2019
We all understand the importance of reproducibility of machine learning experiments. And we all understand that the basis for reproducibility is tracking every experiment, either manually in a spreadsheet or automatically through a platform such as Valohai. When you can’t track what you’ve done, it’s impossible to remember what you did last week, not to mention last year. This complexity is further multiplied with every new team member that joins your company.
How to do Deep Learning for Java on the Valohai Platform?
Mani Sarkar / August 29, 2019
Patenting Artificial Intelligence – What's It Really About?
Fredrik Rönnlund / August 26, 2019
Software patents raised a lot of hairs twenty years ago, mainly because while governments are slow to react to change, software evolves rapidly, and patents thus live on for too long in comparison to hardware. In this blog post, let’s take a look at how AI patents are similar to and different from software patents and what challenges can be seen in AI patenting.
Effective Machine-Learning Workflows with Azure Pipelines
Ruksi / August 22, 2019
Production-grade machine-learning algorithms never come out perfect on the first try. They require the same approach to iteration and testing as any other software project. But validating machine-learning algorithms is particularly hard—harder than writing simple unit or integration tests. And iterating on machine-learning algorithms gets harder as the team contributing to it grows.
5 Interesting Things About AI and Patenting
Vadym Kublik / July 30, 2019
All over the world, patents are known as the best way to protect inventions. They provide inventors with a period of up to 20 years to use an exclusive, monopoly-like position in the commercial exploitation of their creations. It is the key for getting returns on the investments they made during the research and development of their new technological solutions.
Challenges in Building a Scalable AI Business
Eero Laaksonen / June 18, 2019
I see the quote “AI is the new electricity” thrown around in about every other blog post. I think there is truth in it, but I also think most people don’t go to the bottom of what it really means for their business. Let’s first define what we mean by AI: in this context, I’m referring to new advances in machine learning and deep learning.
A High-Performance Visual Search Engine
Joanna Purosto / June 04, 2019
Nyris is developing a high-performance visual search engine that understands the content of an image. The visual search engine works as an easy-to-use API that companies can use to inject visual search as a part of their solution.
Valohai's Jupyter Notebook Extension
Juha Kiili / May 28, 2019
Valohai is a deep learning platform that helps you execute on-demand experiments in the cloud with full version control. Jupyter Notebook is a popular IDE for the data scientist. It is especially suited for early data exploration and prototyping.
Asynchronous Workflows in Data Science
Juha Kiili / May 28, 2019
Pointlessly staring at live logs and waiting for a miracle to happen is a huge time sink for data scientists everywhere. Instead, one should strive for an asynchronous workflow. In this article, we define asynchronous workflows, figure out some of the obstacles and finally guide you to the next article, which looks at a real-life example in action in Jupyter Notebooks.
From Zero to Hero with Valohai CLI, Part 2
Juha Kiili / May 02, 2019
Valohai executions can be triggered directly from the CLI and let you roll up your sleeves and fine-tune your options a bit more hands-on than our web-based UI. In part one, I showed you how to install and get started with Valohai’s command-line interface (CLI). Now, it’s time to take a deeper dive and power up with features that’ll take your daily productivity to new heights.
Machine Learning Infrastructure Lessons from Netflix
Fredrik Rönnlund / April 25, 2019
Ville Tuulos, machine learning infrastructure architect, was the first to publicly dissect Netflix’s Machine Learning infrastructure at QCon in November 2018 in San Francisco. If you haven’t seen the talk yet, read the summary of his talk here! All the pictures used here are from Ville's presentation.
Building Machine Learning Infrastructure at Netflix
Toni Perämäki / April 11, 2019
In our series of machine learning infrastructure blog posts, we recently featured Uber’s Michelangelo. Today we’re happy to be interviewing Ville Tuulos from Netflix. Ville is a machine learning infrastructure architect at Netflix’s Los Gatos, CA office.
From Zero to Hero with Valohai CLI, Part 1
Juha Kiili / April 04, 2019
As new Valohai users get acquainted with the platform, many fall in love with our web-based UI - and for good reason. It’s responsive, intuitive and gets the job done with just a few clicks. But don’t be fooled into thinking that’s the end of the interface conversation. We know it takes different [key]strokes for different folks, so Valohai also includes a command-line interface (CLI) and the REST API.
Machine Learning at NVIDIA GTC 2019
Fredrik Rönnlund / March 28, 2019
Last week we had the pleasure of joining our partner SwiftStack at our joint booth at the NVIDIA GTC 2019 conference in San Jose. GTC touts itself as the premier AI conference and it sure was.
TensorBoard + Valohai Tutorial
Juha Kiili / March 27, 2019
One of the core design paradigms of Valohai is technology agnosticism. Building on top of the file system and in our case Docker means that we support running very different kinds of applications, scripts, languages and frameworks on top of Valohai. This means most systems are Valohai-ready because of these common abstractions. The same is true for TensorBoard as well.
Build vs. Buy – A Scalable Machine Learning Infrastructure
Toni Perämäki / March 19, 2019
In this blog post we’ll look at which parts a machine learning platform consists of and compare building your own infrastructure from scratch to buying a ready-made service that does everything for you.
Automatic Version Control Meets Jupyter Notebooks
Juha Kiili / March 13, 2019
Running a local notebook is great for early data exploration and model tinkering, there’s no doubt about it. But eventually you’ll outgrow it and want to scale up and train the model in the cloud with easy parallel executions, full version control and robust deployment, letting you reproduce your experiments and share them with team members at any time.
Multi-Cloud Data & Infrastructure Solution for Machine Learning
Fredrik Rönnlund / March 12, 2019
SwiftStack and Valohai, in joint partnership, announce the world’s first peta-scale ML solution that covers everything from computation to data management in a multi-cloud environment. The solution provides a global namespace removing silos and enabling universal access to all your data in all your machine learning use-cases. It has built-in support for Azure, Google Cloud, AWS and SwiftStack.
EU/US Copyright Law and Implications on ML Training Data
Vadym Kublik / March 01, 2019
We may live in the era of “Big Data,” and yet access to it is somewhat restricted, especially when it comes to high-quality data. This blog post will address the question of acquiring data for your Machine Learning projects from the perspective of EU and US copyright laws.
Reinforcement Learning Tutorial Part 3: Basic Deep Q-Learning
Juha Kiili / February 27, 2019
In this third part, we will move our Q-learning approach from a Q-table to a deep neural net.
Michelangelo – Machine Learning Infrastructure at Uber
Fredrik Rönnlund / February 18, 2019
When we founded Valohai two years ago, we were lucky to make friends with team leads for Uber’s Michelangelo machine learning platform. Michelangelo has been an inspiration in building Valohai for the other 99.999…% of companies that aren’t Uber but still need to speed up their machine learning through automation.
Kubeflow as Your Machine Learning Infrastructure
Fredrik Rönnlund / February 08, 2019
By now you’ve surely heard about Kubeflow, the machine learning platform based out of Google. Kubeflow basically connects TensorFlow’s ML model building with Kubernetes’ scalable infrastructure (thus the name Kube and Flow) so that you can concentrate on building your predictive model logic, without having to worry about the underlying infrastructure. At least in theory.
Reinforcement Learning Tutorial Part 2: Cloud Q-learning
Juha Kiili / February 07, 2019
This second part takes these examples, turns them into Python code and trains them in the cloud, using the Valohai deep learning management platform.
How to Effectively Grow Your Deep Learning Team and Why Version Control Matters
Eero Laaksonen / January 30, 2019
There’s only one way to grow your deep learning team effectively: by adding new people to it! (We were just as shocked as you are by this revelation!)
Reinforcement Learning Tutorial Part 1: Q-LearningJuha Kiili / January 24, 2019
This is the first part of a tutorial series about reinforcement learning. We will start with some theory and then move on to more practical things in the next part. During this series, you will learn not only how to train your model, but also what the best workflow is for training it in the cloud with full version control using the Valohai deep learning management platform.
Run Jupyter Notebook On Any Cloud ProviderJuha Kiili / December 19, 2018
This tutorial will demonstrate how to take a single cell in a local Jupyter Notebook and run it in the cloud, using the Valohai platform and its command-line client (CLI).
The Journey from Deep Learning Experimentation to Production-Ready Model BuildingJoanna Purosto / December 03, 2018
Since the rise of the deep learning revolution, springboarded by the Krizhevsky et al. 2012 ImageNet victory, people have thought that data, processing power and data scientists were the three key ingredients to building AI solutions. The companies with the largest datasets, the most GPUs to train neural networks on, and the smartest data scientists were going to dominate forever.
Random hyperparameter optimizationAarni Koskela / December 03, 2018
Valohai now supports random search for hyperparameter optimization (which we call the Tasks feature). The aptly named paper Random search for hyper-parameter optimization showed random search to be an efficient way to find “neighborhoods” of likely-to-be-optimal hyperparameter values, which can then be iterated on further to find the really good values.
Watch the Webinar on Version Control in Machine LearningJoanna Purosto / November 27, 2018
Watch a recording of the webinar on version control in machine learning that was held on the 22nd of November 2018. During the webinar we discussed the topics below and answered multiple questions from the attendees.
PocketFlow with ValohaiJuha Kiili / November 21, 2018
PocketFlow is an open-source framework from Tencent to automatically compress and optimize deep learning models. Edge devices such as mobile phones or IoT devices, in particular, can be very limited in computing resources, so sacrificing a bit of model performance for a much smaller memory footprint and lower computational requirements is a smart tradeoff.
Microsoft's Cognitive Toolkit (CNTK) on ValohaiEero Laaksonen / November 21, 2018
Microsoft's Cognitive Toolkit, or CNTK, is an open-source framework for building Deep Learning models. This relatively new framework has been gaining traction, so we decided to make sure Valohai supports it well. One of its benefits over competing frameworks has been CNTK’s ground-up support for multi-node, multi-GPU training, something that TensorFlow, for instance, has been struggling to tackle well. If you are working on really large datasets, you should maybe give it a try.
Synthetic Training Dataset with UnityRuksi / November 12, 2018
Synthetic data is information that is artificially created rather than recorded from real-world events. A simple example would be generating a user profile for John Doe rather than using an actual user profile. This way you can theoretically generate vast amounts of training data for deep learning models, with infinite possibilities.
GDPR and its Effects on Machine Learning Based DecisionsJoanna Purosto / October 30, 2018
You might have heard that every individual subject to automated decision making by machine learning models has a right to an explanation of the result. I bet you feel drops of sweat forming on your forehead when you receive an inquiry from a manager saying that he needs details about how a certain decision was made. If thinking about this scenario gives you chills, you are in the right place. Read further and learn how to tackle the transparency issue.
What to Store from a Machine Learning ExperimentEero Laaksonen / October 23, 2018
When meeting with teams that are working with machine learning today, there is one point above everything else that I try to teach. It is the importance of storing and versioning machine learning experiments, and especially how many things there actually are that need to be stored.
Machine Learning Orchestration for FreeFredrik Rönnlund / October 15, 2018
You know what really grinds my gears? When I have a deep learning model that I want to train and I have to SSH into my AWS instance, install all the drivers and libraries, run my code and then forget to shut down my machine! Once, I forgot to shut one down over the weekend, which cost my employer over $10 000!!!
New Release: Managing Your Experiments Just Got EasierRuksi / October 12, 2018
Recreating experiments inside Valohai could be a whole lot easier and we’ve heard your cries!
Building Trust in AI ApplicationsRuksi Laine / October 03, 2018
All of us have seen those fear-mongering headlines about how artificial intelligence is going to steal our jobs and how we should be very careful with biased AI algorithms. Bias means that the algorithm favors certain groups of people or otherwise guides decisions towards an unfair outcome. Bias can mean giving a raise only to white male employees, increasing criminal risk factors of certain ethnic groups and filling your news feed only with topics and points of view that you are currently consuming – instead of giving a broad, balanced view of the world and educating you.
Valohai and Microsoft Join Forces in Deep Learning for EnterprisesEero Laaksonen / October 02, 2018
Valohai and Microsoft cross lightsabers in the battle for artificial intelligence, through Microsoft’s global ScaleUp Program.
Speeding up Deep Learning with PowerAIFredrik Rönnlund / October 01, 2018
Just lately we’ve been playing around with IBM PowerAI in order to ensure our customers can leverage it in large-scale on-premise training. PowerAI in itself is IBM’s solution for deep learning consisting of software and hardware to help you quickly train deep learning models. Today we’re happy to announce that Valohai fully supports PowerAI and our customers can start using it!
Two Years of Democratizing AIEero Laaksonen / September 28, 2018
Valohai is turning 2 years old in three weeks. The paperwork was done on October 16th, 2016. It’s been a thrilling ride so I’ll take this chance to write a few words about why we really started this company.
Data Scientists Are Rocket Surgeons Stuck With Stone Age Tools 📠Ruksi Laine / September 07, 2018
Whitesnake cover bands of the 2020s. Although both might be sporting the same hobo beards, Data Scientists are getting their work done with just sticks and stones as their tools while us Software Engineers have every tool in the universe.
Level Up Your Machine Learning Code from Notebook to ProductionAarni Koskela / August 24, 2018
Developing a machine learning model for a new project starts with certain common groundwork and exploration, to understand your data and figure out the approaches to try. A popular choice for this groundwork is Jupyter, an environment where you write Python code interactively.
The Importance of ReproducibilityAarni Koskela / July 11, 2018
Reproducibility and replicability are cornerstones of the scientific method. Every so often there’s a sensationalized news article about a new scientific study with astounding results (for instance, we’re looking forward to seeing what’s hot at ICML 2018).
Top 49 Machine Learning Platforms – The Whats and WhysRuksi / July 03, 2018
If machine learning is a team sport, like I so frequently hear, machine learning platforms must be the playing fields. And to up your machine learning game, you must have the proper environments to do it.
Urban Waterways: The Next Generation of Autonomous TransportationToni Perämäki / June 28, 2018
With the promise of relieving strain on the transport network in maritime cities using Artificial Intelligence and autonomous driving technology, Finnish software powerhouse Reaktor set out to build a solution for future waterways.
Machine Learning Infrastructure Explained to Business PeopleJoanna Purosto / June 19, 2018
After spending two days at the AI Summit fair in London and having several conversations with people from different business backgrounds, I wanted to clarify why machine learning infrastructure is one of the biggest things to concentrate on when building production-level machine learning models.
Machine Learning Researcher vs Engineers – What's the Difference?Eero Laaksonen / June 15, 2018
Today’s machine learning teams consist of people with different skill sets. There are a bunch of different roles that are needed, but today I am going to talk about the two key roles that I get asked about the most: machine learning researcher / data scientist vs. machine learning engineer.
Clothing Detection for Fashion RecommendationDenis Carnino / June 04, 2018
Smart recommendation in apps and websites is no longer an additional feature that differentiates top industries from others. Most users take for granted that they will be suggested products that they like. Collaborative filtering has been widely used to predict the interests of a user by collecting preference and taste information from many users. It is often combined with content-based filtering, especially for tackling the cold-start problem.
Load Forecasting using Machine LearningAli Farooq / April 20, 2018
In the age of technology, conventional methods are being automated, and computers are taking over. Similarly, for energy distribution, smart grids, which allow efficient distribution and demand-side management, are replacing traditional energy distribution grids.
Data Scientists Change Nature Conservation with Deep LearningToni Perämäki / March 26, 2018
Jacques Marais used machine learning to scan Africa’s elephant population from aerial infrared and color images taken from a plane. The models were first trained in 2015 on local GPU hardware, which took three weeks. When the models were retrained in 2017 on the Valohai platform, the work was completed in three days, while the detection accuracy increased from 56% to 67% and the overdetection rate dropped dramatically.
Valohai Receives $1.8M to Accelerate Machine Learning AdoptionJoanna Purosto / March 26, 2018
Valohai, a machine learning (ML) platform-as-a-service company, has raised $1.8M in funding to help international companies accelerate machine learning development and scale their model deployment. The round was led by Nordic seed stage investment company Superhero Capital, with participation from Reaktor Ventures and Business Finland, the Finnish Funding Agency for Innovation.
What is Valohai?Ruksi / January 19, 2017
We at Valohai are building a machine learning platform-as-a-service. Underneath this mouthful of a buzzword, we are actually trying to solve a real-world problem I've seen being tackled over and over in dozens of big and small organizations applying or researching machine learning.