Smart machines and systems are becoming mainstream, from self-driving cars to Google Home and Alexa, but one area where society as a whole stands to benefit is the protection of our children. Two Hat Security builds and sells a system for automatically detecting sexual abuse material in images on the darknet and other hard-to-reach parts of the internet. Its machine learning algorithms have been developed in collaboration with law enforcement and leading Canadian universities.
To build these neural networks, Two Hat Security has been using Valohai to speed up training and retraining of its models, letting the team concentrate on the work at hand: saving children instead of configuring servers.
Machine Learning Infrastructure
The project started as a five-person team using Amazon’s G2 instances, with an in-house IT person setting up all the machines for them. After a while, the team started looking for alternative solutions. Since most of the early team members were interns, managing knowledge within the team was also an issue they wanted to solve: creating login accounts and rotating them after each six-month internship became a maintenance hell.
When the team heard about Valohai, it felt like nirvana. Valohai not only managed resources for them and could launch whatever cloud hardware they required, alongside their own machines, but it also gave the team easy access management for the training hardware.
The Deep Learning Model
With a typical training dataset of about 30 GB, and the largest about double that, the models are demanding to train. They mostly perform object detection and classification of illegal material. To shrink the data, there is naturally some preprocessing; for instance, the original data is stored as NumPy arrays to reduce its size.
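A minimal sketch of the kind of preprocessing described above: stacking pre-decoded images into one compact NumPy array and saving it for fast loading. The image size, dtype and file name are illustrative assumptions, not Two Hat’s actual pipeline.

```python
import numpy as np

def pack_images(images, size=(224, 224)):
    """Stack pre-resized RGB images into a single uint8 array."""
    batch = np.zeros((len(images), size[0], size[1], 3), dtype=np.uint8)
    for i, img in enumerate(images):
        batch[i] = img  # assumes each image is already resized to `size`
    return batch

# uint8 storage is 4x smaller than float32, and one .npy file
# loads in a single read instead of thousands of small image reads.
images = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(8)]
packed = pack_images(images)
np.save("train_batch.npy", packed)
print(packed.shape, packed.dtype)  # (8, 224, 224, 3) uint8
```

Normalization to floats can then happen on the fly at training time, so only the compact integer representation is kept on disk.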
"For someone like me, with an engineering background and familiarity with Docker images, it was very easy to just jump into Valohai. We just configured our testing environment in Docker images and then configured the tests themselves in the Valohai YAML file, imported the project and boom! We had 30 hyperparameter sweeps on our first try."
David Wang – Data Scientist, Two Hat Security
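The valohai.yaml David mentions declares each step of the pipeline: the Docker image to run in, the command to execute, and the tunable parameters. A minimal illustrative sketch, where the image, script name and parameters are assumptions rather than Two Hat’s actual configuration:

```yaml
- step:
    name: train
    image: pytorch/pytorch:latest   # any Docker image with your dependencies
    command: python train.py {parameters}
    parameters:
      - name: learning_rate
        type: float
        default: 0.001
      - name: epochs
        type: integer
        default: 10
```

Launching the same step with many different parameter values is what produces a sweep like the thirty-run one described above.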
The models are mostly trained with PyTorch, but some TensorFlow is also involved. The work is not limited to computer vision; decision-tree algorithms are also used, for which the team has relied on scikit-learn, XGBoost and LightGBM.
Machine Learning Tools
Besides the frameworks and Valohai’s orchestration mentioned above, the team also uses Amazon’s SageMaker for data exploration. SageMaker limits how much data can be stored on an instance (5 GB), and whenever the team shut down their SageMaker instances and relaunched them, many things had to be reconfigured. The notebook approach is nevertheless used heavily in the early phases of new models for exploring the data; once the exact model has to be trained, it is moved into Valohai.
From Two Hat Security’s point of view, the biggest reason for choosing Valohai was that it manages resources elastically – both hardware resources and team accounts.
"Valohai is a super stable environment for using computing resources, and thanks to it none of us need to compete for resources internally anymore. Everything is in isolation, so I can even do some rapid testing and Valohai just shuts down the cloud instance when my test ends."
– David Wang, Data Scientist, Two Hat Security
The team previously used TensorBoard to view each model’s training progress, but now fully relies on Valohai’s online interface, which can show the progress of, for instance, tens of parallel hyperparameter sweeps.
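Valohai draws these live graphs from metadata the training script prints: JSON objects written to stdout are picked up as metrics. A hedged sketch, with the metric names and the placeholder loss being illustrative only:

```python
import json

# Each JSON line printed to stdout becomes a metric point
# that Valohai's interface can graph across parallel runs.
for epoch in range(3):
    loss = 1.0 / (epoch + 1)  # placeholder for a real training loss
    print(json.dumps({"epoch": epoch, "loss": round(loss, 4)}))
```

Because the mechanism is just stdout, the same script works unchanged on a laptop and inside a Valohai execution.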
"Lastly, I just want to thank the Valohai team for all the support they’ve given us. We’ve had weekly calls where we’ve gotten first-class support directly from the guys, with new features added quickly after each call. We’ve really felt like first-class citizens."
– David Wang, Data Scientist, Two Hat Security
Valohai is a deep learning platform that brings scalable infrastructure, version control and team management to data science. Distributed training on multiple GPUs is time-consuming and not a core competency for data scientists – Valohai scales your model to hundreds of CPUs, GPUs and TPUs at the click of a button. Reproducing models is time-consuming and cumbersome – Valohai automatically keeps track of all training data, hyperparameters, training algorithms, training environments and team members. Sharing models across changing teams and teaching best practices requires rigorous work – Valohai builds on best practices and shares them across teams.