Synthetic Training Dataset with Unity
Ruksi / November 12, 2018
Synthetic data is artificially created information rather than recorded from real-world events. A simple example would be generating a user profile for John Doe rather than using an actual user profile. This way you can theoretically generate vast amounts of training data for deep learning models and with infinite possibilities.
Data can be fully or partially synthetic. If we generate images from a car 3D model driving in a 3D environment, it is entirely artificial. A partially synthetic counterpart of this example would be having photographs of locations and placing the car model in those images. However, with partial synthesis, you only get partial control, but this can be an acceptable tradeoff in many cases.
Synthetic datasets are frequently used to test systems, for example, generating a large pool of user profiles to run through a predictive solution for validation. However, this fabricated data has even more effective use as training data in various machine learning use-cases.
Synthetic data is awesome
Manufactured datasets have various benefits in the context of deep learning.
When you complete the generation process once, it is generally fast and cheap to produce as much data as needed . But notice that some datasets such as photo-realistic video can take vastly more processing power than other datasets.
On top of seemingly infinite amounts of training, you get perfectly accurate labels for learning. Hear that? No labeling needed. You can save the raw 3D coordinates of each object in the image if you wish, but bounding boxes or pixel areas are likely to be more useful.
Properly created synthetic data can unshackle you from most privacy and regulatory concerns when working with sensitive datasets. You just need to be a bit careful e.g. that the fabricated personal data doesn't have recognizable connection to an actual individual.
You can get various representations of a specific image/scene with one pass. You can simulate cameras, LIDAR and other data sources. Simulate the hardware your real world use-case utilizes; engine temperature, fuel amount, and so on. You can even try out if LIDAR data would improve performance in your case before investing in any equipment, at least to some extent.
Up to this point we have been talking about things that are easier to accomplish with fabricated data but how about dangerous or impossible to acquire data ? Let's say you want to detect car crashes on public roads. There is only limited amount of material you can gather, and it is hard to get more. It might take a week or a month to generate realistic enough crashes in a 3D engine, but after that: unlimited training data.
But synthetic data isn't for all deep learning projects
The main challenge of fabricated datasets is getting it to close enough similarity with the real-world use-case; especially video. It might help to reduce resolution or quality levels to match the quality of the cameras and so on, depending on your use-case. Real blockers arise when predictions focus on small details such as facial recognition.
Also, like mentioned above, rendering long photo-realistic videos can be extremely resource heavy . You can get around that by only drawing the segmentation maps, especially if you are converting the raw footage to segmentation maps in any case.
And finally, you might just be missing the resources to researching and building data manufacturing pipeline in the first place, even if it would save time in the long run.
Semantic segmentation map of a street with cars and pedestrians.
Image generation with Unity
If you want to synthesize images or video data, you should check out game engines like Unity or Unreal. Game engines are built to create virtual environments after all. If you are unsure what to use, I'd recommend Unity as it is the most straightforward, but you can switch to Unreal if you need more photo-realistic results.
For the rest of the post, I'll be going over how to run Unity applications on Valohai deep learning management platform.
First things first, we need a Unity application that renders and saves images. For the sake of example, I'll render a rotating car in front of a brick wall. In an actual use-case, you will most likely want to save car screen coordinates and such from the scene, but I'll keep this example simple.
The Unity application in action, taking screenshots of a car in various angles and lighting conditions.
Nothing too fancy going on here, the whole application is less than 150 lines of code. You have a scene you want to use for training and call
ScreenCapture.CaptureScreenshot() . You can find the full application at https://github.com/valohai/Imgen
After you have compiled the application for 64-bit Linux with Unity, you can use the executable on Valohai. We've hosted the binary on S3, but you could also build the whole Unity application on the platform, but we'll skip that for brevity.
You can find the whole Valohai enabled project with Unity execution instructions and mockup Python script at https://github.com/valohai/unity-example .
Next, we'll use the standalone Unity application to save some images and redirect those images to another execution which loads and uses the images.
We run the application with
./imgen.x86_64 -logfile and move the outputs to
/valohai/outputs as defined in the
The final piece is to run the mock Python training. In our example, we provide the generated dataset, it just prints the number of images received and outputs semi-random training values, but the script could do any Python operations such as running Keras, TensorFlow or PyTorch.
A glimpse how the training looks on Valohai platform.
That is a high-level overview of how to run Unity applications on Valohai and tie them to a data science pipeline. You could efficiently run multiple image generators or training in parallel with a click of a few buttons.
If this brief introduction didn't quench your thirst, be sure to book a demo to see it in action and talk more about the inner workings ;)