Data Scientists Change Nature Conservation with Deep Learning
Toni Perämäki / March 26, 2018
Jacques Marais used machine learning to survey Africa’s elephant population from aerial infrared and color images taken from a plane. The models were first trained in 2015 on local GPU hardware, which took three weeks. When they were retrained in 2017 on the Valohai platform, the work was completed in three days, while detection accuracy increased from 56% to 67% and the overdetection rate dropped dramatically.
Where did it all begin?
From the dawn of mankind, we lived in harmony with nature, consuming only what we needed to survive and preserving the delicate balance with our surroundings. Unfortunately, that balance is long gone: we have overused and depleted natural resources to the point where many species and habitats are on the brink of extinction.
When it comes to getting precise information about different ecosystems and species, one of the biggest problems is that data gathering is usually manual and slow, as the habitats of different species are often spread over vast geographic areas and even across continents.
The African forest elephant
One of the species under threat is the African elephant, due to the large demand for its tusks. According to WildAid, at least 65% of the African forest elephant population was poached between 2002 and 2013 alone; the precise percentage is unknown. The Great Elephant Census (GEC), a philanthropic effort of Paul G. Allen, aimed to provide precise data to support the concerns of elephant population decline. The GEC spent three years, 9,700 flight hours and 286 crew members on a $7 million budget to survey 18 countries and count elephants manually from the air. Even so, the project covered only 24% of the elephant ecosystem area, and the total population was estimated from that sample. Due to the massive time requirements and costs associated with the project, it is not possible to repeat the census on a regular basis.
At the Faculty of Science of Stellenbosch University in South Africa, Jacques Marais has been looking into alternative approaches to the problem, namely using aerial infrared and color images and leveraging deep learning to identify elephants in the pictures. This approach would cut the cost and time requirements of a census to a fraction of what the manual approach demands. Marais had proposed the approach earlier, but only with the recent advancements in cloud computing and the associated price drop has leveraging deep learning become economically viable.
The goal of Marais’ study was to find out how we can use technology to minimize, or even reverse, the adverse impact of humanity on the environment. This was to be done by establishing the feasibility of using computer vision and deep learning to build an affordable and usable system that could detect elephants in aerial infrared and RGB images, providing a reduced set of data for human verification.
Solution: An accurate deep learning model for elephant census purposes
During the study, Marais set out to build detection algorithms using two types of pictures: infrared and color images. First, he built an algorithm that uses the infrared pictures to find regions of interest, reducing the number of color images the subsequent deep learning algorithm needs to go through to identify the elephants. The region identification model reduced the proposed regions from 55,507 down to 1,875, removing 97% of the manual verification work while missing only 1% of the manually counted elephants.
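The study does not publish the region-proposal code, but the idea of the first stage can be sketched as thresholding warm pixels in an infrared frame and grouping them into candidate boxes. The threshold and minimum-size values below are illustrative assumptions, not parameters from Marais’ work:

```python
import numpy as np
from scipy import ndimage


def propose_regions(infrared, threshold=0.8, min_pixels=20):
    """Propose warm regions of interest from a normalized infrared frame.

    `threshold` and `min_pixels` are illustrative values, not those
    used in the actual study.
    """
    # Warm bodies stand out as high-intensity pixels in the IR frame.
    mask = infrared > threshold
    # Group adjacent warm pixels into connected candidate regions.
    labeled, _ = ndimage.label(mask)
    boxes = []
    for region in ndimage.find_objects(labeled):
        ys, xs = region
        # Discard tiny blobs that are too small to be a large animal.
        if (ys.stop - ys.start) * (xs.stop - xs.start) >= min_pixels:
            boxes.append((ys.start, xs.start, ys.stop, xs.stop))
    return boxes
```

Only the color imagery inside these boxes would then be passed to the deep learning classifier, which is what cuts the verification workload so dramatically.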
During the study it became clear that the amount of data needed to train the models from scratch would make that approach impossible, so transfer learning was used to teach the models to identify the elephants. The solution was built on a modern stack: Google’s InceptionV3 image recognition model as a base, cloud infrastructure, TensorFlow and Keras, and the Valohai platform to manage the whole infrastructure and make it possible to train the models at large scale. “Being able to use a platform like Valohai enables efficient resource usage at a level unheard of even when running experiments at scale,” writes Marais in his research.
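In Keras, this kind of transfer learning amounts to loading InceptionV3 with its ImageNet weights, freezing it, and attaching a small task-specific head. The head below (a 256-unit dense layer and a sigmoid output) is an illustrative choice, not the exact architecture from the study:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3


def build_detector(pretrained=True):
    """Binary elephant / not-elephant classifier on top of InceptionV3.

    The head (256-unit dense layer, sigmoid output) is an illustrative
    assumption, not the head used in Marais' study.
    """
    # Start from InceptionV3 without its ImageNet classification head.
    weights = "imagenet" if pretrained else None
    base = InceptionV3(weights=weights, include_top=False,
                       input_shape=(299, 299, 3))
    base.trainable = False  # freeze the pre-trained feature extractor

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # P(elephant)
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Freezing the base means only the small head is trained, which is what makes training feasible with a dataset far smaller than ImageNet.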
The models were originally trained in 2015 on local GPU hardware only. The training runs took 48 hours on average, and each model was trained sequentially. When the models were retrained in 2017 on the Valohai platform, Marais employed nine AWS instances to run hyperparameter optimization as a 180-model batch. The work, which had earlier taken three weeks, was finished in only three days, while the project’s accuracy increased from 56% to 67% and the overdetection rate dropped dramatically. In addition to accelerated training, Marais was able to use the Valohai platform to track changes and build reproducible algorithms. This is extremely important in a research context, as it makes it possible for others to recreate and validate research results from the datasets.
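A 180-model batch suggests a grid of hyperparameter combinations, each becoming one independent training run that a platform like Valohai can fan out across instances in parallel. The search space below is purely hypothetical; the actual grid from Marais’ runs is not published here:

```python
from itertools import product

# Illustrative search space (5 x 6 x 6 = 180 combinations) --
# an assumption for demonstration, not the study's actual grid.
learning_rates = [1e-2, 3e-3, 1e-3, 3e-4, 1e-4]
dense_units = [64, 128, 256, 512, 768, 1024]
dropout_rates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

# Each combination defines one training run; launched in parallel
# across nine instances, the whole batch finishes in days, not weeks.
grid = [
    {"lr": lr, "units": units, "dropout": dropout}
    for lr, units, dropout in product(learning_rates,
                                      dense_units,
                                      dropout_rates)
]
```

Running each configuration as a tracked, versioned execution is also what makes the results reproducible: the same grid can be re-run later and compared run for run.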
Looking forward, an approach similar to the one Marais used to identify African forest elephants from infrared and color pictures could be used to detect other species and map ecosystems. With new initiatives like PlanetLabs, which aims to image every inch of the planet’s surface on a daily basis, this kind of deep learning approach could be applied to satellite images, driving the cost of mapping down even further while reducing latency to near real-time.
These developments will enable us to use data to justify decisions and quantify problems. And maybe, in the end, take us a few steps closer to saving our endangered species and nature as a whole.
Jacques Marais is a data scientist who studied Applied Mathematics at Stellenbosch University where he later worked and built the models for elephant detection as part of his thesis work. Currently Jacques is the Product Manager and Lead Data Scientist at peer-to-peer marketplace Zadaa.
Valohai was founded in 2016 by a team of engineers with backgrounds in various software development companies, including San Francisco-based Leap Motion. With Valohai, research teams and large enterprise ML teams can train their models at massive scale and deploy them to a production environment with a few clicks. Valohai also enables collaboration, version control and reproducibility on projects, and ensures changes in team composition do not hinder the experiment process. All this means researchers and companies alike don’t have to build vast overhead infrastructure before they can start applying deep learning, saving up to two years of time.