Tracking the carbon footprint of model training
Magdalena Stenius
What started as a fun side project for our developer Magda turned out to be a proud addition to the platform. Valohai can now estimate the carbon emissions of cloud instances. Yay!
The initial idea behind the project stemmed from discussions with data scientists who asked whether there was any way to track their carbon emissions, since some companies want to be able to say that their projects are carbon neutral. Due to factors outside our control, the results are estimates rather than precise measurements, and there is much room for improvement. For example, the data on the carbon emissions of cloud compute is limited. Moreover, in some cases, companies financially compensate for their emissions, which does not mean that their operations are directly carbon neutral. Google Cloud, for example, does this.
Even with these limitations, this article focuses on how Valohai calculates the estimates from the available data and what you can do to reduce your carbon footprint when using the cloud.
How are the estimations calculated in Valohai?
Before we go into the calculations themselves, it's worth mentioning that when we started investigating the topic, we were very impressed by the carbon emissions calculations and datasets published by the team at Teads Engineering. We therefore base our work on theirs.
Moreover, we wanted to understand emissions in a holistic way, including indirect emissions from, e.g., the manufacturing of the servers used by cloud providers, which is what the Greenhouse Gas Protocol (hereinafter GHG Protocol) enables. The final estimate is given in grams of CO₂ equivalent, and the calculations cover two of the three scopes of the GHG Protocol. Here is a short overview of the scopes included in the protocol:
Scope 1 emissions are direct emissions from owned or controlled sources.
Scope 2 emissions are indirect emissions from the generation of purchased energy.
Scope 3 emissions are all indirect emissions not included in scope 2 that occur in the value chain of the reporting company, including both upstream and downstream emissions.
Figure: Scope 1, 2, and 3 emissions. Credit: Plan A, based on the GHG Protocol.
In the case of Valohai, it translates into the following categories:
Scope 1: emissions from the cloud instances. Since we are not in control of these, we do not take them into account in our calculations.
Scope 2: purchased electricity, i.e. the power used by the computer running the job: (instance power consumption at given load * data center power usage effectiveness (PUE) * electricity carbon intensity) * runtime. Power usage is based on the Teads Engineering dataset, which is also referenced in the Wattson readme. (Wattson is a Python library for estimating cloud compute carbon emissions. It currently supports a range of Amazon EC2 instances in a variety of regions.)
Scope 3: the greenhouse gases emitted in producing the hardware, i.e. the emissions from manufacturing the computers, amortized over the machine's estimated 4-year lifetime * runtime. Since we are not in control of the actual lifetime of the hardware in use, we use an average estimated computer lifetime of about 4 years.
Thus, the emission estimation equation that we use is as follows:
Carbon footprint (gCO₂ eq) =
= Scope 2 emissions (gCO₂ eq) + Scope 3 emissions (gCO₂ eq),
where Scope 3 emissions (gCO₂ eq) = manufacturing emissions (gCO₂ eq) * runtime / 4 years
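As a rough illustration, the estimate can be sketched in a few lines of Python. This is a minimal sketch and not Valohai's or Wattson's actual code: the function names, the `InstanceProfile` type, and every number in the usage example are hypothetical placeholders. It reads the scope definitions above literally, with the 4-year amortization applied to the manufacturing emissions.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24
LIFETIME_YEARS = 4  # average hardware lifetime assumed in the article


@dataclass
class InstanceProfile:
    """Hypothetical per-instance inputs (not a real Wattson type)."""
    power_watts: float           # power draw at the given load
    manufacturing_gco2eq: float  # total emissions from manufacturing the hardware


def scope2_gco2eq(power_watts, pue, grid_gco2eq_per_kwh, runtime_hours):
    """Power * PUE * grid carbon intensity * runtime, in gCO2eq."""
    energy_kwh = power_watts / 1000 * runtime_hours
    return energy_kwh * pue * grid_gco2eq_per_kwh


def scope3_gco2eq(manufacturing_gco2eq, runtime_hours, lifetime_years=LIFETIME_YEARS):
    """Manufacturing emissions amortized over the lifetime, scaled by runtime."""
    lifetime_hours = lifetime_years * HOURS_PER_YEAR
    return manufacturing_gco2eq * runtime_hours / lifetime_hours


def carbon_footprint_gco2eq(profile, pue, grid_gco2eq_per_kwh, runtime_hours):
    """Sum of the Scope 2 and Scope 3 estimates, in gCO2eq."""
    return (
        scope2_gco2eq(profile.power_watts, pue, grid_gco2eq_per_kwh, runtime_hours)
        + scope3_gco2eq(profile.manufacturing_gco2eq, runtime_hours)
    )


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured data.
    profile = InstanceProfile(power_watts=100.0, manufacturing_gco2eq=350_400.0)
    total = carbon_footprint_gco2eq(
        profile, pue=1.2, grid_gco2eq_per_kwh=400.0, runtime_hours=10.0
    )
    print(f"Estimated footprint: {total:.1f} gCO2eq")
```

Keeping the two scopes as separate functions makes it easy to see how much of an estimate comes from electricity and how much from the hardware itself.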
Currently, the Valohai platform supports estimates for AWS instances, including their Scope 3 emissions. Azure and GCP support is still a work in progress. As mentioned above, GCP can be considered carbon neutral, but bear in mind that this only means Google compensates for the emissions it causes.
How to minimize the carbon footprint of ML models?
First, optimize your code. The efficiency of the code directly affects how much power a job consumes, which becomes especially evident when you run many training jobs. (Not to mention that optimized code is faster, so you pay less for your cloud compute too!)
Second, do use cloud computing. Modern hyperscale data centers have a better PUE than on-premises machines. For example, "An Uptime Institute study found that an average enterprise data center has a Power Usage Effectiveness (PUE) of 1.7 compared to cloud providers like AWS rated at 1.07--1.15, which is more than 71% better." (AWS)
Finally, choose regions where grid carbon intensity is low, meaning that the production of power causes fewer emissions, for example because all of the elements in the chain are located close to one another within one region, or because the region uses green energy to power its data centers. Examples include the Nordics (0.0432 USD/hour) or East Asia (0.058 USD/hour). You can check the prices here.
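To see how much the region choice matters, the same job can be compared across regions using the Scope 2 formula from earlier. This is a minimal sketch: the region names and carbon intensity figures below are made-up placeholders, not real grid data, and the function names are hypothetical.

```python
# Placeholder grid carbon intensities in gCO2eq/kWh; NOT real measurements.
EXAMPLE_GRID_INTENSITY = {
    "region-a": 30.0,
    "region-b": 350.0,
    "region-c": 600.0,
}


def rank_regions_by_intensity(intensity_by_region):
    """Return (region, intensity) pairs, lowest carbon intensity first."""
    return sorted(intensity_by_region.items(), key=lambda item: item[1])


def scope2_by_region(power_watts, pue, runtime_hours, intensity_by_region):
    """Scope 2 estimate (gCO2eq) for the same job in each region."""
    energy_kwh = power_watts / 1000 * runtime_hours * pue
    return {
        region: energy_kwh * intensity
        for region, intensity in intensity_by_region.items()
    }


if __name__ == "__main__":
    # Hypothetical job: 200 W instance, PUE of 1.1, 5 hours of training.
    estimates = scope2_by_region(200.0, 1.1, 5.0, EXAMPLE_GRID_INTENSITY)
    for region, _ in rank_regions_by_intensity(EXAMPLE_GRID_INTENSITY):
        print(f"{region}: {estimates[region]:.1f} gCO2eq")
```

With the power draw, PUE, and runtime held constant, the Scope 2 estimate scales linearly with the grid carbon intensity, so the ranking of regions by intensity is also the ranking by emissions.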
What's next?
If you want to learn more about the topic, you can check out these:
"Estimating AWS EC2 Instances Power Consumption" by Benjamin Davy on Medium;
Cloud Carbon Footprint project;
Teads implementation.
And if you are wondering why we are using open-source components, it's because we would like to contribute to the community effort of reducing, or at least compensating for, carbon emissions.
Valohai is not a carbon emission monitoring or reduction tool that would solve the issue. It is a managed MLOps platform. But we do care!
We discovered that the majority of our compute and storage is located in regions where emissions are carbon compensated in various ways, both in GCP and AWS. Still, we could take it further in the future and add more monitoring, especially to find unused resources and to improve detection of, and alerting on, underutilized infrastructure.