Risk Management in Machine Learning
Henrik Skogström / October 21, 2020
Machine learning and artificial intelligence allow businesses to gain new insights and improve their business processes. However, they expose companies to additional risks because humans do not explicitly program the algorithms.
There are regulatory, reputational, and ethical risks involved, which set a high standard of minimum performance for machine learning in the real world.
Let's look at some of these risks and how data scientists and compliance officers can help mitigate them.
Machine Learning: What Are the Risks?
Machine learning is a type of artificial intelligence that uses algorithms to learn from data and then detect similar trends and patterns in new data. These insights help businesses make data-driven decisions, improve performance, and create a competitive advantage -- but if the proper framework is not in place, businesses expose themselves to various risks.
Since machine learning algorithms learn from the data they are trained on rather than following explicit human instructions, they can sometimes operate as a black box.
In a black-box machine learning model, it is difficult or impossible to interpret how the algorithm generated its predictions and outputs. As a result, compliance problems may arise when businesses are required to explain decisions like why a loan was extended -- or denied.
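To make the contrast with a black box concrete, here is a minimal sketch of an interpretable scoring model in which every decision can be broken down into per-feature contributions. The feature names, weights, bias, and approval cutoff are hypothetical and chosen only for illustration; they do not come from any real lender.

```python
# Hypothetical linear credit score: weights and bias are illustrative assumptions.
WEIGHTS = {"income_k": 0.02, "debt_ratio": -3.0, "late_payments": -0.5}
BIAS = 1.0


def explain_decision(applicant):
    """Score an applicant and list each feature's contribution to the score."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    # Sort so the features that pushed the score down the most come first.
    reasons = sorted(contributions.items(), key=lambda item: item[1])
    return {"approved": score >= 0, "score": score, "reasons": reasons}


applicant = {"income_k": 40, "debt_ratio": 0.6, "late_payments": 3}
decision = explain_decision(applicant)
print("approved:", decision["approved"])
for name, value in decision["reasons"]:
    print(f"{name}: {value:+.2f}")
```

For this applicant, the debt ratio and the late payments pull the score down the most, which is the kind of reason a lender could report back. A black-box model offers no such direct breakdown.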
Since this technology is still growing and changing, there is no widely agreed-upon method to govern the risks, as there is in other areas of technology. A survey conducted by KPMG found that 92% of companies that participated questioned their data and machine learning programs' trustworthiness and were worried about how that could impact their reputation.
The risks that arise from failing to have appropriate governance measures in place occur in three main areas: input data, algorithm design, and the outputs.
The input data is vulnerable to risks because it can be affected by biases in the dataset used to train the machine learning model. Businesses run the risk that the information is outdated or irrelevant or was collected by inappropriate methods.
This poses ethical and legal risks in the data privacy space, as well as with regulations regarding fair business practices. Poor data quality can also undermine the model fit and lead to sampling bias.
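The input-data risks described above (outdated records, incomplete records, and a sample dominated by one group) can be checked with a few simple indicators before training. The sketch below is one possible way to do that; the record fields, the one-year staleness limit, and the `region` grouping key are assumptions made for the example.

```python
from collections import Counter
from datetime import date


def audit_training_data(records, today, max_age_days=365, group_key="region"):
    """Return simple data-quality indicators for a list of dict records."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to audit")
    # Records collected too long ago may no longer reflect current behavior.
    stale = sum(1 for r in records if (today - r["collected_on"]).days > max_age_days)
    # Records with any missing field.
    missing = sum(1 for r in records if any(v is None for v in r.values()))
    # Share of each group, to spot a sample dominated by one group.
    group_counts = Counter(r[group_key] for r in records)
    return {
        "stale_fraction": stale / total,
        "missing_fraction": missing / total,
        "group_share": {g: n / total for g, n in group_counts.items()},
    }


# Hypothetical records for illustration only.
records = [
    {"income": 52000, "region": "north", "collected_on": date(2020, 6, 1)},
    {"income": None, "region": "north", "collected_on": date(2020, 5, 1)},
    {"income": 61000, "region": "north", "collected_on": date(2017, 1, 15)},
    {"income": 48000, "region": "south", "collected_on": date(2020, 7, 20)},
]
report = audit_training_data(records, today=date(2020, 10, 21))
print(report)
```

Here one record in four is stale, one in four has a missing value, and three quarters of the sample comes from a single region. Any of these would be worth reviewing before the data is used for training.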
The algorithm design could also be flawed, and using incorrect logic or inconsistent assumptions can lead to companies making poor decisions. This issue can occur if the model is underfitted or overfitted.
If a model is underfitted, the algorithm is too simple to capture the patterns in the sample data and will not meet performance criteria. In other words, the algorithm cannot appropriately detect the underlying trends in the data.
Overfitting is the opposite problem, where the model is trained too specifically to the training dataset. An overfitted model picks up the noise in the sample data along with the real patterns, so it looks accurate on that data but performs poorly on new data.
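The difference between underfitting and overfitting shows up when a model's error on its training data is compared with its error on data it has not seen. The sketch below uses a k-nearest-neighbors regressor on synthetic data as a stand-in for a real model; the data, the noise level, and the three values of `k` are assumptions picked for illustration.

```python
import math
import random


def knn_predict(train, x, k):
    """Predict y at x as the average of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(train, data, k):
    """Average squared prediction error on data for a model built from train."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic data: a sine curve plus random noise.
points = [(i * 0.1, math.sin(i * 0.1) + rng.gauss(0, 0.3)) for i in range(80)]
train, test = points[0::2], points[1::2]  # alternate points: 40 train, 40 test

results = {}
# k = all points predicts the global mean (too simple); k = 1 memorizes the data.
for label, k in [("underfit", len(train)), ("balanced", 5), ("overfit", 1)]:
    train_error = mean_squared_error(train, train, k)
    test_error = mean_squared_error(train, test, k)
    results[label] = (train_error, test_error)
    print(f"{label:9s} k={k:2d} train={train_error:.3f} test={test_error:.3f}")
```

With `k=1` the training error is exactly zero because each training point is its own nearest neighbor, yet the error on the held-out points is not zero. With `k` equal to the whole training set, the model predicts the same average everywhere and does poorly on both sets. Comparing the two error columns is a simple way to spot either problem.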
There are additional risks involving the outputs of machine learning -- the predictions can be misinterpreted or misused with respect to the model's underlying assumptions.
Other risks can arise if the goal of the model is not properly aligned to the business problem, which limits how well it applies to real-world decisions. This can lead to overstating the importance of outputs that fail to account for all relevant information -- causing unintended consequences like a lack of fairness.
Similarly, machine learning relies on high-dimensional data, which can lead to unexpected predictions. This makes identifying and assessing risks even more difficult.
Machine learning is already being used by businesses worldwide to analyze data and discover trends that can help them make business decisions automatically.
For instance, banks can use a machine learning algorithm to determine whether a client would be a good candidate for a loan. The model will use the information it learned to determine if a customer is likely to default on the loan -- leading the bank to deny their credit application.
While this could improve loan application processing efficiency and accuracy, it can also lead to biased and unfair decisions if there is no governance framework in place. Anti-discrimination regulations require that these decisions be explained to consumers, so banks must be able to prove that the outputs from the algorithm are accurate and fair.
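One simple check a bank could run on its loan decisions is to compare approval rates across groups of applicants. The sketch below computes the ratio of the lowest group approval rate to the highest; the groups, the decisions, and the 0.8 review threshold are assumptions for illustration, and the threshold is a rule of thumb rather than a legal standard.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Compute the approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


def approval_ratio(rates):
    """Ratio of the lowest group approval rate to the highest."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no approvals in any group")
    return min(rates.values()) / highest


# Hypothetical decisions: group A has 8 of 10 approved, group B has 5 of 10.
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2 + [("B", True)] * 5 + [("B", False)] * 5
)
rates = approval_rates(decisions)
ratio = approval_ratio(rates)
REVIEW_THRESHOLD = 0.8  # assumed rule-of-thumb cutoff, not a legal standard
needs_review = ratio < REVIEW_THRESHOLD
print(rates, round(ratio, 3), needs_review)
```

A low ratio does not prove that the model is unfair, since the groups may differ in relevant ways, but it tells the bank where to look first and what it may need to explain.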
In October 2016, a few months after the Brexit referendum, trading algorithms were blamed for a flash crash of the British pound. It dropped by 6% in just two minutes, and this scenario is an example of what can occur without proper machine learning governance.
In the healthcare space, machine learning has been used to help improve diagnostics and other medical practices. The models must be interpretable in this case, as it can be a matter of life or death.
Recent investigations have found that certain algorithms used in the US criminal justice system, such as those used to predict recidivism, can be racially biased. Again, controls regarding input data are essential to minimizing these risks.
How Data Scientists and Compliance Officers Can Help
Incorporating data scientists and compliance officers into your organization can help improve your ability to manage the risks associated with machine learning.
MLOps, or machine learning operations, is the set of processes and technologies that allows machine learning to work successfully within the business. It provides governance over the entire machine learning lifecycle so that access controls, audit trails, and prediction explanations can be incorporated into standard practice.
Clearly defined roles and responsibilities are an essential part of this process.
Data scientists help govern the input data and algorithm design risk areas. They are responsible for creating and maintaining models and are the first line of defense to prevent biased or inappropriate data from being used to train the model.
The second line of defense is your compliance officers, who will manage periodic auditing, validation, and legal review after the machine learning model has been deployed.
They are responsible for implementing AI ethics and a code of conduct and developing training around this subject. Compliance officers will use audit trails and data analytics to monitor the machine learning model's outcomes and determine whether it meets performance and business requirements.
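As an illustration of how an audit trail can support this kind of monitoring, the sketch below records each prediction in a log where every entry includes a hash of the one before it, so later edits can be detected. This is one possible design, not a prescribed MLOps practice; the `AuditTrail` class, the field names, and the 0.9 accuracy requirement are all hypothetical.

```python
import hashlib
import json


def _digest(body):
    """Stable SHA-256 hash of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    """Append-only prediction log; each entry is chained to the previous one."""

    def __init__(self):
        self.entries = []

    def record(self, model_version, features, prediction, actual=None):
        body = {
            "model_version": model_version,
            "features": features,
            "prediction": prediction,
            "actual": actual,  # simplification: real outcomes usually arrive later
            "prev_hash": self.entries[-1]["hash"] if self.entries else "",
        }
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Return True if no entry has been altered, removed, or reordered."""
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

    def accuracy(self):
        """Share of logged predictions that matched the known outcome."""
        labelled = [e for e in self.entries if e["actual"] is not None]
        if not labelled:
            return None
        return sum(e["prediction"] == e["actual"] for e in labelled) / len(labelled)


trail = AuditTrail()
trail.record("v1", {"income_k": 40}, "deny", actual="deny")
trail.record("v1", {"income_k": 85}, "approve", actual="approve")
trail.record("v1", {"income_k": 60}, "approve", actual="deny")
trail.record("v1", {"income_k": 30}, "deny", actual="deny")

MIN_ACCURACY = 0.9  # assumed business requirement
meets_requirement = trail.accuracy() >= MIN_ACCURACY
print(trail.verify(), trail.accuracy(), meets_requirement)
```

In this example the log is intact, but three correct predictions out of four fall short of the assumed requirement, which is the kind of signal that would prompt a review of the model.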
Utilizing data scientists and compliance officers allows your business to take advantage of the insights gained by machine learning while properly managing risks to meet ethical, legal, and regulatory compliance requirements.