Machine learning is beginning to change our world. More and more industries are integrating it into processes specific to their businesses, in some cases producing improvements of 90% or more over previous methods. Human decision making can be augmented, or even replaced entirely, by machine learning. A system can comb through a company’s entire financial history and run hundreds of thousands of simulated choices and outcomes to find the best available solution to a problem. This opens the door to machine intelligence one day matching, and even exceeding, human intelligence.
Machine learning obviously has huge potential, but it is created by humans, which means it comes with all of the bugs and issues that any other application might. Because the algorithms underlying machine learning often augment or attempt to replace human decision making, they can mimic our own biases, or they can be built in ways that make our existing problems even more ingrained. The potential for harm grows when the technology is used in situations with serious consequences and when the machine’s decision is not transparent to the people it affects.
There is nothing inherently wrong with using machine learning in these areas if the algorithms outperform humans or help to augment our decisions in a fair way. But just as humans are fallible, so are the systems they create. Eric Holder, then the US Attorney General, spoke about this growing concern in 2014, stating, “although these measures were crafted with the best of intentions, I am concerned that they inadvertently undermine our efforts to ensure individualized and equal justice.” He went on to say, “They may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”
The main areas of concern are:
- The data going into the systems
- The factors within the data set that are used by the algorithm
- A lack of transparency within the system
- A lack of understanding by decision makers of how to weigh the results
Machine learning is only as good as the data it is given. If the data fed into a system is outdated and no longer relevant, incomplete and unrepresentative of all groups, or simply incorrect, then the results will be inaccurate and likely unfair.
For a very simple example, suppose a car insurance company wants to predict accident rates. A machine learning model finds patterns in the data: aggressive drivers are more likely to get into wrecks, and they are also more likely to drive red cars. A predictive model is created that gives people in red cars slightly higher risk scores, which translates into higher insurance prices. But what happens if a certain group of people is also more likely to drive red cars, while being no more likely to drive aggressively or to wreck than any other group? Using red cars to predict accident rates could unfairly target that group even though both groups have the same rate of accidents.
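The red-car scenario above can be made concrete with a small simulation. This is a hypothetical sketch: the group names, probabilities, and "price on red cars" rule are all invented for illustration. By construction, both groups have an identical true accident rate, yet a model that uses the red-car feature will charge one group more on average.

```python
import random

random.seed(0)

# Hypothetical simulation: groups "A" and "B" have the SAME true accident
# rate (10%), but group B members are more likely to drive red cars.
def simulate_driver(group):
    red_car = random.random() < (0.6 if group == "B" else 0.2)
    accident = random.random() < 0.10  # identical risk for both groups
    return red_car, accident

drivers = [("A", *simulate_driver("A")) for _ in range(10_000)] + \
          [("B", *simulate_driver("B")) for _ in range(10_000)]

def avg(xs):
    return sum(xs) / len(xs)

# Accident rates come out roughly equal, as designed...
rate_a = avg([acc for g, red, acc in drivers if g == "A"])
rate_b = avg([acc for g, red, acc in drivers if g == "B"])

# ...but a model that prices on "drives a red car" raises premiums for
# far more of group B than of group A, despite equal underlying risk.
red_a = avg([red for g, red, acc in drivers if g == "A"])
red_b = avg([red for g, red, acc in drivers if g == "B"])
```

Running this shows `rate_a` and `rate_b` nearly identical while `red_b` is roughly three times `red_a`, which is exactly the unfair proxy effect the example describes.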
If a machine learning algorithm decides you are ineligible for a home loan, you may want to know how it came to that conclusion so you can dispute the findings or improve your results. Currently, machine learning has a very difficult time showing its work and explaining how it arrived at a decision. When an algorithm has performed millions of calculations and simulations to reach a result, there is no easy way for it to explain that process to a human. The consequence is very little transparency: a black box that you cannot see inside. Companies may also be reluctant to reveal details about the inputs and inner workings of their technology, since closely guarded intellectual property could be at risk.
There are solutions to most, if not all, of these problems. The main challenge is identifying the problems in the first place and putting in the work to implement a proper solution. There are no magic bullets. Each solution can improve the situation but does not completely safeguard from all problems. A proactive approach to fixing problems is always preferable to having to fix a system or take it offline after the fact.
One new solution to the problem comes with the EU’s General Data Protection Regulation (GDPR), which has been in the news recently. This set of rules “protects users from having decisions made upon them based only on automated means or have the right to contest automated results that may be a factor of the decision.” In practice, this allows people, in most cases, to have a human double-check the machine’s work or to appeal its findings.
In addition to regulatory efforts, statistical validation testing can be performed on machine learning models that are used to make predictions. Validation methods such as Monte Carlo cross-validation and k-fold cross-validation test the accuracy of predictive models to ensure they predict successfully above a specified threshold.
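To illustrate the mechanics of k-fold cross-validation mentioned above, here is a minimal sketch in plain Python. The dataset, the 10% label noise, and the trivial threshold "model" are all invented for illustration; the point is the fold structure: the data is split into k folds, and each fold takes a turn as the held-out test set while the model is evaluated against it.

```python
import random

random.seed(1)

# Hypothetical data: (x, label) pairs where label == 1 when x > 0.5,
# with 10% of labels flipped to simulate noise.
data = [(x, int(x > 0.5) if random.random() > 0.1 else int(x <= 0.5))
        for x in (random.random() for _ in range(500))]

def threshold_model(train):
    # Toy "training": returns a fixed-threshold classifier. A real model
    # would fit its parameters from `train`.
    return lambda x: int(x > 0.5)

def k_fold_accuracy(data, k=5):
    """Return one held-out accuracy score per fold."""
    random.shuffle(data)
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        model = threshold_model(train)
        correct = sum(model(x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores

scores = k_fold_accuracy(data, k=5)
mean_acc = sum(scores) / len(scores)
```

With 10% label noise, the mean held-out accuracy lands near 0.9; a deployment rule could then require, say, `mean_acc` above some agreed threshold before the model is trusted in production.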
In the area of transparency, DARPA has been working on its Explainable AI (XAI) program. The program attempts to produce more explainable machine learning models with performance comparable to existing machine learning and AI systems. If successful, it would also help to contextualize a model’s results by showing its strengths and weaknesses, so decision makers are able to more appropriately weigh the machine’s results against the other factors they are taking into consideration.
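DARPA's program itself is research, not a library, but one simple example of the kind of explanation such work aims to provide is permutation importance: shuffle one feature's values and see how much the model's accuracy drops. The toy model and data below are invented for illustration; the model deliberately uses only its first feature, so shuffling that feature should hurt accuracy badly while shuffling the irrelevant one should not.

```python
import random

random.seed(2)

# Hypothetical data with two features: (signal, noise). Labels depend
# only on the signal feature.
n = 2000
X = [(random.random(), random.random()) for _ in range(n)]
y = [int(s > 0.5) for s, _ in X]

# Toy model: ignores the second feature entirely.
model = lambda s, _noise: int(s > 0.5)

def accuracy(rows, labels):
    return sum(model(*row) == label for row, label in zip(rows, labels)) / len(labels)

base = accuracy(X, y)

def permutation_importance(feature_idx):
    """Accuracy drop when one feature column is shuffled."""
    col = [row[feature_idx] for row in X]
    random.shuffle(col)
    X_perm = [tuple(col[i] if j == feature_idx else v
                    for j, v in enumerate(row))
              for i, row in enumerate(X)]
    return base - accuracy(X_perm, y)

imp_signal = permutation_importance(0)  # large drop: model relies on this
imp_noise = permutation_importance(1)   # no drop: model never reads it
```

An explanation like "this model's decisions depend heavily on feature 0 and not at all on feature 1" is exactly the kind of strengths-and-weaknesses summary that would help a decision maker weigh a model's output.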
Machine learning has the potential to remove, or at least counteract, our own biases and poor decision making. Until that point is reached, we should appreciate the potential problems and take proactive steps to solve them.