One important thing to consider when you’re testing artificial intelligence is the possibility of unintentional bias in your training data. Because if there is, you’ll end up with a biased system…
So, how can this happen and what should you look for?
There is a very interesting spoken-word poem on YouTube by Joy Buolamwini which highlights the ways in which artificial intelligence can misinterpret the images of iconic black women: Oprah, Serena Williams, Michelle Obama, Sojourner Truth, Ida B. Wells, and Shirley Chisholm. You can find it here: https://youtu.be/QxuyfWoVV98
This is just one example where the results of an AI system are only correct for part of the population. In this case, the same system is very accurate when it comes to identifying white males. The problem is most likely that the training data over-represented white males and under-represented black women. (For more information about bias in facial recognition algorithms, see Joy’s TED talk here: https://youtu.be/QxuyfWoVV98)
Another example is a system aimed at supporting the recruitment of new employees that deems it necessary for a candidate to be male to work in the IT department. Gender shouldn’t be a hiring criterion at all, but whenever a system learns from historical training data, it may discover a pattern that isn’t relevant (in this case, males working in IT): correlation doesn’t imply causality.
If you’re involved in selecting the training data, you, as a tester, should examine it for unintentional bias. You could also change the training data to make it more representative, or mask data that shouldn’t influence the decision.
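Masking can be as simple as removing sensitive attributes before the data reaches the learning algorithm. A minimal sketch, assuming the training data is a list of records and that the field names (`gender`, and so on) are illustrative, not taken from any real system:

```python
# Sketch: masking attributes that shouldn't influence the decision.
# The field names and records below are illustrative assumptions.

SENSITIVE = {"gender", "religion", "sexual_orientation"}

def mask_sensitive(records, sensitive=SENSITIVE):
    """Return a copy of the training records with sensitive fields removed."""
    return [{k: v for k, v in r.items() if k not in sensitive}
            for r in records]

training_data = [
    {"experience_years": 5, "gender": "male", "skills": 8},
    {"experience_years": 3, "gender": "female", "skills": 9},
]

masked = mask_sensitive(training_data)
# Each masked record now contains only experience_years and skills.
```

Note that masking alone isn’t always enough: other fields can act as proxies for the removed attribute, so the tests described below are still needed.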
When testing these systems, we don’t always have access to the training data. Then we need to test whether there is bias in the system, or whether the wrong attribute determines the result. To do so, we need to anticipate what could go wrong.
If it’s an image recognition system, we should use images of different types of people, varying for example sex, looks, age, skin color, and disabilities. We should also use images that aren’t of people at all, such as animals and objects. And we need to test whether something in the background determines the identification.
If it’s a text-based system, we need to examine which data determines the result. In the recruitment system, for example, gender, religion, political views, or sexual orientation shouldn’t influence the hiring decision.
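A practical test for this is a counterfactual check: take one candidate, vary only the attribute that shouldn’t matter, and verify the decision stays the same. A minimal sketch, where `hire_decision` is an illustrative stand-in for the system under test, not a real API:

```python
# Sketch: counterfactual check on a hiring system. Flipping an attribute
# that shouldn't matter must not change the outcome. `hire_decision`
# and the candidate fields are illustrative assumptions.

def hire_decision(candidate):
    # Stand-in model: decides on skills alone (the desired behaviour).
    return candidate["skills"] >= 7

def attribute_independent(model, candidate, attribute, alternatives):
    """True if the decision is identical for every value of `attribute`."""
    baseline = model(candidate)
    for value in alternatives:
        variant = dict(candidate, **{attribute: value})
        if model(variant) != baseline:
            return False
    return True

candidate = {"skills": 9, "gender": "female"}
ok = attribute_independent(hire_decision, candidate,
                           "gender", ["male", "female", "nonbinary"])
# ok is True here because this stand-in model ignores gender; a False
# result on the real system would reveal the bias described above.
```

The same check can be repeated for religion, political views, or any other attribute that shouldn’t play a role.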
So, if you’re testing artificial intelligence, think about what unintentional bias could be in the system, and test to discover those problems so they can be fixed before people are treated badly because the AI system drew a wrong conclusion.
This blog was written by Eva Holmquist (Sogeti Sweden) and Rik Marselis (Sogeti Netherlands).
Eva Holmquist is a senior test specialist at Sogeti. She has worked with everything from test planning to test execution, and has also worked with Test Process Improvement and Test Education. She is the author of “Praktisk mjukvarutestning”, a Swedish book on software testing in practice.
Rik Marselis is a test expert at Sogeti. He has worked with many organizations and people to improve their testing practices and skills. Rik has contributed to 19 books on quality and testing. His latest book is “Testing in the digital age; AI makes the difference”, about testing OF and testing WITH intelligent machines.