The principles of artificial neural networks have been around since the 1950s, but it is only in the last decade that they have truly revolutionized the world of AI, thanks to the availability of huge datasets and computing power.
The same decade has also seen significant growth in cyber threats, including large-scale organized groups and state-sponsored actors with considerable financial and technical resources. Neural networks make a valuable target for such attackers, given their growing use in critical applications and an inherent property of their design that enables a class of exploits known as adversarial attacks.
Hacking neural networks
A neural network is a complex system that takes an array of numbers as an input and computes another array from it. The nature of the input/output vectors depends on the task. Here are a few examples:
“Does this picture contain a cat?”
A network answering this question takes an image (2D array of pixel values) as its input and outputs a single number between 0% and 100% that represents the estimated probability that a cat appears in the image.
“What language is spoken in this audio?”
In this case, the neural network is fed a time-frequency spectrum array, and produces a vector of detection scores for a set of predetermined languages.
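To make these shapes concrete, here is a minimal sketch of the cat example in PyTorch. The framework, the 64x64 grayscale resolution and the tiny architecture are illustrative assumptions, not something prescribed in this article:

```python
# Minimal sketch (assumed architecture): a toy "is there a cat?" classifier
# that maps a 2D pixel array to a single probability.
import torch
import torch.nn as nn

cat_detector = nn.Sequential(
    nn.Flatten(),              # 2D pixel array -> flat vector
    nn.Linear(64 * 64, 128),   # assumes a 64x64 grayscale input
    nn.ReLU(),
    nn.Linear(128, 1),
    nn.Sigmoid(),              # single output in [0, 1]: estimated cat probability
)

image = torch.rand(1, 64, 64)          # batch of one 64x64 image, pixel values in [0, 1]
cat_probability = cat_detector(image)  # tensor of shape (1, 1), e.g. 0.73 -> "73% cat"
print(float(cat_probability))
```

The language-detection case works the same way, except that the input is a spectrogram and the final layer outputs one score per candidate language instead of a single probability.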
Artificial neural networks have an important property called differentiability: it is possible to compute exactly how a small change to the input affects the output. Attackers can exploit this by computing, for every input value, an imperceptible modification that pushes the network's output in a chosen direction. When all these tiny modifications are applied to the input data together, the network produces a completely unexpected result.
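In code, the core of such an attack fits in a few lines. The following is a hedged sketch in the spirit of the Fast Gradient Sign Method (one well-known gradient-based attack, not one prescribed by this article), reusing the toy detector above; the loss function and the step size epsilon are illustrative assumptions:

```python
# Gradient-based adversarial perturbation: nudge every pixel by at most
# `epsilon`, each nudge chosen (via the gradient) to push the output away
# from the correct answer.
import torch

def adversarial_example(model, image, label, epsilon=0.005, loss_fn=torch.nn.BCELoss()):
    image = image.clone().requires_grad_(True)
    loss = loss_fn(model(image), label)   # how "right" the network currently is
    loss.backward()                       # differentiability: gradient of the loss w.r.t. the pixels
    # Move each pixel a tiny step in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()  # keep pixels in the valid range

# Usage: epsilon=0.005 means each pixel moves by at most 0.5% of its range,
# invisible to a human but often enough to flip the classifier's decision.
# adv_image = adversarial_example(cat_detector, image, torch.ones(1, 1))
```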
Take the following two images:
As a human, you see absolutely no difference between the two, both of which are, without a doubt, a picture of a cat. However, some pixels in the image on the right have been altered by a very small amount (less than 0.5%), which is enough to fool a targeted neural network into classifying this innocent kitten as a dangerous snake.
As you can imagine, adversarial attacks are a growing concern for makers of machine learning applications, since being targeted by a malicious actor can have harmful consequences. Mitigating the risk is hard, because differentiability is not a bug but an essential property of the neural network itself.
Adversarial attacks in the wild
Several of these attacks have been successfully conducted on real-world systems. Thankfully, the attackers were well-intentioned researchers who reported the vulnerabilities to the manufacturers and only performed the exploits as proofs of concept.
One major example was published in 2018 by Kevin Eykholt and Ivan Evtimov regarding automated recognition of road signs. With an 87.5% success rate, they managed to trick a state-of-the-art neural network into recognizing a stop sign as a speed limit sign, simply by placing four stickers at carefully selected positions:
The future of adversarial attacks
The entire history of adversarial attacks is very recent, with the first research papers published in 2014. There have been efforts since 2017 to develop defenses against them, but each defense adds development time and maintenance complexity for whoever builds the model. Since security is often the first feature sacrificed to a tight budget, we can expect most models to remain vulnerable until a universal and simple solution is found.
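This article does not single out one defense, but as an illustration of the extra work involved, here is a hedged sketch of adversarial training, a common approach in which the model is also trained on adversarial examples generated on the fly. It reuses the hypothetical adversarial_example helper from earlier; the model, optimizer, loss function and data are assumed to exist:

```python
# One training step of (assumed) adversarial training: the model sees both the
# clean batch and an adversarially perturbed copy, at the cost of extra compute.
def adversarial_training_step(model, optimizer, loss_fn, images, labels, epsilon=0.005):
    # Generate the perturbed batch with the current model parameters.
    adv_images = adversarial_example(model, images, labels, epsilon)
    optimizer.zero_grad()
    # Train on the clean batch and its adversarial counterpart together.
    loss = loss_fn(model(images), labels) + loss_fn(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return float(loss)
```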
All in all, threats against neural networks will keep growing at an even faster pace than neural networks themselves, which must therefore be treated as security-critical assets from the development stage onward in order to avoid costly incidents.