How Common are Adversarial Attacks on AI?
By Manpreet Dash
• Adversarial Machine Learning was studied as early as 2004, when it was regarded as an interesting peculiarity rather than a security threat. Recently, in light of increasing security concerns about AI systems, there has been renewed interest in AML security.
• “30% of all AI cyberattacks will leverage training-data poisoning, AI model theft, or adversarial samples to attack AI-powered systems.” - Gartner
• “25 out of 28 organizations struggle to find the right tools to secure their ML systems.” – Microsoft Survey
• “2 in 5 organizations had an AI privacy breach or security incident” – Gartner Blog
Machine learning (ML) is driving incredible transformations in critical areas such as finance, healthcare, and defense, impacting nearly every aspect of our lives. Many businesses, eager to capitalize on advancements in ML, have not scrutinized the security of their ML systems. However, with great power comes great responsibility, and AI is no exception. Adversarial Machine Learning (AML) refers to a class of attacks that exploit the vulnerabilities of AI models (to learn more about AI attacks, read our blog: What are AI attacks?). Adversarial attacks on AI are designed to trick the AI model into making incorrect predictions (to learn more on why AI systems can be attacked, read our blog: Why do AI attacks exist?).
In 2016, Microsoft released its AI-enabled chatbot Tay publicly on social media and retrained it based on inputs from its conversations with users (Microsoft Takes AI Bot ‘Tay’ Offline After Offensive Remarks - Bloomberg). Shortly after the release, internet trolls launched a coordinated data-poisoning attack that abused Tay’s learning mechanism, enabling the attackers to retrain it to tweet inappropriate content. The system’s integrity was compromised by bad actors through data poisoning: introducing malicious inputs to manipulate the model’s outputs. ML models trained on open-source data or production data are especially vulnerable to this type of attack. As AI/ML solutions proliferate, attacks on such systems multiply as well. Real-world examples include cybersecurity breaches, privacy attacks on patient records, and intellectual property theft. In this article, we focus on how common these AI attacks are and walk through some examples.
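As a concrete illustration of how data poisoning works, here is a minimal, hypothetical sketch (not the Tay attack itself): an attacker who controls part of a training stream flips the labels on that slice, and the model retrained on it loses accuracy. It assumes scikit-learn and NumPy and uses purely synthetic data.

```python
# Minimal label-flipping data-poisoning demo (illustrative only, not the Tay attack).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic binary-classification data standing in for user-generated training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

def train_and_score(X_tr, y_tr):
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model.score(X_test, y_test)

print("clean accuracy:   ", train_and_score(X_train, y_train))

# The attacker controls a slice of the training stream and flips its labels.
poison_fraction = 0.3
n_poison = int(poison_fraction * len(y_train))
idx = rng.choice(len(y_train), size=n_poison, replace=False)
y_poisoned = y_train.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]  # flip labels on the poisoned subset

print("poisoned accuracy:", train_and_score(X_train, y_poisoned))
```

Even a modest poisoned fraction measurably degrades test accuracy, which is why systems that retrain on unvetted user inputs, as Tay did, are especially exposed.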
How common are adversarial attacks on AI?
Adversarial machine learning was studied as early as 2004. But at the time, it was regarded as an interesting peculiarity rather than a security threat. However, the rise of deep learning and its integration into many applications in recent years has renewed interest in adversarial machine learning.
During the last four years, Microsoft has seen a notable increase in attacks on commercial ML systems. Market reports are also bringing attention to this problem: Gartner predicts that “30% of all AI cyberattacks will leverage training-data poisoning, AI model theft, or adversarial samples to attack AI-powered systems.” Despite these compelling reasons to secure ML systems, Microsoft’s survey spanning 28 businesses found that most industry practitioners have yet to come to terms with adversarial machine learning. Twenty-five of the 28 businesses indicated that they don’t have the right tools in place to secure their ML systems, and they are explicitly looking for guidance. This lack of preparation is not limited to smaller organizations: the survey covered Fortune 500 companies, governments, non-profits, and small and mid-sized organizations.
“The survey pointed to marked cognitive dissonance especially among security analysts who generally believe that risk to ML systems is a futuristic concern. This is a problem because cyber-attacks on ML systems are now on the uptick. For instance, in 2020 we saw the first CVE for an ML component in a commercial system and SEI/CERT issued the first vuln note bringing to attention how many of the current ML systems can be subjected to arbitrary misclassification attacks assaulting the confidentiality, integrity, and availability of ML systems. The academic community has been sounding the alarm since 2004, and have routinely shown that ML systems, if not mindfully secured, can be compromised,” as reported in the Microsoft Security Blog.
Adversarial attacks on AI are becoming increasingly common. This is due in part to the fact that many AI models are trained on large datasets that may contain biased or incorrect information. Additionally, many AI models are designed to optimize for accuracy rather than robustness, making them more susceptible to adversarial attacks. According to a recent report by OpenAI, a research organization focused on artificial intelligence, the number of adversarial attacks on machine learning models has been doubling every year. The report also found that adversarial attacks are most common in machine vision tasks, such as image classification and object detection.
There are a few categories of adversarial attacks (see Understanding Types of AI Attacks). One of them is the evasion attack, in which the attacker's intent is to force the AI system to make a wrong prediction. Other adversarial attacks, such as data poisoning, model inversion, and model stealing, impact one or more of the three main tenets of security: confidentiality, integrity, and availability. The extraction attack is a black-box technique used to reverse-engineer AI/ML models or gain insight into the data used to train them; a minimal sketch of such an attack follows the quote below.
The NSA’s Ziring explained extraction attacks this way: “If you’re a government agency, you’ve put a lot of effort into training your model, perhaps you used highly sensitive data to train it … an attacker might attempt to query your model in a mathematically guided fashion to extract facts about the model, its behavior or the data that was used to train it. If the data used to train it was highly sensitive, proprietary, nonpublic, you don’t want that to happen.”
Reference: http://web.archive.org/web/20220422212213/https:/www.afcea.org/content/hacking-poses-risks-artificial-intelligence
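To make the extraction scenario Ziring describes more concrete, here is a minimal sketch under a simplifying assumption: the attacker can only call a prediction API. The attacker harvests the victim model's labels on self-chosen queries and fits a local surrogate that mimics its behavior. The models and data below are illustrative stand-ins, not a real deployed system.

```python
# Minimal model-extraction sketch: query a black-box "victim" model, fit a surrogate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# The victim: trained on private data and exposed only through a predict() API.
X_private, y_private = make_classification(n_samples=5000, n_features=10, random_state=1)
victim = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_private, y_private)

def query_victim(x):
    """All the attacker sees: labels returned by the remote model."""
    return victim.predict(x)

# The attacker samples query points, harvests the victim's labels,
# and trains a local surrogate that mimics the victim's behavior.
rng = np.random.default_rng(1)
X_queries = rng.normal(size=(5000, 10))
y_stolen = query_victim(X_queries)
surrogate = DecisionTreeClassifier(random_state=1).fit(X_queries, y_stolen)

# Agreement between surrogate and victim on fresh inputs measures extraction fidelity.
X_eval = rng.normal(size=(2000, 10))
agreement = (surrogate.predict(X_eval) == query_victim(X_eval)).mean()
print(f"surrogate agrees with victim on {agreement:.1%} of queries")
```

The surrogate never touches the private training data, yet it approximates the victim's decision boundary, which is why rate-limiting and monitoring of prediction APIs matter.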
Real-world examples of adversarial attacks on AI
Adversarial attacks on AI are not just theoretical concepts; several have already occurred in the real world.
Tricking a self-driving car into misidentifying a “STOP” sign
In 2019, researchers at the University of California showed that adding a small sticker to a stop sign could trick a self-driving car's vision system into misidentifying it as a speed limit sign. This could have serious implications if someone were to maliciously manipulate road signs to cause accidents.
Facial recognition system fooled with adversarial patches
In another example, researchers from Carnegie Mellon University were able to fool a facial recognition system into misidentifying a person by wearing specially designed glasses that created "adversarial patches" on the face. This could have serious implications for security systems that rely on facial recognition.
Researchers wearing simulated pairs of fooling glasses, and the people the facial recognition system thought they were. Image by Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Reference: Magic AI: these are the optical illusions that trick, fool, and flummox computers - The Verge
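The eyeglass-frame attack optimizes a physically printable patch. The sketch below shows the core digital step of that idea under simplifying assumptions (a tiny stand-in classifier and a random image rather than a real face-recognition model): gradient descent on the pixels of a small patch region so the model outputs an attacker-chosen class.

```python
# Adversarial-patch sketch (PyTorch): optimize a small pixel region so a stand-in
# classifier predicts an attacker-chosen class. Illustrative only; printed eyeglass
# frames are a physically realizable version of the same idea.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10))  # stand-in recognizer
model.eval()

image = torch.rand(1, 3, 64, 64)       # stand-in "face" image
target_class = torch.tensor([7])       # identity the attacker wants to be matched to

patch = torch.rand(1, 3, 16, 16, requires_grad=True)  # the "glasses" region
optimizer = torch.optim.Adam([patch], lr=0.05)

for step in range(200):
    attacked = image.clone()
    attacked[:, :, 20:36, 24:40] = patch.clamp(0, 1)   # paste the patch onto the face
    loss = nn.functional.cross_entropy(model(attacked), target_class)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    attacked = image.clone()
    attacked[:, :, 20:36, 24:40] = patch.clamp(0, 1)
    print("target class:   ", target_class.item())
    print("predicted class:", model(attacked).argmax(dim=1).item())
```

Only the patch pixels are optimized; the rest of the image is untouched, which is what makes patch attacks practical to carry into the physical world as stickers or accessories.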
State-of-the-art AI models fooled by adversarial attacks
Even state-of-the-art AI models can be fooled by adversarial attacks. In 2018, researchers at the University of California were able to fool Google's Inception-v3 image recognition model into misidentifying a panda as a gibbon by adding carefully crafted, imperceptible noise to the image. This highlights the need for AI models to be designed with adversarial robustness in mind.
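The panda-to-gibbon result rests on gradient-based evasion: perturb each pixel slightly in the direction that increases the model's loss (the fast gradient sign method, FGSM). The snippet below is a minimal sketch of that step using a tiny untrained stand-in network rather than Inception-v3, so the exact predictions are illustrative only; against large trained classifiers, perturbations far too small to notice are enough to change the label.

```python
# FGSM evasion sketch (PyTorch): perturb an input along the sign of the loss gradient.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
model.eval()

image = torch.rand(1, 3, 32, 32)            # stand-in "panda" image
true_label = torch.tensor([0])
epsilon = 0.03                              # L-infinity perturbation budget

# Gradient of the loss with respect to the input pixels.
image.requires_grad_(True)
loss = nn.functional.cross_entropy(model(image), true_label)
loss.backward()

# FGSM step: move every pixel a small amount in the direction that increases the loss.
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("clean prediction:      ", model(image).argmax(dim=1).item())
print("adversarial prediction:", model(adversarial).argmax(dim=1).item())
print("max pixel change:      ", (adversarial - image).abs().max().item())
```

Because this toy network is untrained, the label may or may not flip here; the point is the mechanics, which scale directly to large vision models where they reliably cause misclassification.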
OpenAI machine vision AI fooled by handwritten text
In 2021, researchers at OpenAI were able to fool their own CLIP vision model by adding handwritten text to images. The model was designed to identify objects in images but was tricked into misidentifying them when text was overlaid, for instance misclassifying a chainsaw as a piggy bank when the words “piggy bank” were rendered on the image. This again highlights the need for AI models to be designed with adversarial robustness in mind.
Tricking OpenAI’s latest vision system by adding a handwritten label to the target. Source: Multimodal neurons in artificial neural networks (openai.com)
“By rendering text on an image, we artificially stimulate neuron 1330, which has high weight into the class ‘piggy bank’ in a linear probe. This causes the classifier to misclassify a chainsaw as a piggy bank.” Source: Multimodal neurons in artificial neural networks (openai.com)
Information on AI Attacks & AI Security Relevance
| CWE/CVE Entry | Related CWE | Reference |
| --- | --- | --- |
| CWE-1039: Automated Recognition Mechanism with Inadequate Detection or Handling of Adversarial Input Perturbations | – | https://cwe.mitre.org/data/definitions/1039.html |
| CWE-1288: Improper Validation of Consistency within Input | CWE-1039 | https://cwe.mitre.org/data/definitions/1039.html |
| CWE-707: Improper Neutralization | CWE-1039 | https://cwe.mitre.org/data/definitions/1039.html |
| CWE-697: Incorrect Comparison | CWE-1039 | https://cwe.mitre.org/data/definitions/1039.html |
| CVE-2019-20634: Email Protection Evasion | CWE-1039 | https://cwe.mitre.org/data/definitions/1039.html |
How to Build Adversarial Robustness into AI Models
Cybersecurity of AI remains the #1 AI-related risk that organizations are trying to mitigate (Ref: The state of AI in 2022—and a half decade in review | McKinsey). Keeping this in mind, several organizations such as Google, Microsoft, and OpenAI have come up with secure AI frameworks, principles, and practices to protect against adversarial attacks.
Building adversarial robustness into AI models is a complex process that requires a combination of techniques. One approach is to train the AI model on a diverse range of datasets to reduce the impact of biased or incorrect information. Another approach is to incorporate adversarial examples into the training data to make the AI model more robust to adversarial attacks.
Additionally, there are several techniques that can be used to detect and defend against adversarial attacks. One widely used technique is adversarial training, where the AI model is trained on adversarial examples generated against itself to improve its robustness; a minimal sketch follows below. Another is to deploy adversarial defenses in front of the model that detect and reject suspicious or adversarial inputs before they are acted upon.
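As a rough illustration of adversarial training, the sketch below (PyTorch, synthetic data, FGSM as the attack; production pipelines typically use stronger attacks such as PGD and carefully tuned budgets) generates adversarial examples against the current model at every step and trains on a mix of clean and perturbed batches.

```python
# Adversarial-training sketch (PyTorch): augment each batch with FGSM examples
# so the model learns to classify both clean and perturbed inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny synthetic dataset and model standing in for a real task.
X = torch.randn(512, 20)
y = (X[:, 0] + X[:, 1] > 0).long()
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epsilon = 0.1  # attack budget used during training

def fgsm(x, labels):
    """Generate FGSM adversarial examples against the current model state."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), labels)
    grad, = torch.autograd.grad(loss, x)
    return (x + epsilon * grad.sign()).detach()

for epoch in range(20):
    for i in range(0, len(X), 64):
        xb, yb = X[i:i + 64], y[i:i + 64]
        xb_adv = fgsm(xb, yb)                    # craft attacks on the fly
        inputs = torch.cat([xb, xb_adv])         # mix clean and adversarial data
        labels = torch.cat([yb, yb])
        loss = nn.functional.cross_entropy(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Robust accuracy: how the trained model does on fresh FGSM examples.
acc_clean = (model(X).argmax(1) == y).float().mean().item()
acc_adv = (model(fgsm(X, y)).argmax(1) == y).float().mean().item()
print(f"clean accuracy: {acc_clean:.2f}, FGSM accuracy: {acc_adv:.2f}")
```

The key design choice is that the adversarial examples are regenerated against the current model at every step, so the defense keeps pace with the model it is protecting rather than with a fixed set of precomputed attacks.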
In conclusion, adversarial machine learning is a real and growing threat to the reliability and security of AI models. Adversarial attacks on AI are becoming increasingly common, and even state-of-the-art AI models are susceptible to these attacks. However, by designing AI models with adversarial robustness in mind and incorporating techniques to detect and defend against adversarial attacks, we can build more reliable and secure AI models.
Are you concerned about the security and reliability of your AI models? Contact us to learn more about how we can help you build adversarial robustness into your AI models.
References
- Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. 2004. Adversarial classification. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '04). Association for Computing Machinery, New York, NY, USA, 99–108. https://doi.org/10.1145/1014052.1014066
- Cyberattacks against machine learning systems are more common than you think | Microsoft Security Blog
- Gartner Top 10 Strategic Technology Trends For 2020
- R. S. Siva Kumar et al., "Adversarial Machine Learning-Industry Perspectives," 2020 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 2020, pp. 69-75. https://arxiv.org/pdf/2002.05646.pdf
- Why do AI attacks exist? (boschaishield.com)
- Introducing Google’s Secure AI Framework