How Data Poisoning Works to Prevent AI Copyright Infringements
Tipping the Scales in Favor of Artists
As artificial intelligence becomes more commonplace, the copyright debate surrounding the technology’s training data rages on. In fact, many authors have filed lawsuits against ChatGPT’s creator, OpenAI, while a handful of artists have filed suit against the AI image generators Stability AI, DeviantArt, and Midjourney.
Due to these copyright-infringement concerns, protection mechanisms have already been put in place, such as OpenAI’s new “opt in” approach to training data, and regulation talks are ongoing. However, for some, these efforts aren’t enough, leading to the creation of a new data poisoning tool called Nightshade.
In this article, we explore data poisoning in depth and discuss how Nightshade could have a lasting impact on AI art generation. Let’s dive in (to the article, not the poison)!
What Is Data Poisoning?
In simple terms, data poisoning is the deliberate manipulation or falsification of the data used to train AI and machine learning models, ultimately reducing the quality and accuracy of their output. Injecting false information into a model’s training data increases the chances of it identifying patterns inaccurately and making incorrect predictions.
There are three main types of data poisoning:
Malicious Data Injection: As the name suggests, malicious data injection involves adding false or harmful information to an AI model’s training data. Think of it as someone slipping a page of incorrect instructions into a recipe book, baffling the reader.
Mislabeling: AI is based on associative learning, meaning it learns by connecting items together. In the case of images, this typically means pairing visual information with text-based labels. However, mislabeling these images can cause major issues in the AI model’s understanding of certain objects. If an algorithm is fed an image of a dog labeled as a cat, for instance, it’s likely to get confused when presented with other images of accurately labeled dogs.
Data Manipulation: Data manipulation is the process of changing the way information is presented. For example, if you’re training an AI model to identify basic shapes, but the images used are slightly wrong (say, a triangle with an extra side or a square with a rounded corner), the model will learn to identify shapes incorrectly. It might, for instance, associate triangles with four sides.
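To make the mislabeling idea concrete, here is a toy sketch (not drawn from any real attack tool) showing how flipped labels corrupt what a model learns. A simple one-nearest-neighbour classifier is queried twice over the same data points: once with clean labels, and once after an attacker has flipped a few “dog” labels to “cat”.

```python
# Toy illustration of label-flipping ("mislabeling") data poisoning.
# Training data is a list of (value, label) pairs; the "model" is a
# one-nearest-neighbour lookup, chosen only for simplicity.

def nearest_neighbour_predict(train, point):
    """Return the label of the training example closest to `point`."""
    return min(train, key=lambda ex: abs(ex[0] - point))[1]

# Clean data: values below 5 are "cat", values 5 and above are "dog".
clean = [(x, "cat") for x in range(5)] + [(x, "dog") for x in range(5, 10)]

# Poisoned copy: an attacker flips the labels of three "dog" examples.
poisoned = [(x, "cat" if x in (5, 6, 7) else label) for x, label in clean]

test_points = [5.2, 6.1, 7.4]
print([nearest_neighbour_predict(clean, p) for p in test_points])     # all "dog"
print([nearest_neighbour_predict(poisoned, p) for p in test_points])  # all "cat"
```

The data itself never changes; only the labels do, yet every query near the flipped examples now comes back wrong, which is exactly the dog-labeled-as-cat confusion described above.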
How Does AI Art Data Poisoning Work?
Using the Nightshade data poisoning tool co-created by researcher Ben Zhao, artists are able to poison their images by scrambling the way pixels appear to generative AI models without affecting their visual appearance to humans. The main aim is to balance the scales between artificial intelligence models and content creators, giving creators an additional layer of protection against copyright infringements.
In testing by researchers at the University of Chicago, the Nightshade system has shown promising results (or unfortunate ones, depending on your viewpoint) in poisoning machine learning datasets. For example, feeding an infected AI image generator the prompt “dog” returns an image of a cat, highlighting how the poison has confused the model.
Though the researchers haven’t disclosed exactly how the Nightshade system works, it likely combines elements of all three main data poisoning methods. In fact, Nightshade’s parent system, Glaze, uses a mislabeling-style approach to confuse machine learning models.
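Since Nightshade’s actual algorithm is not public, the following is only a generic sketch of the underlying idea: shift each pixel value by a small, bounded amount so that the image looks unchanged to a human viewer while the numbers a model ingests are subtly different. The function name, the noise budget, and the use of plain random noise are all illustrative assumptions, not Nightshade’s real method (which steers perturbations toward specific wrong concepts).

```python
import random

def perturb_image(pixels, budget=3, seed=0):
    """Nudge each pixel value (0-255) by at most `budget`.

    A small budget keeps the change invisible to humans. This is a
    generic, hypothetical sketch of an imperceptible perturbation,
    not Nightshade's published technique.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible example
    return [max(0, min(255, p + rng.randint(-budget, budget))) for p in pixels]

original = [12, 130, 200, 255, 0, 64]
poisoned = perturb_image(original)

# Every pixel stays within `budget` of its original value and in range.
assert all(abs(a - b) <= 3 for a, b in zip(original, poisoned))
assert all(0 <= p <= 255 for p in poisoned)
```

The key property is the bound: to a person, a change of a few intensity levels per pixel is imperceptible, but across millions of training images such shifts can systematically distort what a model learns.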
How Will Data Poisoning Impact AI Art Generation?
If art data poisoning becomes common practice, many AI developers may struggle to train new models, stifling the rapid progress being made in artificial intelligence. However, with numerous companies already committed to a copyright-safe approach to training data, such as Adobe, which trains its models only on royalty-free stock images, this could amount to a non-issue. Additionally, as people adjust to the widespread implementation of artificial intelligence, attitudes will likely shift and fair-use guidelines will be refined.
Thanks for reading.
If you enjoyed this article, please subscribe to receive email notifications whenever we post.
AI Business Report is brought to you by Californian development agency, Idea Maker.