Poisoning is a term most often associated with the human body and natural environments.
But it is also a growing problem in the world of artificial intelligence (AI) – in particular, for large language models such as ChatGPT and Claude. In fact, a joint study by the UK AI Security Institute, Alan Turing Institute and Anthropic, published earlier this month, found that inserting as few as 250 malicious files into the millions in a model’s training data can secretly “poison” it.
So what exactly is AI poisoning? And what risks does it pose?
What is AI poisoning?
Generally speaking, AI poisoning refers to the process of teaching an AI model wrong lessons on purpose. The goal is to corrupt the model’s knowledge or behaviour, causing it to perform poorly, produce specific errors, or exhibit hidden, malicious functions.
It is like slipping a few rigged flashcards into a student’s study pile without their knowledge. When the student gets a similar question on a test, those rigged flashcards kick in and they give the wrong answers automatically even though they think they are doing it right.
In technical terms, this kind of manipulation is called data poisoning when it happens during training. Model poisoning is when attackers alter the model itself after training.
In practice, the two often overlap because poisoned data eventually changes the model’s behaviour in similar ways.
- Author: Seyedali Mirjalili - Torrens University Australia, The Conversation
0 comments:
Post a Comment
Grace A Comment!