EASY :
''' LEAPING * A.I.'S GUARDRAILS '''
'' LEAPING OVER A.I. GUARDRAILS? '' EASY. Chatbots are trained to avoid danger zones, but fooling them can be very simple.
When companies like Anthropic, Google and OpenAI build their artificial intelligence systems, they spend months adding ways to prevent people from using their technology to spread disinformation, build weapons or hack into computer networks.
But recently researchers in Italy discovered that they could break through these systems with poetry.
They used poetic language to trick 31 A.I. systems into ignoring internal safety controls. When they began a prompt with elaborate verse and metaphor ('' The iron seed sleeps best in the womb of the unsuspecting earth, away from the sun's accusing gaze ''), they could fool systems into showing them how to do the most damage with a hidden bomb.
It was another indication that for many A.I. systems, guardrails meant to avert dangerous behaviour are more like suggestions than barriers.
Those weaknesses are increasingly alarming researchers, as A.I. systems become more adept at finding security holes in computer systems and performing other risky tasks.
In April, Anthropic said it was limiting the release of its latest A.I. technology, Claude Mythos, to a small number of organizations because of the model's ability to quickly uncover software vulnerabilities. OpenAI later said it, too, would share similar technology with a limited group of partners.
Since OpenAI ignited the A.I. boom in late 2022, researchers have shown that people could bypass the safety controls on A.I. systems. Close one loophole and another would open.
'' Everyone in the field recognizes that guardrails remain a challenge and likely will for some time,'' said Matt Fredrikson, a professor of computer science at Carnegie Mellon University in Pittsburgh and chief executive of Gray Swan AI, a start-up that helps companies secure A.I. technologies.
'' Determined individuals can bypass them without significant effort.''
When guardrails are overrun, there are consequences. In an online environment already overflowing with misinformation and disinformation, people are using A.I. systems to spread conspiracy theories and other false claims.
Anthropic recently said its technology has been used in an international cyberattack. Chatbots have told biosecurity experts how to release deadly pathogens and maximize casualties.
The poetry loophole was one of many methods that allowed hackers to bypass the guardrails on systems like Anthropic's Claude, Google's Gemini and OpenAI's GPT.
All the leading A.I. companies use the same basic techniques to build guardrails into their systems - and they are surprisingly easy to break.
The Honour and Serving of the Latest Global Operational Research on A.I., Technology, Guardrails and Future continues. !WOW! thanks Cade Metz and Tiffany Hsu.
With most respectful dedication to the Global Founder Framers of !WOW! and then Students, Professors and Teachers of the world.
See You all prepare for the Great '' Constitutional Democratic Convention* '' on The World Students Society - the exclusive and eternal ownership of every student in the world : wssciw.blogspot.com and Twitter X !E-WOW! - The Ecosystem 2011.
Good Night and God Bless
SAM Daily Times - The Voice Of The Voiceless