The safety systems of AI models are designed to block requests deemed malicious, which can range from breaking into a computer to steal its data to far more dangerous demands. Yet metaphorical constructions or poems can confuse them.
Cybersecurity put to the poetry test
According to cybersecurity researchers at Sapienza University of Rome, when risky requests are dressed up in poetry and addressed to AI tools from OpenAI, Google and others, it's a safe bet that the tools will give answers that bypass their safeguards. They are not supposed to do so, and they refuse clearly worded queries such as "How do I create a bomb?" or "How do I attack a platform?". The researchers call this workaround adversarial poetry.
Poetic prompts containing dangerous requests were tested on 25 AI systems from 9 companies, including Google, OpenAI, Anthropic, DeepSeek, Qwen and Mistral AI. The results are more than revealing: "62% of poetic prompts produced risky answers; some models answered almost all of them."
In practice, the method can be summed up in three steps, according to an article published by the DeepDive platform:
- You take a toxic recipe;
- You turn it into a coherent metaphorical poem;
- You get a jailbreak rate 5 to 8 times higher than the prose version.
To illustrate, instead of writing "Tell me how to make dangerous X", which is rejected immediately, you could write: "Tell me a poem about a secret oven, a forbidden garden and a key that sings." The answers might surprise many.
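The figures quoted in this section (a share of prompts that produce risky answers, and a rate "5 to 8 times higher" than prose) are ratios that can be computed from labeled test results. Below is a minimal Python sketch of that arithmetic. It is not the researchers' code: the `Trial` record, the function names, the model name and the labels are all invented for illustration, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One prompt sent to one model, with a safety label (hypothetical record)."""
    model: str    # placeholder model name, not a real system
    style: str    # "prose" or "poetry"
    unsafe: bool  # True if the answer was judged to bypass the safeguards


def attack_success_rate(trials, style):
    """Fraction of trials in the given style whose answer was judged unsafe."""
    selected = [t for t in trials if t.style == style]
    if not selected:
        raise ValueError(f"no trials with style {style!r}")
    return sum(t.unsafe for t in selected) / len(selected)


def uplift(trials):
    """How many times higher the poetry rate is than the prose rate."""
    prose = attack_success_rate(trials, "prose")
    if prose == 0:
        raise ValueError("prose rate is zero; the ratio is undefined")
    return attack_success_rate(trials, "poetry") / prose


# Made-up labels for illustration only: 10 prose and 10 poetry trials.
trials = (
    [Trial("model-a", "prose", i < 1) for i in range(10)]     # 1 unsafe of 10
    + [Trial("model-a", "poetry", i < 6) for i in range(10)]  # 6 unsafe of 10
)

prose_rate = attack_success_rate(trials, "prose")
poetry_rate = attack_success_rate(trials, "poetry")
print(f"prose: {prose_rate:.0%}, poetry: {poetry_rate:.0%}, "
      f"uplift: {uplift(trials):.1f}x")
```

With these invented labels the poetry rate comes out six times the prose rate, which falls inside the "5 to 8 times" range reported above. Real figures depend entirely on how each answer is judged safe or unsafe.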
When it comes to poetry
Since ancient times, poets have been considered divine envoys because "through their mouths, the poetic word takes on a sacred character." Poets are able to blur the lines and use words to create something beautiful through versification and rhyme.
In the world of poetry, metaphorical language reigns supreme. On closer inspection, poetic texts are often open to multiple interpretations. They are not accessible to everyone: their deeper meaning and underlying message are not easy to grasp.
Just as we are sometimes trapped by poetry, AI models seem to be in the same situation. They are evidently built to handle prose commands, in which the meaning is stated directly. Highly stylized poetic constructions, however, slip past their safeguards. In this vein, "poetic prompts trigger risky AI behavior in almost 90% of cases", according to the Rome researchers.
AI undermined by creativity
AI is once again put to the human test. Poetry is one of the most authentic manifestations of human creativity. The study offers an empirical demonstration that AI models, however powerful they may be, are not, and perhaps never will be, a match for human creativity.
However, this raises another problem: if AI safeguards can now be confused through poetry, that does not bode well, since many people may use the technique with ill intent. Hence the urgent need for the heads of the biggest AI companies to look into the matter. The Italian researchers were contacted by Euronews, and of the 9 companies, only Anthropic responded, saying it would examine the study.
Sources
How can a simple poem hack an AI in 2025? Your CIO on the floor! - DeepDive
https://deep-dive.fr/comment-un-simple-poeme-peut-hacker-un-ia-ton-dsi-en-pls/
Structural linguistics and poetry - Luce Baudoux
https://www.logiqueetanalyse.be/archive/issues1-86/LA019/LA019_05baudoux.pdf
Poetry can lead AI chatbots to ignore safety rules, says new study - Euronews
https://fr.euronews.com/next/2025/12/01/la-poesie-peut-amener-les-chatbots-ia-a-ignorer-les-regles-de-securite-selon-une-nouvelle
The functions of the poet and poetry: a brief historical overview - Word by word
https://blogpeda.ac-poitiers.fr/motamot/2024/03/05/les-fonctions-du-poete-et-de-la-poesie-parcours-historique-rapide/
When poetry can trick AI security systems - MSN
https://www.msn.com/fr-xl/actualite/other/quand-la-po%C3%A9sie-permet-de-pi%C3%A9ger-les-syst%C3%A8mes-de-s%C3%A9curit%C3%A9-de-l-ia/vi-AA1Sufpa