Plenty of people treat “trick the AI into bypassing its guardrails” as a kind of a bonus mode for large language models. In Gandalf, that’s the game. Every level has a password, which you try to finesse from an increasingly sophisticated AI model. There are at least seven levels.
(The company behind it, perhaps inevitably, sells software designed to protect LLMs from this kind of prompt engineering.)









