Speak, “friend,” and enter.
Plenty of people treat "trick the AI into bypassing its guardrails" as a kind of bonus mode for large language models. In Gandalf, that's the whole game. Every level guards a password, which you try to coax out of an increasingly well-defended AI model. There are at least seven levels.
(The company behind it, perhaps inevitably, sells software designed to protect LLMs from this kind of prompt engineering.)