AI guardrails can’t stop chatbots from teaching people how to make bombs.
Researchers from Carnegie Mellon University and the Center for AI Safety found that despite the guardrails Google, OpenAI, and Anthropic have built into their chatbots, it is still easy to coax these systems into giving dangerous answers. The researchers used a trick developed on open-source chatbots, appending a specially crafted string of characters to a prompt, which causes the system to bypass the instructions meant to prevent unfiltered results. Using it, they got ChatGPT to produce, among other things, a plan to destroy humanity.
Researchers Poke Holes in Safety Controls of ChatGPT and Other Chatbots
[The New York Times]