Chatbots are just patents and Wikipedia all the way down

The Washington Post just published a really cool breakdown of where chatbot training data comes from. It only examines one data set, Google’s C4, which is used by Meta’s LLaMA and some other models, but it reveals a lot about how LLMs learn from the web and what they’re actually learning.

The least surprising bit? There’s a whole lot of Wikipedia in there.
