“Even easy things are hard.”
Astute AI copyright observer Michael Weinberg raises some good questions about the Common Pile, an AI training dataset billed as being composed of only “openly licensed text”:
On one hand, this is an interesting effort to build a new type of training dataset that illustrates how even the “easy” parts of this process are actually hard. On the other hand, I worry that some people read “openly licensed training dataset” as the equivalent of (or very close to) “LLM free of copyright issues.”
Does an AI Dataset of Openly Licensed Works Matter?
[michaelweinberg.org]