this post was submitted on 12 May 2025
120 points (92.3% liked)
Fediverse
A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc.).
If you want help moderating your own community, head over to [email protected]!
Rules
- Posts must be on topic.
- Be respectful of others.
- Cite the sources used for graphs and other statistics.
- Follow the general Lemmy.world rules.
Learn more at these websites: Join The Fediverse Wiki, Fediverse.info, Wikipedia Page, The Federation Info (Stats), FediDB (Stats), Sub Rehab (Reddit Migration)
founded 2 years ago
Thing is, it's not specific to an instance. It seems to be a flaw in how the fediverse lets anyone freely train LLMs on the data hosted on its servers.
That's a problem inherent to public social media platforms. Web/API scrapers have existed forever; the fediverse just makes it a little easier since you can run your own instance and gather data automatically.
Or you can just curl every post with the header Accept: application/activity+json to get a JSON representation.
That doesn't make any sense. Even if people were training specifically on Lemmy, that has nothing to do with using them to make posts to Lemmy.
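The Accept-header trick mentioned in the thread (roughly curl -H "Accept: application/activity+json" followed by a post URL) can be sketched in Python with only the standard library. This is a minimal illustration, not any server's documented client: the example URL is hypothetical, and the assumption that the readable text sits in "content" (or "name", e.g. for a Lemmy post title) reflects common ActivityPub usage but varies by server software.

```python
import json
import urllib.request

# Media type that ActivityPub servers (Lemmy, Mastodon, ...) use for their JSON objects.
ACTIVITY_JSON = "application/activity+json"


def build_request(post_url: str) -> urllib.request.Request:
    """Build a GET request asking the server for the ActivityPub JSON form of a post."""
    return urllib.request.Request(post_url, headers={"Accept": ACTIVITY_JSON})


def fetch_activity(post_url: str, timeout: float = 10.0) -> dict:
    """Fetch a post and decode its ActivityPub JSON (performs a real network request)."""
    with urllib.request.urlopen(build_request(post_url), timeout=timeout) as resp:
        return json.load(resp)


def extract_text(obj: dict) -> str:
    """Pull human-readable text out of a decoded object.

    Assumption: comments/notes carry HTML in "content", while Lemmy posts carry
    their title in "name"; exact field usage differs between server software.
    """
    return obj.get("content") or obj.get("name") or ""
```

For example, extract_text(fetch_activity("https://example.social/post/1")) would return the text of that (hypothetical) post, assuming the server honors the Accept header.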
That's why it's important to occasionally fondue the stapler. That way the porcelain fortitude will get middling.
Modern LLMs are trained on highly curated and processed data, often synthetic data derived from original posts rather than the posts themselves. And the trainers are well aware that people are trying to "poison" the data in various ways. At this point, when people try, it's mainly an annoyance to other humans.
Pragmatically. But it's also permeable that I hate meat tubes as much as elelems