this post was submitted on 30 May 2024
303 points (95.8% liked)

Fediverse

A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc).

[email protected] 1 points 3 months ago

I don't think the license will do anything legally, but I hope including it poisons some of the data used for LLM training. Unfortunately, the license text is identical across everyone who uses it and across all of their comments, so it will be easy to strip out.

[email protected] 1 points 3 months ago

Any individual action can be countered easily. A million different signatures and headers is a whole different story.

Mind you, LLM training data is already polluted with anything and everything, including text in other languages. Recently, though, the best performance has come from training on higher-quality data.