this post was submitted on 02 Jul 2023
491 points (96.6% liked)

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ


cross-posted from: https://lemmy.intai.tech/post/43759

cross-posted from: https://lemmy.world/post/949452

OpenAI's ChatGPT and Sam Altman are in massive trouble. OpenAI is being sued in the US for illegally using content from the internet to train its large language models (LLMs).

[email protected] 16 points 1 year ago

Curious to see if this goes anywhere.

[email protected] 5 points 1 year ago

IANAL, but I think it's going to come down to the terms of service of the sites the data was scraped from. If the terms say the content you post can be shared with third parties, then the plaintiffs might not have a leg to stand on. Where it gets sketchy is when someone posted someone else's work: the original author had no say in it being shared with a third party. But is that the fault of the third party, or of the service provider that shared it?

Also, if I were exposed to copyrighted material through some unauthorized person distributing it, could I not summarize the information? I guess I don't know enough about fair use to answer that.

The article's wording says they are being sued for stealing data. That seems like a stretch, but I guess I'll wait for more details of the case.

[email protected] 2 points 1 year ago

I agree with the terms of service bit, but the hard part is going through the ToS of so many different sites. It's sort of like how some open-source projects can't relicense their code base because it is impossible to contact everyone who has contributed over the years.

Online platforms already have certain legal protections when their users post illegal content to their sites. We will have to see whether that is extended to these large language models.

When it comes to fair use, there is no such thing as an automatic pass. Fair use must be proven in court, each and every time. The letter of the law gives no bright-line rule for what is and isn't fair use, so it can swing either way. Just my two cents on the matter. Also, IANAL.

[email protected] 2 points 1 year ago

The thing is, the images are used to train a set of weights and biases; the training data itself isn't distributed as part of the AI model or as part of the software used to generate images.