this post was submitted on 02 Jul 2023
492 points (96.6% liked)

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ


⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.



cross-posted from: https://lemmy.intai.tech/post/43759

cross-posted from: https://lemmy.world/post/949452

OpenAI's ChatGPT and Sam Altman are in massive trouble. OpenAI is getting sued in the US for illegally using content from the internet to train its large language models (LLMs).

[–] [email protected] 28 points 2 years ago (2 children)

What you said makes a lot of sense. But here's the catch: it assumes OpenAI checked the licensing for all the stuff they grabbed. And I can guarantee you they didn't.

It's damn near impossible to automatically check the licensing for all the stuff they got, so we know for a fact they got stuff whose licensing does not allow it to be used this way. Microsoft has already been sued for Copilot, and these lawsuits will keep coming. Even assuming they somehow managed to only grab legit material and they used excellent legal advisors that assured them it would stand in court, it's definitely impossible to tell what piece of what goes where after it becomes an LLM token, and also impossible to tell what future lawsuits will decide about it.

Where does that leave OpenAI? With the good ol' "I grabbed something off the internet because I could". Why does that sound familiar? It's something people have been doing since the internet was invented; it's commonly referred to as "piracy". But it's supposed to be wrong and illegal. Well, either it's wrong and illegal for everybody or it's fine for everybody.

[–] [email protected] 7 points 2 years ago (1 children)

There were court cases around this very thing involving Google and webarchive. I suspect their legal team is expecting similar precedent, with the issue coming down to the individual and how they use the index: for example, using it to make my own unique character (easily done) vs. making an easy and obvious rip-off of a Disney property. The same tests can be applied; the question, IMO, isn't about the index that is built here. I can memorize a lot (some people have actual eidetic memory) and synthesize it too, which is protected, and I can copyright my own mental outputs. The disposition of this type of output vs. mechanical outputs is, I expect, where things will end up being argued.

I'm not going to say I'm 100% right here; we are in a strange timeline, but there is precedent for what OAI is doing, IMO.

[–] [email protected] 6 points 2 years ago* (last edited 2 years ago) (1 children)

The issue becomes the sale/profit of selling access, such as with GPT-4 right now. Indexing/archiving and selling are two very different beasts.

[–] [email protected] 2 points 2 years ago* (last edited 2 years ago)

Interesting lines to walk. It depends on what they are selling: there is a definite cost to running a model, and you are allowed to charge a reasonable fee to handle the process of providing the records. We used to pay per page for this kind of thing; now you pay per token.

They can also sell a lot of services and tools around the model while still using it in a non-infringing manner. This will all end up in front of a judge, with the books laid out, I suspect. I am not sure we will ever see any of the details; I hope we do.

[–] [email protected] 3 points 2 years ago

The difference between piracy and having your content used for training a generative model is that in the latter case, the content isn't redistributed. It's like downloading a movie from Netflix (and eventually distributing it for free) vs. watching a movie on Netflix and using it as inspiration to make your own movie.

The legality of it all is unclear, and most of that is because the technology evolved so quickly that the legal framework is just not equipped to deal with it. That's despite the obvious moral issues with scraping artists' content.