OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series (www.businessinsider.com)

submitted 2 years ago by [email protected] to c/[email protected]

A new research paper laid out ways in which AI developers should try to avoid showing that LLMs have been trained on copyrighted material.

[–] [email protected] 34 points 2 years ago (2 children)

I hope OpenAI and JK Rowling take each other down

[–] [email protected] 1 points 2 years ago

Sticky this comment

[–] [email protected] 0 points 2 years ago (2 children)

What's the issue against openAI?

[–] Corkyskog 10 points 2 years ago (1 children)

They used to be a nonprofit that immediately turned into a for-profit once their product was refined. They took a bunch of people's effort, whether training materials or the "training monkeys" using the product, and then slapped a huge price tag on it.

[–] [email protected] -1 points 2 years ago

I didn't know they were a nonprofit. I'm fine with it as long as they keep the current model: release older models free to use while charging for extra or latest features.

[–] [email protected] 2 points 2 years ago (3 children)

They’re stealing a ridiculous amount of copyrighted works to use to train their model without the consent of the copyright holders.

This includes the single-person operations creating art that’s being used to feed the models that will take their jobs.

OpenAI should not be allowed to train on copyrighted material without paying a licensing fee at minimum.

[–] [email protected] 2 points 2 years ago (2 children)

Also, Sam Altman is a grifter who gives people in need small amounts of Monopoly money to get their biometric data.

[–] [email protected] 2 points 2 years ago

So, a hypothetical here: if Reddit did launch a system that let users trade karma for real currency or some alternative, would all fan fiction and other fan-created material become copyright infringement, since the authors would now be making money off the original works?

[–] [email protected] -1 points 2 years ago (1 children)

If they purchased the data or the data is free, it's theirs to do what they want with, as long as they don't violate the copyright, for example by reselling the original work as their own. Training off it should not violate any copyright if the work was available for free or was purchased by at least one person involved. Capitalism should work both ways.

[–] [email protected] 1 points 2 years ago (1 children)

But they don’t purchase the data. That’s the whole problem.

And copyright is absolutely violated by training off it. It’s being used to make money and no longer falls under even the widest interpretation of fair use.

[–] [email protected] -1 points 2 years ago* (last edited 2 years ago) (1 children)

You need to expand on how learning from something to make money amounts to using the original material to make money. Considering that's how art works in general, I'm having a hard time taking the side of "learning from media to make your own is against copyright". As long as they don't reproduce the same thing as the original, I don't see any issue with it. If they learned from The Lord of the Rings and then made "The Lord of the Rings", then yes, that'd be infringement. But if they use that data to make a new IP with original ideas, then how is that bad for the world or for artists?

[–] [email protected] 2 points 2 years ago

Creating an AI model is a commercial work. They’re made to make money. These models depend on other artists’ data to train on. The models would be useless if they weren’t able to train on anything.

I hold the stance that using copyrighted data as part of a training set is a violation of copyright. That still hasn’t been fully challenged in court, so there’s no specific legal definition yet.

Because copyrighted materials are required to make the model function, I feel that they are using copyrighted works in order to build a commercial product.

Also, AI doesn’t learn. LLMs build statistical models based on the sentence structure of what they’ve seen before. There’s no level of understanding or inherent knowledge, and there’s nothing new being added.