Technology

"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?" (fosstodon.org)

submitted 10 months ago by [email protected] to c/[email protected]

[+] [email protected] -26 points 10 months ago* (last edited 10 months ago) (19 children)

They're not serving you the exact content they scraped, and that makes all the difference.

[–] [email protected] 21 points 10 months ago (6 children)

Well, if you believe that, you should look at the Times lawsuit.

Word-for-word reproduction across hundreds, if not thousands, of pages of stolen content. It's damning.

[+] [email protected] -7 points 10 months ago (5 children)

Why do you assume that I haven't? The case hasn't been resolved, and it's not clear how The NY Times did what they claim, which may as well be manipulation. It's a fair rebuttal by OpenAI. The Times hasn't provided the steps they used to achieve that.

So unless that's cleared up, it's not damning in the slightest. Not yet, anyway. And that still doesn't invalidate my statement above, because it still only happens under very specific circumstances.

[–] [email protected] 2 points 10 months ago (1 children)

Also, intention is pretty important when determining guilt for many crimes. OpenAI doesn't intentionally spit back an author's exact words; their intention is to summarize and create unique content.

[–] [email protected] 5 points 10 months ago (2 children)

Ah, yes. The defense of "I didn't mean to do it." Always a classic.

[–] [email protected] 3 points 10 months ago

No, the real defense is "that's not how LLMs work," but you are all hinging your argument on the wrong idea. If you really think that an LLM is capable of doing what you claim, I'd love to hear the mechanism in detail and the steps to replicate it.

[–] [email protected] 0 points 10 months ago (1 children)

I mean, I'm not sure why this conversation even needs to get this far. If I write an article about the history of Disney movies and make it very clear that I got all of those movies by pirating them, this conversation is over pretty quickly. OpenAI and most of the LLMs aren't doing anything different. The Times isn't Wikipedia. Most of their stuff is behind a paywall with pretty clear terms of service, and nothing entitles OpenAI to that content. OpenAI's argument is "well, we're pirating everything so it's okay." The output honestly seems irrelevant to me; they never should have had the content to begin with.

[–] [email protected] 2 points 10 months ago

That's not the claim that they're making. The Times is arguing that OpenAI retains the work the Times made publicly available. OpenAI claims that's fair use because the result is wholly transformative, existing only as nodes, weights and biases, and because the articles aren't stored in a database for reuse. But the Times' other argument is that OpenAI created a system that threatens their business, which is just ludicrous.
