this post was submitted on 14 Jun 2023

29 points (100.0% liked)

Lemmy.World Announcements

Reddit is OpenAI's moat (www.cyberdemon.org)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]

Interesting theory for what might have been another motivation behind the API changes. After all, Sam Altman (CEO of OpenAI) is a member of the Reddit board. What do you think?

edit: this is not my article by the way

top 13 comments

[–] [email protected] 13 points 1 year ago (3 children)

I thought it was obvious this was the case. Twitter and Reddit are unhappy that AI language models have used all this data for training, but didn't get paid for it. Personally I don't even consider it "their" data to begin with even if they can claim legal ownership of it. But they want to get paid obscene amounts of money for data that was created by the goodwill of their users.

[–] [email protected] 7 points 1 year ago

If they were worried about data being scooped up for AI training, they would have approached it differently. What they did was target third-party apps to drive people to their own app so they can get tracking data and push ads.

[–] [email protected] 4 points 1 year ago

Yes, but the special thing here is that OpenAI, which has a lot of shared stakeholders with Reddit, has already trained their models on its data, so they might have an interest in turning it off for the other companies. Also, they might be in a better position to negotiate with Reddit for special access to the data than smaller companies.

It's a pretty wild theory, but interesting nonetheless.

[–] [email protected] 4 points 1 year ago (1 children)

It's not their data. If you scraped Reddit for the comments and reposted them somewhere else, Reddit wouldn't be able to come after you with a copyright violation lawsuit.

Any potential copyright is still owned by the original user with Reddit having a license to sublicense for "syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit."

They would have to come after you with a ToS contract violation or maybe some kind of Computer Fraud and Abuse Act allegations.

[–] [email protected] 3 points 1 year ago (1 children)

I completely agree it isn't their data. They still want money for data that isn't theirs.

[–] [email protected] 3 points 1 year ago

Sorry if I seemed argumentative. I was trying to say that it isn't just your opinion that they don't own user data; it is a fact.

[–] [email protected] 9 points 1 year ago (1 children)

I've heard the inverse reasoning, that the API is being limited because OpenAI was crawling it, which makes no sense if OpenAI's CEO is on the Reddit board.

I just don't care at this point ;)

[–] [email protected] 9 points 1 year ago (1 children)

Yeah, but the reasoning in the post is that OpenAI has already profited from the data, and might have a better position to negotiate special access with them than smaller companies, thus reducing competition.

[–] [email protected] 1 points 1 year ago

I can guarantee that OpenAI has not taken Reddit's data through the API; they have crawled the site like everyone else (Google, Microsoft, etc.). Reddit doesn't want to limit that, because it would hurt their Google visibility.

[–] [email protected] 7 points 1 year ago (1 children)

I don't get it, it's not like Reddit is a hard website to crawl without an API. I can write a script to crawl a website in an afternoon with something like Selenium, and they can't stop me. AI companies already have a big enough dataset from Reddit, I doubt they'll invest in the API at all.

[–] [email protected] 4 points 1 year ago

Yeah, AI devs using the API is just a BS reason; you can crawl any site that doesn't require a login so easily. And Reddit doesn't want to limit robot crawling, because they would lose their Google results too.

[–] [email protected] 7 points 1 year ago

It doesn't make sense to me. AI companies already have Reddit's data, and language models have been trained on it for years, so blocking access now wouldn't change anything from that point of view.

It's much more likely that Reddit wants to monetize mobile access before the IPO to please investors. Killing third-party apps is the only way for them to have full control over mobile access.

[–] [email protected] 3 points 1 year ago

Multi-account support and private, small-instance launch pads are going to be key. Servers and users need to be able to control the flow of content, similar to the way email works today.

Think of it as having separate read and publish rights: in essence, anyone can read anything, but there are limits on who can publish and redistribute.