this post was submitted on 04 Oct 2024

The Agora

I'm curious to get all of your thoughts on this. It's no secret that AI has been growing exponentially over the last year. It feels like new models are being released almost every other day. With that said, many of these models need a tremendous amount of data to train on. It's also no secret that Reddit sells its users' interactions to the highest bidder. This was part of the reason they made the changes to the API limits that got many of us to move to the fediverse in the first place.

My question is: how does everyone feel knowing that multi-billion dollar companies are scraping this instance and others, creating extra load on the servers for nothing more than to be able to profit from it?

What can be done to continue providing a free, open network to users but prevent those who are only looking to profit from the data?

edit: fixed title typo

top 9 comments
[–] weker01 8 points 1 month ago

I don't care tbh. I am writing everything here as if everyone at any time could read it.

[–] [email protected] 7 points 1 month ago* (last edited 1 month ago) (2 children)

Scrape*, for your title.

Meanwhile, preventing unpaid scraping was a big part of Reddit's rationale for their enshittification, i.e., charging for API access.

I would rather train an AI indirectly for free than ask random Instances to run interference, which IRL works out to be pay-walling and selling user content.

By asking Lemmy Instances to "prevent AI from seeing my content", all you are really asking them to do is to slap a price-tag on it, and hire lawyers to pursue companies/users that don't pay. Not pay you or me, but them.

[–] [email protected] 2 points 1 month ago

Yeah, I'm more worried about the output of AI getting involved than anything regarding the input, at least as far as public forums go.

[–] [email protected] 1 points 1 month ago

typos are mportant to undermine the scrapping

[–] [email protected] 4 points 1 month ago* (last edited 1 month ago)

My main issue with the Reddit deal (and similar data grabs) is that major AI companies are hoarding user-generated content to give themselves a competitive advantage. I have less of an issue with them using non-exclusive public content like Wikipedia, fediverse comments, and public-domain historical works.

[–] mindbleach 3 points 1 month ago

Folks, if you can see it, they can see it.

I don't give a shit if the robot scrapes every book ever sold. I am not about to get worked-up over the copyright claims of pseudonymous randos' disposable internet commentary.

[–] [email protected] 1 points 1 month ago* (last edited 1 month ago) (1 children)

Server admins could add to their policy that any AI scraping requires prior permission from the copyright holders of the content (i.e., the users) when the scraping is done to exploit the data for greed. Also, robots.txt could be used to forbid AI crawlers from scraping the HTML.
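
For anyone wondering what the robots.txt route might look like, here is a minimal sketch. The file contents are hypothetical: `GPTBot` and `CCBot` are used as examples of published crawler user agents, and the list is an assumption, not anything this instance actually serves. The check uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance admin might serve. The crawler names
# are examples only; which agents to list is an assumption.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
```

Calling `is_allowed(parser, "GPTBot", "https://example.org/post/1")` reports that the crawler is disallowed, while an ordinary browser user agent is allowed. Keep in mind that robots.txt is advisory: it only stops crawlers that choose to honor it.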

I don't think that restrictions should be added at the protocol level, but maybe some declarative tags would be fine:

{
  "rich": "eat",
  "about-meta": "fck-genocidal-and-youth-suicidal-promoter-zuckenberg",
  "ai": "not-for-greed"
}

[–] Nicarlo 5 points 1 month ago (1 children)

I think this would be the only way. It would be interesting to know how much traffic or how many requests this instance gets, to see if it's a real problem. Server admins could implement stricter rate limiting for non-members if it becomes an issue. They could even implement something that lets them sort out which of their members are making the most requests, to have some visibility. I don't believe this is possible today from within the platform anyway.
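
To make the rate-limiting idea concrete, here is a rough sketch of a token-bucket limiter with a stricter budget for non-members, plus a per-client request counter for visibility. This is not Lemmy's code: the class names, the `LIMITS` values, and the client IDs are all hypothetical and would need tuning for a real instance.

```python
import time
from collections import Counter


class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled over time."""

    def __init__(self, capacity, refill_per_sec, now=None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Spend one token if available; return whether the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    # Hypothetical budgets as (burst capacity, tokens refilled per second).
    LIMITS = {"member": (60, 1.0), "anonymous": (10, 0.2)}

    def __init__(self):
        self.buckets = {}
        self.request_counts = Counter()  # every request, allowed or rejected

    def check(self, client_id, is_member, now=None):
        """Count the request and decide whether it fits the client's budget."""
        self.request_counts[client_id] += 1
        if client_id not in self.buckets:
            capacity, rate = self.LIMITS["member" if is_member else "anonymous"]
            self.buckets[client_id] = TokenBucket(capacity, rate, now)
        return self.buckets[client_id].allow(now)

    def top_requesters(self, n=5):
        """Return the n clients with the most requests, busiest first."""
        return self.request_counts.most_common(n)
```

An admin-side hook would call `check(client_id, is_member)` on each request and reject it when the result is `False`; `top_requesters()` gives the kind of visibility into heavy users mentioned above. The `now` argument exists only so the behavior can be tested without waiting on a real clock.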

There are really two issues here:

  1. Are users OK with, or even aware of, the fact that their public conversations will almost certainly be picked up and used for future models?
  2. Are the Lemmy instance admins OK with potentially half of their traffic going to bots that are hoarding and scraping the data, causing additional load on the servers?

Maybe @[email protected] would be open to sharing some insights on the number of requests received per month and how many resources they're taking.

[–] ToyDork 1 points 1 month ago

I don't need to be paid, I just don't want corpos profiting off of my data for any reason, so robots.txt works fine for me. The reason that's enough in my eyes is, I don't hate capitalism nor am I an anarchist or tankie, this is about halting enshittification and for one other reason:

"AI is fundamentally about giving the wealthy access to skill while depriving the skilled of the means to access wealth."

In short, eat the rich because they've ruined everything. They want capitalism? Then no more "laissez-faire" bullshit, you pay your fucking 90% tax on every dollar above 1 mil and shut it. Nobody needs 15 different colors of common Lamborghini and 1 Lambo out of less than 500. Nobody needs 5000 days of going to the mall to buy a dress every day. Nobody needs a personally-owned A380 private jet. Nobody needs 25%, 25 fucking percent, return on investment.

That also applies to the internet and tech companies as much as reality and banks. When I used Reddit, I never told them they could block access behind a paywall and they know it, and they also knew I can't afford an international court case against an American tech giant. Now 90% of Google is locked behind Reddit, a company which shadow banned me well before the API issues.

As long as Reddit, Google, Samsung, Microsoft, etc. can't legally make free money off of this, I'm happy.