this post was submitted on 25 Jul 2024
350 points (97.8% liked)

top 39 comments
[–] [email protected] 161 points 4 months ago (3 children)

Lemmy welcomes you, new ex-redditors.

[–] [email protected] 21 points 4 months ago (1 children)
[–] [email protected] 4 points 4 months ago

tiptiptiptip

[–] [email protected] 7 points 3 months ago (1 children)

So that means Lemmy is more accessible to search engines, right?

[–] [email protected] 8 points 3 months ago (1 children)

And AI scrapers and bots.

I wonder if all the political shilling is in hopes of future AI learning from it and being biased in the intended way.

[–] [email protected] 5 points 3 months ago* (last edited 3 months ago)

Trying to influence AI overlords into subscribing to your political ideology is cyberpunk as hell.

[–] [email protected] 69 points 4 months ago (3 children)

I'm sure they've convinced the board and the shareholders that this is some kind of big win. But I don't think it's going to be impressive for very long.

There's only so much value an AI can learn from reddit bullshit like "1. break off all contact 2. hit the gym 3. profit" and "the narwhal bacons at midnight" and endless boring pun threads.

[–] [email protected] 31 points 4 months ago

Short-term profit is all they care about, until the platform crashes down completely.

[–] [email protected] 8 points 4 months ago (1 children)

It sounds a lot like this quote from Andrej Karpathy:

Turns out that LLMs learn a lot better and faster from educational content as well. This is partly because the average Common Crawl article (internet pages) is not of very high value and distracts the training, packing in too much irrelevant information. The average webpage on the internet is so random and terrible it's not even clear how prior LLMs learn anything at all.

[–] [email protected] 3 points 3 months ago* (last edited 3 months ago) (1 children)

So it will end in a downward spiral because it starts learning from AI articles, from which articles are being written, from which the AI learns, from which articles are being written ...

[–] [email protected] 1 points 3 months ago (1 children)

As long as there’s supervision during training, which there always will be, this isn’t really a problem. This just shows how bad it can get if you just train on generated stuff.

[–] [email protected] 3 points 3 months ago (1 children)

which there always will be

How? We just learned that they train on social media.

[–] [email protected] 1 points 3 months ago

They don't train on random social media posts. Everything is sorted and approved.

[–] [email protected] 4 points 3 months ago (1 children)

Well it learned to put glue on pizza, eat rocks, and smoke while pregnant.

[–] [email protected] 1 points 3 months ago

You forgot jumping off the Golden Gate Bridge

[–] [email protected] 49 points 4 months ago

How to kill even more traffic to Reddit

[–] Rutty 30 points 3 months ago

Something something net neutrality?

[–] [email protected] 24 points 4 months ago

Just use DDG bangs if you use DuckDuckGo and you can search Reddit directly.

!reddit search term

It still picks up the latest Reddit posts; it just searches Reddit directly instead of going through Bing's results. It's that simple.
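If you want to script that, here is a minimal sketch of how a bang query ends up in a DuckDuckGo URL. The helper name `ddg_bang_url` is made up for illustration; the only assumption about DuckDuckGo itself is that the search text travels in the `q` query parameter and that the bang is simply the first word of it.

```python
from urllib.parse import urlencode


def ddg_bang_url(bang: str, terms: str) -> str:
    """Build a DuckDuckGo URL whose query starts with a bang such as !reddit.

    Hypothetical helper: it only assembles the URL, it does not fetch anything.
    """
    query = f"!{bang} {terms}"
    # urlencode percent-encodes "!" as %21 and turns spaces into "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query})


print(ddg_bang_url("reddit", "search term"))
# prints https://duckduckgo.com/?q=%21reddit+search+term
```

Opening that URL in a browser should behave the same as typing `!reddit search term` into the search box.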

[–] [email protected] 20 points 4 months ago* (last edited 4 months ago) (2 children)

“Hey, so it’s me, the guys who left all those comments. Yeah, so we decided that since we wrote them, and the American system says that means we hold the copyright, we don’t really want you selling them without (a) securing our permission first, and (b) giving us a cut of the action. We’re thinking maybe like a 30% royalty. It’s not like exorbitant; it probably won’t work out to much more than a few cents per user. But it’s more about the principle, you know?”

“Anyway, what do you think?”

WHAT DO I THINK

I THINK IT’S ALL MINE

DO YOU HEAR ME

MINE

NOW PAY ME FOR THE USE OF MY API YOU FILTHY PEASANT

PAY ME NOW

IT’S ALL MINE, PAY ME

600K A YEAR IS NOT ENOUGH

PAY ME MORE PAY ME PAY ME PAY ME

[–] [email protected] 17 points 4 months ago

No worries. Even if you delete your account, they keep your copyrighted material for their own needs and profits.

[–] RmDebArc_5 14 points 4 months ago

I hate to break it to you, but when you accepted the TOS you gave away everything, including your soul. Check out ToS;DR, look for Reddit and click on “You waive your moral rights”.

[–] [email protected] 18 points 4 months ago

And I switched search engines, so that mine isn't polluted with Reddit garbage anymore, especially since there's so much AI spam there now.

[–] [email protected] 13 points 3 months ago

Spez adores Musk and wants to follow in his footsteps

[–] gravitas_deficiency 12 points 3 months ago (1 children)

Antimonopoly enforcement when?

~never,~ ~of~ ~course~

[–] [email protected] 3 points 3 months ago

It's happening with Amazon now.

Wheels are in motion on anti-monopoly, but it's a major societal shift and that takes time. Time and not electing billionaires to public office. A democracy isn't going to build up momentum to do anything about the unchecked power of billionaires if around half the population is voting for billionaires.

[–] Vaeril 9 points 3 months ago (2 children)

Why do I still see Reddit results on DDG? Is that just old stuff and new stuff won't be indexed?

[–] [email protected] 6 points 3 months ago

404 notes that Bing, DuckDuckGo, Mojeek, and Qwant are all affected, with results either not showing anything recent, or not showing the full site result. Kagi, a paid search engine, is apparently still showing data, but only because it buys some of its search index from Google, which continues to have access to Reddit data through the aforementioned deal.

[–] [email protected] 1 points 3 months ago

Anything after the 2nd of July is not accessible; entries before that date are fine.

[–] [email protected] 3 points 3 months ago (1 children)

Asking for a friend…

What would it take to create a domain that just acts as a proxy to Reddit but serves up its own robots.txt that allows all bots?
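For the robots.txt half of the question, a rough sketch of how a well-behaved crawler reads the file, using Python's standard `urllib.robotparser`. The two robots.txt bodies and the `proxy.example` domain are assumptions made up for illustration; they are not copied from Reddit or from any real proxy.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed stand-in for a restrictive file that turns every bot away.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# What a hypothetical proxy could serve instead: an empty Disallow allows everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

url = "https://proxy.example/r/technology/"
print(can_crawl(BLOCK_ALL, "ExampleBot", url))  # prints False
print(can_crawl(ALLOW_ALL, "ExampleBot", url))  # prints True
```

Keep in mind robots.txt is only advisory: it tells polite crawlers what they may fetch, and it does nothing about whatever the origin server enforces on the requests the proxy itself makes.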

[–] [email protected] 7 points 3 months ago

Probably a LOT of proxy IPs to act as different "Users" so you can overcome the rate limit that I expect they would be using to enforce such a deal
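To make the rate-limit point concrete, here is a toy sliding-window limiter keyed by client IP. Everything in it is an assumption for illustration (the class name, the limit of 2 requests per 60 seconds, the documentation-range IPs); nothing here is known about how Reddit actually throttles traffic.

```python
from collections import defaultdict, deque


class PerClientRateLimiter:
    """Allow at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of allowed requests

    def allow(self, client_ip: str, now: float) -> bool:
        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = PerClientRateLimiter(max_requests=2, window_seconds=60.0)
print(limiter.allow("203.0.113.1", now=0.0))   # prints True
print(limiter.allow("203.0.113.1", now=1.0))   # prints True
print(limiter.allow("203.0.113.1", now=2.0))   # prints False, budget used up
print(limiter.allow("203.0.113.2", now=2.0))   # prints True, separate budget
```

Each address gets its own budget, so a proxy funnelling every crawler through a single IP would hit the limit almost immediately.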