this post was submitted on 05 Jan 2025
268 points (97.9% liked)

Technology


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below; to ask whether your bot can be added, please contact us.
  9. Check for duplicates before posting; duplicates may be removed.

Approved Bots


founded 2 years ago

It's getting a bit ridiculous out here. I'm using DuckDuckGo but since it aggregates its search from other sources, it's also gotten bad recently. Is there a search out there that blocks domains that spam AI? Extra points if there's something like Ublock Origin that filters things based on a community-made list.

Edit: I'm aware of Kagi but it's pretty expensive and I'm not a fan that they, too, host their own AI tools.

top 46 comments
[–] [email protected] 98 points 2 weeks ago (4 children)
[–] [email protected] 30 points 2 weeks ago (1 children)

In this GitHub repo there is a bit of fuckery with the link: notice he has an affiliate link for Adblock Plus instead of linking to the goddamn host list directly.

https://subscribe.adblockplus.org/?location=https%3A%2F%2Fraw.githubusercontent.com%2Flaylavish%2FuBlockOrigin-HUGE-AI-Blocklist%2Fmain%2Flist.txt&title=Sites+using+AI+generated+content
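The actual list URL is percent-encoded inside the `location` query parameter of that Adblock Plus subscribe link, so it can be recovered with Python's standard library. A minimal sketch, assuming the link above is quoted in full; the function name `extract_list_url` is hypothetical:

```python
from urllib.parse import urlparse, parse_qs

# The Adblock Plus subscribe link quoted in the comment above.
SUBSCRIBE_LINK = (
    "https://subscribe.adblockplus.org/?location="
    "https%3A%2F%2Fraw.githubusercontent.com%2Flaylavish%2FuBlockOrigin-HUGE-AI-Blocklist%2Fmain%2Flist.txt"
    "&title=Sites+using+AI+generated+content"
)


def extract_list_url(link: str) -> str:
    """Return the filter-list URL carried in the link's `location` parameter.

    parse_qs percent-decodes the values (%3A -> ':', %2F -> '/', '+' -> ' ').
    """
    query = parse_qs(urlparse(link).query)
    return query["location"][0]


if __name__ == "__main__":
    print(extract_list_url(SUBSCRIBE_LINK))
```

The decoded value points straight at the raw `list.txt` on GitHub, which is the URL you would presumably add to your blocker's filter lists instead of going through the subscribe page.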

[–] [email protected] 2 points 2 weeks ago (1 children)

Do I just copy and paste this into my filter lists?

[–] [email protected] 12 points 2 weeks ago

I wouldn't; there are plenty of filter lists built into uBlock itself that I trust more.

[–] [email protected] 9 points 2 weeks ago (1 children)
[–] [email protected] 3 points 2 weeks ago

Looks good, thank you

[–] [email protected] 2 points 2 weeks ago

Well I'm installing this and you're my new favorite person.

[–] [email protected] 52 points 2 weeks ago (2 children)

Search is eventually going to be so enshittified that the way to actually find things out will fall back on "ask someone you trust who knows things you don't". At least by that point those trustworthy people should be better informed than in the past.

[–] [email protected] 13 points 2 weeks ago

Maybe we will see the return of curated directories like Yahoo's.

[–] [email protected] 11 points 2 weeks ago (1 children)

It's ultimately self-defeating as well, because any future AI is going to be polluted by past AI's garbage content, making it even harder to develop intelligent AI systems.

[–] [email protected] 4 points 2 weeks ago (1 children)

It can survive well where there's editorial control. I'd talk to an AI if it had only read encyclopedias, for example.

[–] [email protected] 7 points 2 weeks ago (1 children)

I tried doing some of this. I trained a model on a corpus of data I wanted it to read, but with such a small amount of training data, I found it was too lossy overall. If I asked it about something in the corpus and it answered, there was a really good chance that the answer really was in there. But there were a lot of cases where it didn't know something that was definitely in there. It wasn't completely useless, but I wouldn't say it was at the level of being truly helpful.

I worry that there's not enough verified data out there to set up for proper training.

[–] [email protected] 4 points 2 weeks ago

I suspect such a model would have to be far more attuned to its data being smaller but trustworthy. Something like ChatGPT, for example, requires a huge volume because it's only weakly affected by any particular datum going in. It's designed to adapt to general conversational norms rather than specific facts. If you could take a generalist like ChatGPT and combine it with an expert model that gives heavy weight to everything it's been told, that would probably be a big step forward.

[–] [email protected] 31 points 2 weeks ago (1 children)

I use uBlacklist to filter stuff out of search results; it works with a bunch of search engines. It has various lists you can subscribe to, including anti-AI ones.

https://github.com/iorate/ublacklist
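The core idea behind tools like uBlacklist is simple: compare each search result's host against a subscribed list of domains and hide the matches. The sketch below illustrates only that idea in Python; it is not uBlacklist's implementation and does not reproduce its own rule syntax. The domains and function names are hypothetical placeholders:

```python
from urllib.parse import urlparse

# Hypothetical blocklist: placeholder domains, not entries from any real list.
BLOCKED_DOMAINS = {"ai-spam.example", "content-farm.example"}


def is_blocked(url: str, blocked: set[str]) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Check the host and every parent domain: www.a.example, a.example, example
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def filter_results(urls: list[str], blocked: set[str]) -> list[str]:
    """Keep only results whose host is not covered by the blocklist."""
    return [u for u in urls if not is_blocked(u, blocked)]


if __name__ == "__main__":
    results = [
        "https://www.ai-spam.example/top-10-pastebin-alternatives",
        "https://en.wikipedia.org/wiki/Archery",
        "https://content-farm.example/beer-bongs",
    ]
    print(filter_results(results, BLOCKED_DOMAINS))
```

Matching on whole parent domains (rather than substrings) means `notai-spam.example` is not caught by a rule for `ai-spam.example`, which is the usual behavior you want from a domain list.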

[–] [email protected] 2 points 2 weeks ago

Looks super cool. Too bad they don't have a way to add custom SearX instances other than modifying and building the extension yourself.

[–] [email protected] 23 points 2 weeks ago (3 children)

I think the best way to make the Internet less sh*tty is to get away from Google search.

I like the SearX search engine. It gives old-school, relevant search results, not Google-ranked ones.

https://search.inetol.net/

It's also spread out over many separate instances, so you can pick the one that best suits your search needs:

https://searx.space/

[–] pelespirit 8 points 2 weeks ago (1 children)

I've had good luck with Mojeek as a backup to DuckDuckGo. It's so old school that it doesn't always know what you want, but sometimes I want that.

[–] [email protected] 5 points 2 weeks ago

I've found Mojeek to be a bit hit-and-miss, but one thing I really appreciate is that they actually do the indexing and searching themselves (whereas pretty much every other search site uses Bing or Google behind the scenes). So although Mojeek may not be ideal, they are at least making an effort to be independent.

[–] [email protected] 8 points 2 weeks ago

I self-host it on my laptop; it's pretty easy, and I always have it just the way I want it. I'm still pushing shit uphill with the AI crap, but it's better than any single search engine (it amalgamates many). Relevant to OP, I have a large blocklist enabled, but it's very much a moving target.

[–] [email protected] 3 points 2 weeks ago

Oh damn. That was like a proper internet 2.0 kinda experience. A feller could get used to that.

[–] [email protected] 17 points 2 weeks ago (1 children)

Kagi! You can block websites so they don't show up. It'll also flag websites that contain a lot of spam or ads.

[–] [email protected] 8 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Kagi lets you blacklist individual domains yourself, but I think what OP is asking is "is there a search engine that identifies and blacklists AI generated content itself".

I think the answer is probably yes: all search engines likely try to block spam websites of any sort, AI-generated or not, and do so all the time, or at least downrank them. Presenting relevant, useful material at the top of the results is basically the business search engines are in.

Now, do any do so to a level sufficient to fully eliminate them? I'd guess not. SEO spammers have been trying to pollute top results with their hits for about as long as search engines have been around, and trying to cheaply bulk-generate content that looks like something that the user might want is just the latest form this takes. My guess is that that'll be a cat-and-mouse game for some time to come.

[–] [email protected] 3 points 2 weeks ago (1 children)

I think they do claim to make an effort on that; I'm pretty sure I saw a changelog entry about AI content detection, at least for the image section.

[–] baggachipz 2 points 2 weeks ago (1 children)
[–] [email protected] 2 points 2 weeks ago

Ah, gotcha, thanks.

[–] [email protected] 15 points 2 weeks ago (1 children)

Man, I was looking up info about arrow rests for recurve/Olympic archery yesterday and stumbled on a website that used some sort of AI fever dream for its images.

One kind of looked like a violin's neck brace (I don't know what those things are called) with some strings attached. At first glance it looked like it should be a real thing, but on closer inspection it was actually nothing sensible.

I think we've all seen those images that look like a room filled with items, but when you look at a specific item your mind figures out it's just weird shapes and colors.

What a nightmare that was.

[–] [email protected] 2 points 2 weeks ago

I had a similar experience today looking up beer bongs. Some real Cthulhu type of shit.

[–] [email protected] 13 points 2 weeks ago

Unless I need something recent, whenever I search I set the date range to something like 1999 to 2021. It filters out a lot of unnecessary crap.

[–] [email protected] 8 points 2 weeks ago (1 children)

I'm actually using SearXNG and just blocking any website with blatantly AI-written shit content.

[–] [email protected] 5 points 2 weeks ago (2 children)

I used to use searx.be, but it had frequent downtime and bad results.

Switched to startpage.com

[–] [email protected] 5 points 2 weeks ago

I just host my own SearXNG instance. Bonus: I get to tweak the config to my liking.

[–] RedstoneValley 2 points 2 weeks ago (1 children)

I'm using Startpage too, but I have a feeling that search result quality has dropped massively lately.

[–] [email protected] 1 points 2 weeks ago (1 children)

They use Google and Bing for their results last I checked. So, if those two get worse then Startpage will too.

[–] RedstoneValley 2 points 2 weeks ago (1 children)

I don't know. Maybe Google is actively limiting results for Startpage. When I don't find what I'm looking for in Startpage I switch to Google and boom, adequate search result. I refuse to permanently go back to Google though.

[–] [email protected] 2 points 2 weeks ago

Could be that. Could also be the fact that Google adjusts results based on their profile of you, giving different results from Startpage.

[–] Imgonnatrythis 8 points 2 weeks ago (2 children)

Have you given Kagi an actual shake? If you are not interested in saving preferences longer term, you can keep cycling through free accounts. Now more than ever, it is a breath of fresh air. If I want a quick AI answer without scrolling through some ad-ridden web page, I just put a "?" at the end of my query. If not, I have no AI garbage on my results.

[–] [email protected] 4 points 2 weeks ago

+1 for Kagi, it’s definitely worth what I’m paying for it.

[–] [email protected] 0 points 2 weeks ago (1 children)

I love Kagi, but I don't think it actively filters out AI-generated content.

I know that when searching for pictures you can disable AI-generated images.

I think the hard part for a search engine is that, unless there is some kind of identifying mark on the content, how do they know that an AI didn't write a top 10 list of pastebin alternatives?

[–] [email protected] 1 points 2 weeks ago

It's not immune to it. If you are looking for something highly specific, you will definitely get slop. To give an actual example: a buddy of mine told me that the walls of your house act like a sponge, at least against water, when the outer walls are insulated but the basement walls aren't insulated on the outside. So I went looking on Kagi for material to back that up (not that I didn't believe him; I just wanted to know more). A lot of the results were completely AI-generated crap websites. There were good and somewhat relevant results, but in the end I gave up (also because we got confirmation that it's been done on our house, so it became irrelevant).

[–] [email protected] 5 points 2 weeks ago (1 children)

It won't block them, but I started to feel like DDG's results had recently become awful; I couldn't find simple things. I've switched to Startpage and had a much better experience. The results feel more aligned with what I want, and I feel like there's less crap. It's probably confirmation bias, hah, but it's working.

[–] [email protected] 5 points 2 weeks ago (2 children)
[–] [email protected] 2 points 2 weeks ago

Ohh what a neat page! I was unaware, thank you!

[–] [email protected] 1 points 2 weeks ago (1 children)

This does not work well on WebKit.

[–] [email protected] 1 points 1 week ago

*mobile in general

[–] [email protected] 3 points 2 weeks ago