this post was submitted on 19 Aug 2024
228 points (96.0% liked)

Technology


I use DuckDuckGo, but I realised these big(ish) search engines give me all the commercialised results. DuckDuckGo has been going down the slope for years, but not at such a rate as Google or Bing.

I want to have a search engine that gives me all the small blogs and personal sites.

Does something like this exist?

[–] [email protected] 51 points 3 weeks ago (3 children)
[–] [email protected] 14 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

I'm intrigued. The search results are more akin to how they used to be 25 years ago, on the internet that I loved.
https://search.marginalia.nu is definitely something I'll be exploring going forward!

[–] [email protected] 4 points 3 weeks ago

I searched for Dance Dance Revolution and ended up here.

The absolute nostalgia of it all. Love it.

[–] [email protected] 8 points 3 weeks ago (1 children)

Replying under the top comment, but this really applies to all of these: how do these search engines determine what counts as a personal site? For example, I had procrastinated for years on finally spinning up a static, barren HTML blog. The infamous Lucidity AI post introduced me to Mataroa, and I got over the hump and started writing. Would that get indexed? Etc.

Does it just crawl through webrings?

[–] [email protected] 4 points 3 weeks ago

I believe you have to submit your own website to this one for manual addition to its index.

[–] [email protected] 6 points 3 weeks ago

That is exactly what I needed; the subdomains are now in my bookmarks.

[–] [email protected] 35 points 3 weeks ago (5 children)

Teclis - includes search results from Marginalia and is free to use at the moment. This search index has been closed down in the past due to abuse.

Kagi, which created Teclis, is a paid search engine (a metasearch engine, to be more precise) that also incorporates these results in its normal searches. I warmly recommend giving Kagi a try; it's great, and I've been enjoying it a lot.

--

Other options I can recommend: you could always host your own search engine if you have a list of small-web sites in mind, or don't mind spending some effort collecting such a list. I personally host YaCy [github link] (and SearXNG to interface with YaCy and several other self-hosted indexes/search engines, such as Kiwix wikis). Indexing and crawling your own search results is surprisingly not resource-heavy at all, and can run on your personal machine in the background.

[–] [email protected] 12 points 3 weeks ago (1 children)

Not just a meta search engine though - they do have their own index as well.

https://help.kagi.com/kagi/search-details/search-sources.html

[–] [email protected] 5 points 3 weeks ago

Yes, I mentioned Kagi because the Teclis search index is hosted by them.

However, most of the search results in Kagi are aggregated from dedicated search engines (such as, but not limited to, Yandex, Brave, Google and Bing).

[–] [email protected] 2 points 3 weeks ago

Personally, I've really been enjoying Kagi for the past year.

[–] [email protected] 34 points 3 weeks ago (2 children)

Try this engine

https://search.marginalia.nu/

Or a SearXNG instance

https://search.disroot.org/search

You may also be interested in the Indie Web movement. This site is a great resource for it, with yet more links to indie sites and blogs.

Finally, not quite what you asked but here's a freebie, in case you didn't know about it:

https://wiby.me/

It's an old web search engine. It only indexes pages from the 00s and earlier.

[–] [email protected] 6 points 3 weeks ago (1 children)

Ah, Marginalia is absolutely awesome! I feel like modern search is almost an extension of website names now, so if I want to find Netflix but don't know its website, I might search for "netflix". Marginalia is actually a cool way to find new stuff: you can search "bike maintenance" and find cool blog posts about that topic.

I honestly can't remember if that's something google and the like used to do, but doesn't now, or if they never did. Either way, I love it!

[–] [email protected] 5 points 3 weeks ago* (last edited 3 weeks ago)

Aside from SearXNG, I didn't know about these search engines until your recommendation. Thanks to Wiby and Marginalia, I found old rich content (old BBS list conversations, for example) that I was looking for, regarding studies on the occult and esotericism. Thank you so much!

[–] [email protected] 28 points 3 weeks ago (3 children)

This is a great question, in that it made me wonder why the Fediverse hasn't come up with a distributed search engine yet. I can see the general shape of a system, and it'd require some novel solutions to keep it scalable while still allowing reasonably complex queries. The biggest problems with search engines are that they're all scanning the entire internet and generating a huge percentage of all internet traffic; they're all creating their own indexes, which is computationally expensive; their indexes are huge, which is space-expensive; and quality query results require a fair amount of computing resources.

What I have in mind is a distributed search engine with something like a DHT for the index, with partitioning and replication, and a moderation system to control bad actors and trojan nodes. DDG and SearX are sort of front ends for a system like this, except that they just hand off the queries to one (or two) of the big monolithic engines.
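To make the DHT idea a little more concrete, here is a minimal sketch of one common partitioning scheme, consistent hashing, deciding which nodes would store the index entries for a given search term. It assumes hypothetical node names, a replication factor of 3 and SHA-256 as the hash; the `HashRing` class is illustrative only. Routing, node churn, moderation and ranking are left out entirely, so treat it as an illustration of partitioning plus replication, not a design for a real Fediverse search engine.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on a 2**64 hash ring using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hashing ring: each term is stored on `replicas` nodes."""

    def __init__(self, nodes, replicas=3, vnodes=16):
        self.replicas = replicas
        # Sorted (position, node) pairs; each node appears `vnodes` times.
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._points]

    def nodes_for(self, term: str):
        """Walk clockwise from the term's position, collecting distinct nodes."""
        distinct = len({node for _, node in self._points})
        wanted = min(self.replicas, distinct)
        i = bisect.bisect(self._positions, ring_position(term))
        owners = []
        while len(owners) < wanted:
            node = self._points[i % len(self._points)][1]  # wrap around the ring
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


# Example with made-up node names: which three nodes hold the entries for a term?
ring = HashRing(["node-a.example", "node-b.example", "node-c.example", "node-d.example"])
owners = ring.nodes_for("bike maintenance")
print(owners)
```

The virtual nodes (`vnodes`) spread each server over several ring positions, so adding or removing one node only moves a fraction of the terms instead of reshuffling the whole index.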

[–] [email protected] 8 points 3 weeks ago* (last edited 3 weeks ago) (2 children)

We'd love to build a distributed search engine, but I think it would be too slow. When you send us a query, we go and search 8 billion+ pages and bring back the top 10, 20, ... up to 1,000 results. For a good service we need to do that in 200ms, and thus one needs to centralise the index. It took years, several iterations and our carefully designed algos & architecture to make something so fast. No doubt Google, Bing, Yandex & Baidu went through similar hoops. Maybe I'm wrong and/or someone can make it work with our API.
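At a vastly smaller scale, the basic shape of "search the index and bring back the top k" can be sketched with an inverted index plus heap-based top-k selection. This is a toy under stated assumptions: raw term-frequency scoring, a single in-memory dictionary, and a hypothetical `ToyIndex` class. It is not how the engine described in this comment, or any production engine, actually ranks or shards its index.

```python
import heapq
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical in-memory inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query, k=10):
        """Score docs by summed term frequency and return the top k doc ids."""
        scores = defaultdict(int)
        for term in tokenize(query):
            # .get() avoids inserting empty entries into the defaultdict.
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        # nlargest avoids fully sorting every match when only k are needed.
        top = heapq.nlargest(k, scores.items(), key=lambda item: (item[1], item[0]))
        return [doc_id for doc_id, _ in top]


# Example with made-up pages.
index = ToyIndex()
index.add("https://example.org/bikes", "bike maintenance for touring bike owners")
index.add("https://shop.example.com", "buy a bike today")
index.add("https://example.net/garden", "notes from my allotment garden")
top = index.search("bike maintenance", k=2)
print(top)
```

The hard part at the scale mentioned in the comment is not this loop but keeping posting lists for billions of pages close enough together to answer in about 200ms, which is the argument being made for a centralised index.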

[–] [email protected] 9 points 3 weeks ago (1 children)

I think 200ms is an expectation of big tech. I know people have very little patience these days, but if you provided better quality searches in 5 seconds people would probably prefer that over a .2 second response of the crap we’re currently getting from the big guys. Even better if you can make the wait a little fun with some animations, public domain art, or quotes to read while waiting.

[–] [email protected] 6 points 3 weeks ago (1 children)

I thought Gigablast was a one-man company? Yet it had good search results and it was expansive.

[–] [email protected] 6 points 3 weeks ago

Yes, it was. Matt Wells closed it down just over one year ago.

[–] [email protected] 5 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

YaCy is probably what you're looking for.

[–] [email protected] 4 points 3 weeks ago

Yah, it does. I've come across it before, but it rode in on a wave of alternative search engines and got lost in the shuffle.

Thanks.

[–] [email protected] 22 points 3 weeks ago (1 children)

You're looking for Kagi.com.

Not only does it give better-quality search results on "the big web"; you can also choose to search specific parts of it, like blogs.

Best part - it's completely ad and spam free. You pay for it with actual money instead of with your data.

[–] [email protected] 18 points 3 weeks ago* (last edited 3 weeks ago) (3 children)

Why not run a SearXNG instance and help everyone instead? Y'know, Kagi is pretty expensive and they are also getting into AI shit.

[–] [email protected] 11 points 3 weeks ago (3 children)

I'm hoping that, just as Proton does good free stuff using the money I pay them (Visionary account), Kagi does or will do the same. The Internet as a whole needs to stop being ad-supported.

[–] [email protected] 4 points 3 weeks ago* (last edited 3 weeks ago) (4 children)

I refuse to believe Proton when they do advertisements lol. They are also being pretty suspicious in ignoring XMR support despite years of people requesting it. If they had even considered it a bit, their new shit Proton Wallet wouldn't only allow you to store bitcoin, which we all know has nothing that protects your privacy.

[–] [email protected] 4 points 3 weeks ago (3 children)

I've signed up for the €5 a month subscription at kagi and I've never used my whole quota.

Granted, I expect it's overly expensive if you live in a developing country like Eritrea or the United States.

[–] [email protected] 3 points 3 weeks ago (7 children)

Can you expand on how running your own SearXNG helps others? Does it contribute to some shared index or something?

[–] [email protected] 20 points 3 weeks ago (1 children)

Google are the ones who have really gone down the toilet in recent years. They ditched cached pages, soured search results with paid ads, and even their image search is as bad as TinEye for reverse image searching these days. Literally the only things Alphabet really have going for them anymore are Android and YouTube.

It's baffling that a company which was once so dominant in the web search space that its name was literally used as a verb for looking things up for decades has now enshittified its flagship product so much that it's making rivals like Bing, Lycos, DuckDuckGo, etc. look like viable alternatives.

[–] [email protected] 9 points 3 weeks ago

Every company is going down the drain just at different speeds.

[–] LambdaRX 19 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

Maybe try Mojeek; it uses a completely independent indexing system.

[–] [email protected] 4 points 3 weeks ago (1 children)

Mojeek

Thanks for the rec, I'll give Mojeek a try for a while. So far the results seem better than Brave (which I didn't seriously consider using regularly anyway) but I miss the bang options (!w, !yt, etc.) that DDG has.

[–] [email protected] 3 points 3 weeks ago (2 children)

Our Search Choices might be of use here; different implementation, but similar: https://blog.mojeek.com/2022/02/search-choices-enable-freedom-to-seek.html

[–] [email protected] 15 points 3 weeks ago

Off-topic, but DDG is a Bing frontend, so they should share the same results.

[–] [email protected] 12 points 3 weeks ago

I'm building my own. I'll keep you posted.

[–] [email protected] 9 points 3 weeks ago (3 children)

The more obscure a web page is, the more likely it is to be indexed only by the large search engines (i.e. Google). There are search queries that return 0 results on DDG, but quite a few (relatively) obscure websites on Google. This is simply because the more money a search engine operator has, the more websites it will index.

So what you want is kind of contradictory.

[–] [email protected] 9 points 3 weeks ago
[–] [email protected] 8 points 3 weeks ago* (last edited 3 weeks ago) (2 children)

Don't know if this fits your criteria, but I've been using Gruble a lot recently. You can personalise the look and language in the settings, plus it's open source.

[–] [email protected] 20 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

For info: that's (just) a SearXNG instance. SearXNG is a metasearch engine: it fetches results from Google etc., then proxies and aggregates them for you.
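Since metasearch boils down to asking several engines and merging their ranked lists, here is a minimal sketch of one well-known merging method, reciprocal rank fusion. To be clear, this is a swapped-in technique for illustration, not SearXNG's own scoring formula; the function name, the constant `k=60` and the example URLs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several engines' ranked URL lists into one ranking.

    Each URL scores sum(1 / (k + rank)) over the lists it appears in
    (rank starts at 1), so URLs returned highly by several engines rise to the top.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        seen = set()
        for rank, url in enumerate(results, start=1):
            if url in seen:  # ignore duplicates within one engine's list
                continue
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results from two upstream engines.
engine_a = ["https://example.org/blog", "https://shop.example.com", "https://news.example.net"]
engine_b = ["https://example.org/blog", "https://forum.example.org", "https://shop.example.com"]
merged = reciprocal_rank_fusion([engine_a, engine_b])
print(merged)
```

Rank-based fusion needs no access to each engine's internal relevance scores, which a proxying metasearch engine doesn't get anyway; it only sees the order in which results come back.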

[–] [email protected] 3 points 3 weeks ago

The link should be https://gruble.de/. But as stated, it's "just" a SearXNG instance. See the full list: https://searx.space/

[–] [email protected] 7 points 3 weeks ago (3 children)

If you want blogs, I recommend you use Gemini: https://en.wikipedia.org/wiki/Gemini_(protocol)

Download Lagrange and begin browsing. It's basically a small-web of personal blogs.

[–] [email protected] 3 points 3 weeks ago (1 children)

Before Google existed I used https://www.metacrawler.com, and it appears to still be around. I have not used it in a long time, so I no longer know anything about it.

[–] [email protected] 4 points 3 weeks ago (1 children)

https://system1.com/ adtech company syndicating Bing and/or Google

[–] [email protected] 5 points 3 weeks ago (1 children)

https://system1.com/ adtech company syndicating Bing and/or Google

They own metacrawler now?

[–] [email protected] 5 points 3 weeks ago (1 children)

Yep, in the footer: "© 2024 Infospace Holdings LLC, A System1 Company"
