Asklemmy


Search engines down? (discuss.tchncs.de)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]

Is it just me, or are many independent search engines down? DuckDuckGo (my go-to engine), Qwant, Ecosia, Startpage... all down? The only hint I got was on the Qwant page...

Edit: It all seems to be related to Bing being down. I hope the independent engines find a way to become truly independent...

[–] [email protected] 58 points 1 year ago (4 children)

It's about time we make a federated search engine and indexer.

[–] [email protected] 35 points 1 year ago (1 children)

Isn’t that SearX / SearXNG?

SearXNG is a fork from the well-known searx metasearch engine which was inspired by the Seeks project. It provides basic privacy by mixing your queries with searches on other platforms without storing search data. SearXNG can be added to your browser’s search bar; moreover, it can be set as the default search engine.

SearXNG appreciates your concern regarding logs, so take the code from the SearXNG sources and run it yourself!

Add your instance to this list of public instances to help other people reclaim their privacy and make the internet freer. The more decentralized the internet is, the more freedom we have!

[–] [email protected] 22 points 1 year ago* (last edited 1 year ago)

Not at all. This just searches multiple search engines at once and presents you with the results from all of them on a single page.
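
To make the metasearch distinction concrete, here is a minimal sketch of what such an aggregator does: it takes ranked lists that upstream engines already produced and merges them, without crawling or indexing anything itself. The engine names, URLs, and the `merge_results` helper are hypothetical, and the scoring (reciprocal rank fusion with k = 60) is an assumption chosen for illustration, not SearXNG's actual ranking.

```python
from collections import defaultdict

def merge_results(results_by_engine, k=60):
    """Merge ranked URL lists with reciprocal rank fusion (illustrative only)."""
    scores = defaultdict(float)
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            # A URL earns more score the higher it ranks on each engine.
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Hypothetical upstream responses; a real metasearch engine would fetch these.
upstream = {
    "engine_a": ["https://example.org/a", "https://example.org/b"],
    "engine_b": ["https://example.org/b", "https://example.org/c"],
}
merged = merge_results(upstream)
print(merged)
```

If every upstream engine returns nothing, `merge_results` returns an empty list, which is the dependency problem in a nutshell: there is no index of its own to fall back on.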

[–] [email protected] 16 points 1 year ago (1 children)

Something like YaCy?

https://yacy.net/

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago) (1 children)

Looks interesting. Why have I never heard of this? Are there any websites that rank search engines, to get some concrete metrics? I had a look at what people have been saying about it, and it seems it has poor search results, unfortunately.

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago) (1 children)

I don't know of any ranking sites, but from what I understand, YaCy gets better the more people participate in it.

Other people already mentioned SearXNG, which is also great.

[–] [email protected] 2 points 1 year ago

That's what I'm currently using, but it's a metasearch engine and still relies on everyone else's proprietary crap.

[–] [email protected] 12 points 1 year ago* (last edited 1 year ago) (2 children)

I was thinking about this and imagined the federated servers handling the index DB, search algorithms, and search requests, while leveraging each user's browser/compute to do the actual web crawling/scraping/indexing; the server would simply perform CRUD operations on the index DB using the processed data from clients. This approach would target the core reason why search engines fail (the cost of scraping and processing billions of sites), reduce the cost of hosting a search server, and spread the expense across the user base.

It may also have the added benefit of hindering surveillance capitalism due to a sea of junk queries from every client, especially if it were making crawler requests from the same browser (obviously it needs to be isolated from the user's own data, extensions, queries, etc.). The federated servers would probably also need to operate as lighthouses that orchestrate which domains and IP ranges to crawl and efficiently distribute the workload to client machines.
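
A rough sketch of the split described above, assuming a single server and in-memory data: the server only hands out crawl assignments and stores what clients send back, while the crawling and text processing happen on the client. The `Coordinator` class, its method names, and the rule that a server rejects results for domains it did not assign to that client are all hypothetical choices for illustration, not part of any existing project.

```python
from collections import defaultdict, deque

class Coordinator:
    """Toy stand-in for one federated server: hands out crawl work, stores the index."""

    def __init__(self, domains):
        self.pending = deque(domains)   # domains still waiting to be crawled
        self.assigned = {}              # domain -> client currently crawling it
        self.index = defaultdict(set)   # term -> set of URLs containing it

    def assign(self, client_id, batch_size=2):
        """Give a client the next batch of domains to crawl."""
        batch = []
        while self.pending and len(batch) < batch_size:
            domain = self.pending.popleft()
            self.assigned[domain] = client_id
            batch.append(domain)
        return batch

    def submit(self, client_id, domain, url, terms):
        """Store terms a client extracted; reject work it was never assigned."""
        if self.assigned.get(domain) != client_id:
            return False
        for term in terms:
            self.index[term.lower()].add(url)
        return True

    def search(self, term):
        """Look up a single term in the inverted index."""
        return sorted(self.index.get(term.lower(), set()))

server = Coordinator(["example.org", "example.net", "example.com"])
work_a = server.assign("client-a")
work_b = server.assign("client-b")
server.submit("client-a", "example.org", "https://example.org/", ["Federated", "search"])
rejected = server.submit("client-b", "example.org", "https://example.org/spam", ["search"])
print(work_a, work_b, rejected, server.search("search"))
```

The assignment check is only the cheapest possible guard against clients poisoning the index; a real design would also need some way to cross-check what clients report.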

[–] [email protected] 3 points 1 year ago

Shit man, that's exactly the kind of implementation I was thinking about. I've had the idea for a couple of years, but now that the fediverse is starting to gain traction, I think it's probably about time some code gets written. Unfortunately, due to CORS restrictions, you can't just start serving people a JS script that starts indexing in the background: browsers block a page's script from reading responses from other sites unless those sites opt in.

[–] [email protected] 2 points 1 year ago

The theory with crawling is it has discovery built into it, no? You follow outbound links and discover domains that way. So you need some seeds, but otherwise you discover based on what other people already know about.
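
As a minimal sketch of that discovery idea, the following walks a tiny in-memory link graph breadth-first from a seed instead of fetching real pages. The `discover` helper and the `*.example` domains are made up for illustration.

```python
from collections import deque

def discover(seeds, outbound_links, limit=100):
    """Breadth-first discovery: follow outbound links starting from seed domains."""
    seen = set(seeds)
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated seeds, order preserved
    order = []
    while frontier and len(order) < limit:
        domain = frontier.popleft()
        order.append(domain)
        for target in outbound_links.get(domain, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return order

# Hypothetical link graph: domain -> domains it links out to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "d.example": ["a.example"],
    "island.example": ["a.example"],
}
found = discover(["a.example"], links)
print(found)
```

Note that `island.example` is never found: it links out, but nothing reachable from the seed links to it, which is why the choice of seeds matters.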

To me the problem seems like a few submarines in a cave. They can each see a little bit of what’s around them, and then they can share maps. The minimum knowledge of the internet is one’s own explorations: as you browse the web, your sub’s sensors store everything they see. The sub also actively searches alongside other agents and automatically crawls on its own, like active sensors on a submarine always mapping out the environment.

Then, in the presence of other friendly subs, you can trade information. So one’s own personal and small map of the internet can get merged and mixed with others to get a more and more complete version.

Obviously this can be automated and batched, but that’s sort of the analogy I see in the real world: multiple parties exploring an unknown/changing space and sharing their data to make a map.
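
The map trading could be sketched like this, assuming each party keeps a dictionary of pages it has seen with a timestamp and the outbound links it found there. The `merge_maps` helper, the data layout, and the "newest observation wins" rule are all assumptions made for illustration.

```python
def merge_maps(mine, theirs):
    """Merge two partial web maps of the form {url: (last_seen, outbound_links)}.

    When both maps know a URL, the more recent observation is kept.
    """
    merged = dict(mine)
    for url, (seen_at, links) in theirs.items():
        if url not in merged or seen_at > merged[url][0]:
            merged[url] = (seen_at, set(links))
    return merged

# Two hypothetical explorers with overlapping knowledge.
sub_a = {"https://a.example/": (100, {"https://b.example/"})}
sub_b = {
    "https://a.example/": (250, {"https://b.example/", "https://c.example/"}),
    "https://d.example/": (90, set()),
}
shared = merge_maps(sub_a, sub_b)
print(sorted(shared))
```

Each party's own map is left untouched, so the merge can be repeated with any number of peers, in any order, in batches.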

[–] [email protected] 5 points 1 year ago

I've also thought about this, but I don't know what the costs of doing such a thing would be. (I'm ignorant on the subject.)