this post was submitted on 30 Oct 2024
Do you have topics that are censored? I searched for my Reddit post "what I've learnt from the mantis aliens", and it does not show up in your results, nor in Google's, but it does on other search engines. UFO/alien content is censored by most search engines, though there is no reason for it to be. That is how I judge search engines, and Mojeek doesn't give me the results I asked for.
Reddit doesn't allow us to crawl: https://www.reddit.com/robots.txt
Is that legally binding? What happens if they catch you? They ban your IPs, and then you're in the same situation as now. Literally no reason not to do it IMO.
IP already hits a wall. It's also better not to get a reputation as a bad bot; it's taken a while to get known for being friendly and respecting rules. To us, you should follow robots.txt.
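For readers unfamiliar with what "following robots" means in practice, here is a minimal sketch of the check a polite crawler might run before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the `example.com` URLs, and the `may_fetch` helper are all hypothetical and chosen for illustration; this is not Mojeek's actual crawler code, and the sample file is not a copy of any real site's rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows every crawler from the whole site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot is an assumed crawler name, not a real bot.
    allowed = may_fetch(SAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.com/r/privacy/")
    print("allowed to crawl:", allowed)
```

A crawler that respects the file simply skips any URL for which this check returns `False`, which is why a blanket disallow keeps a whole site out of the index.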
I seem to recall creative ways to index things without robots, e.g. a browser add-on that users opt into, which sends pages and such, essentially crowdsourcing the indexing. Anyway, good to see you're taking the high road!
Our preference is always to find out why the block is happening and try to convince people it should be otherwise. Widespread abuse of robots.txt does no one any good; having crawled and indexed for so long, we understand the standard and are quite fond of it.
We can see some of its perils and pitfalls too, but web builders need to be given some tools, and assurances that those tools will work for them.
That makes sense. One thing I’ve noticed with Mojeek search results compared to Google is that I do not encounter the “old web” any more on Mojeek than on Google. Are you not crawling/indexing web 1.0 blogs and sites at all?
Probably a sample size issue; we crawl and index everything we are able to. We have seen many sites of this kind in the past, and finding them is something other people have said they enjoy about Mojeek.