jerry

joined 2 years ago
[–] [email protected] 1 points 1 week ago

No kicking on my side. I’ll see if there was anything going on in the logs yesterday.

[–] [email protected] 3 points 3 weeks ago

@[email protected] fedia.io was being swamped with crawlers from thousands of IPs causing the site to grind to a halt and periodically crash. I had to limit access to only logged in users while I try to sort out a better way to manage all those crawlers.

@[email protected]

 

I have some time to babysit the server now, so I've reenabled anonymous access. I've also removed the prior ASN blocks, but may add them back as needed in response to crawling from various AI datacenters.

[–] [email protected] 4 points 1 month ago (1 children)

I understand. I have tried hard to make fedia.io work - it’s been far and away the most challenging app I’ve managed (note: the problems are all legacy kbin issues, the mbin team has been nothing but amazing). I am stuck in a difficult position - the site isn’t useful if I keep it locked down like it is now, and the site is super slow/requires constant attention if I make it open. I’ll have to assess my options and decide what the future for fedia is.

[–] [email protected] 2 points 1 month ago (1 children)

Apologies for the delay, but this is fixed now

[–] [email protected] 2 points 1 month ago

Ohh - that is possible. I will check when I get back to my computer.

[–] [email protected] 10 points 1 month ago

I will add that to the donation page

[–] [email protected] 5 points 1 month ago

You and the mbin team continue to amaze me. Thank you so much!

[–] [email protected] 10 points 1 month ago (2 children)

It’s an application level ddos. Blocking anonymous access helped a bunch, but I am still getting about 5-10 login requests per second from hundreds of different IPs

[–] [email protected] 10 points 1 month ago

Thanks. Just trying to give people some alternatives

[–] [email protected] 10 points 1 month ago

We think it’s a CSRF prevention measure in the PHP Symfony library that creates a lot of database calls.

[–] [email protected] 7 points 1 month ago (1 children)

Not really. We have to accept incoming connections from thousands of other fediverse instances that would be blocked by that.

 

Hi all. Fedia.io has for a long time been subject to ddos attacks, including many that are "accidental", caused by myriad scrapers constantly hammering the site. I gave up on trying to play whack-a-mole with blocking them based on IP address (they do not honor robots.txt and do not use a conspicuous user agent string) since I was inadvertently blocking some legitimate users. So, I've restricted access to the content of fedia.io to only those that are logged in. That will mean we don't show up in search engines and whatnot, which some will consider a good thing and which will likely cause others to leave.

There is a remaining problem related to the login form. Calls to the login page are breathtakingly expensive, computationally speaking, and so I also have a script that monitors unusual numbers of calls to that form and blocks at the firewall any offenders. I strongly suspect I'm catching some legitimate users with this too, and so I continue to try to tune it, but it's maddening, y'all.
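
The monitoring idea above can be sketched roughly as follows. This is not the actual script running on fedia.io. It is a minimal illustration, assuming login requests can be read as (IP, timestamp) pairs from an access log. The names `LoginRateMonitor` and `find_offenders` and the thresholds (20 requests per 60 seconds) are hypothetical. The firewall step is left out, since it depends on the firewall in use.

```python
from collections import defaultdict, deque


class LoginRateMonitor:
    """Flags IPs that make more than max_hits login requests within window_seconds.

    Hypothetical sketch: the thresholds are illustrative, not fedia.io's real values.
    """

    def __init__(self, max_hits=20, window_seconds=60.0):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent login requests

    def record(self, ip, timestamp):
        """Record one login request; return True if this IP is now over the threshold."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_hits


def find_offenders(events, max_hits=20, window_seconds=60.0):
    """events: iterable of (ip, unix_timestamp) pairs, in time order per IP."""
    monitor = LoginRateMonitor(max_hits, window_seconds)
    offenders = set()
    for ip, ts in events:
        if monitor.record(ip, ts):
            offenders.add(ip)
    return offenders
```

A sliding window keeps slow but persistent users from being flagged while still catching bursts. Tuning `max_hits` and `window_seconds` is the hard part: too strict catches legitimate users, too loose lets the expensive requests through.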

These issues have been causing performance problems for everyone (despite the fedia.io app running on a dedicated 96 core, 256GB server with nvme disks), and the site became unavailable for certain people who accidentally tripped various thresholds. I'm hoping most of this is resolved now.

Thanks for the patience.

 

My apologies for the recent spate of problems. I think I’ve narrowed the problem down to the /m/fediverse and /m/random magazines. For some reason, mbin is generating an enormous amount of outbound delivery messages for these two magazines. I first tried removing the hashtags from /m/fediverse, but that was only a quick fix. So I deleted the magazine. (Note, the notifications appear to be related to the “microblog” function, and were originating from accounts on lots of mastodon instances, so I think there is a bug somewhere).

I noticed /m/random doing something similar. I have removed all the subscribers from that magazine to try to reduce the number of notifications it is sending. I don’t know if that will help - I have a feeling the instance can’t keep up with that happening in both random and fediverse.

Anyhow, the queues are draining fast now. I purged about 600,000 queued delivery messages that (based on a random sample) all appeared to be associated with fediverse and random. That should let the rest of whatever is backed up get moving again, and hopefully stay moving.

 

The following instances will be offline briefly on Saturday, December 14 from 9am ET / 2pm UTC for approximately 10 minutes:

infosec.exchange
infosec.town
infosec.pub
pixel.infosec.exchange
books.infosec.exchange
matrix/element.infosec.exchange
relay.infosec.exchange
meetup.infosec.exchange
video.infosec.exchange
infosec.press
infosec.place
fedia.io
fedia.social
elk.infosec.exchange
infosec.space
convo.casa

The servers supporting these instances require a reboot. The Dell servers these instances run on take a very long time to boot, so I am estimating 10 minutes of downtime. It could be more, could be less.

We use live patches to minimize the reboots needed for patching. However, Ubuntu only provides livepatch support for a year, which is how long most of these systems have been running.

 

It’s been a long day. I will fix it when I am back in front of a computer. It might be a few hours. My apologies.

 

I have sort of given up on fixing the problem, and will instead work on auto-detecting and auto-recovering when the problem happens.

 

I just saw this: https://every.to/p/the-disappearance-of-an-internet-domain

I have no idea if it's real, but if it is, that will be most unfortunate

 

After I resolved the federation issue, I had to clean up a few things and so the site may have been unavailable for a bit. I'm done fussing with it and will keep an eye on it to make sure things are working.

IF YOU SEE PROBLEMS - please let me know. As far as I know, I've fixed all of the federation and error 500 issues we've had, so please don't assume it's just more of the same if you see them.

Thanks for your patience.

 

Fedia.io is sort of like the Ship of Theseus right now - I literally replaced nearly everything trying to get it back working.

The problem ended up being a silent out of memory error that php-fpm was running into. I had to increase the memory limit to about 10x what the docs require to get it to work, but once I did that, it works great.

I was only able to sort this out after @bentigorlich recommended I move the site to debug mode (which requires me to lock everyone else out). Once I did that, it started giving some useful errors.

My apologies for the amount of time it took to fix this. I learned a lot about php today.

 

Unfortunately, outbound federation (making posts on communities/magazines on other instances) is broken. I do not yet know the cause or have an idea of when it will be fixed.

 

Hi all. As some of you have reported, outbound federation to at least some other instances is broken from fedia.io. At the moment, I don't know why and I don't have any leads, as there are no logs or other indications of what is going wrong, but I am working on it.

 

Hi all. Several of you have reported problems with fedia.io not federating with other instances correctly.

The cause is that rabbitmq crashed, but not all the way. It crashed to the point where new connections would time out, but the service was still running, so it wouldn't auto restart. I will be creating some automation to detect that proactively and restart rabbitmq if/when it happens again.
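
One way such a check could look is sketched below. This is not the automation actually deployed, just a minimal sketch under stated assumptions: the broker listens on the default AMQP port 5672 on localhost, the service is managed by systemd under the name `rabbitmq-server`, and the function names `broker_responds` and `check_and_restart` are made up for illustration. Since the process stays up in this failure mode, the probe opens a real connection and waits for the broker to answer instead of checking whether the service is running.

```python
import socket
import subprocess

AMQP_HEADER = b"AMQP\x00\x00\x09\x01"  # AMQP 0-9-1 protocol header


def broker_responds(host="127.0.0.1", port=5672, timeout=5.0):
    """Return True if the broker answers a new connection within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(AMQP_HEADER)
            # A healthy broker replies to the header; any reply byte counts here.
            return len(sock.recv(1)) == 1
    except OSError:
        # Covers refused connections and timeouts (socket.timeout is an OSError).
        return False


def check_and_restart(failures_needed=3, probe=broker_responds,
                      restart_cmd=("systemctl", "restart", "rabbitmq-server")):
    """Restart only after several consecutive failed probes, to avoid flapping.

    restart_cmd is an assumption; adjust it to however the service is managed.
    """
    for _ in range(failures_needed):
        if probe():
            return False  # healthy, nothing to do
    subprocess.run(list(restart_cmd), check=True)
    return True
```

Run from cron or a systemd timer, `check_and_restart()` would restart the broker only when several probes in a row fail, which should cover the "running but not accepting connections" state without restarting on a single slow response.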
