this post was submitted on 12 Jul 2024
submitted 5 months ago* (last edited 5 months ago) by HumanPerson to c/[email protected]
 

I am currently out of town, and my server went down. All my services go through nginx, and they suddenly started giving error 502. SSH won't let me in either. I had my sister reboot the server, and it still doesn't work. I apologize for the lack of details, but that is all I know, and I can't access logs. I've cleared cache, and used a VPN in case fail2ban got me. I recently got a TP-Link router, so it could be something with that, but it was working for a while. I will have her do another reboot, and if that doesn't work I will have her power off and unplug the server in case it was hacked.

Edit: I have absolutely no clue why, but it works now. I literally did nothing. As far as I know, my sister hasn't touched it today. It just started working. Computers, man...

Edit 2: Actually she said she did something. Not sure what, but it works now.

top 28 comments
[–] [email protected] 10 points 5 months ago* (last edited 5 months ago) (2 children)

Some troubleshooting thoughts:

What do you mean when you say SSH is "down"?

  1. Connection refused (fail2ban's activity could result in a connection refused, but a VPN should have avoided that problem, as you said).
  2. Connection timeout. Probably a failure at the port forwarding level.
  3. Connection succeeded but closed. This can happen for a few reasons, such as the system being in an early boot-up state. There's usually a message in this case.
  4. Connection succeeded but auth rejected. This can happen if your OS failed to boot but came up in a fallback state of some kind.

Knowing which one of these it is can give you a lot more information about what's wrong:

System can't get past initial boot = Maybe your NAS is unplugged? Maybe your home DNS cache is down?

Connection refused = either fail2ban or possibly your home IP has moved and you're trying to connect to somebody else's computer? (nginx is very popular after all, it's not impossible somebody else at your ISP has it running). This can also be a port forwarding failure = something's wrong with your router.

Connection succeeded + closed is similar to "can't get past initial boot"

Auth rejected might give you a fallback option if you can figure out a default username/password, although you should hope that's not the case because it means anyone else can also get in when your system is in fallback.
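If it helps to tell these cases apart without an SSH client, here is a minimal Python sketch. It only opens a TCP connection and reads the server's greeting line; it never attempts a login, so it cannot detect the "auth rejected" case directly. The function name `classify_ssh` and the returned labels are made up for illustration.

```python
import socket


def classify_ssh(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Return a rough label for how a TCP connection to an SSH port behaves."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening there, or a firewall actively rejected us.
        return "refused"
    except socket.timeout:
        # No answer at all: often port forwarding or silently dropped packets.
        return "timeout"
    except OSError as exc:
        # DNS failure, unreachable network, and similar problems.
        return f"error: {exc}"

    with sock:
        try:
            banner = sock.recv(256)
        except socket.timeout:
            return "connected, but no banner before timeout"
        except ConnectionResetError:
            return "closed"

    if not banner:
        # The peer accepted the connection and then hung up without a word.
        return "closed"
    if banner.startswith(b"SSH-"):
        # The SSH daemon is alive; any remaining failure is at the auth stage.
        return "banner: " + banner.decode("ascii", "replace").strip()
    return "connected, unexpected data"
```

Calling `classify_ssh("your.host.example")` from outside the home network gives one of the labels above. A "banner" result means sshd itself is up, which narrows things down to authentication or fail2ban.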

Very few of these things are actually fixable remotely, btw. I suggest having your sister unplug everything related to your setup, one device at a time. Internet router, raspberry pi, NAS, your VM host, etc. Make sure to give them a minute to cool down. Hardware, particularly cheap hardware, tends to fail when it gets hot, and this can take a while to happen, and, well, it's been hot.

Here are a few things with a high likelihood of failing when you're away from home:

  • heat, as previously mentioned.
  • running out of disk space. Maybe you're logging too much, throw some more disk in there and tune down the logging. This can definitely affect SSH, and definitely won't be fixed by a reboot.
  • OOM failures (or other resource leaks). This isn't likely to affect your bare metal ssh, but it could. Some things leak memory, and this can lead to cascading process destruction by the OS. In this scenario you'd probably be able to connect to things in the first few minutes after a reboot, though.
  • shitty cabling. Sometimes stuff just falls out of the socket, if it wasn't plugged in perfectly to begin with. (Heat can also contribute to this one.)
  • reliance on a cloud service that's currently down. (This can include: you didn't pay the bill.) Hopefully your OS boot doesn't fail due to a cloud service, but I've definitely seen setups that could.
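For the disk-space item in particular, a check like the following could run from cron and warn before the disk fills up. This is a minimal sketch: the function name `disk_report` and the 90% threshold are assumptions, and the figure ignores blocks reserved for root, so it can differ slightly from what `df` reports.

```python
import shutil


def disk_report(path: str = "/", warn_at: float = 0.90) -> str:
    """Summarise how full the filesystem holding `path` is."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # Guard against pseudo-filesystems that report a total size of zero.
    fraction = usage.used / usage.total if usage.total else 0.0
    status = "WARN" if fraction >= warn_at else "ok"
    free_gib = usage.free // 2**30
    return f"{status}: {path} is {fraction:.0%} full ({free_gib} GiB free)"


if __name__ == "__main__":
    # Check the root filesystem and the usual log location.
    for mount in ("/", "/var/log"):
        print(disk_report(mount))
```

How the warning gets delivered (mail, push notification, etc.) is left out here, since that depends on the setup.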
[–] [email protected] 4 points 5 months ago (1 children)

running out of disk space

This would be my first guess. Nothing shuts down arbitrary services quite like a full /var/log.

[–] HumanPerson 1 points 5 months ago

I've got a 1 TB boot drive and it isn't used for much, but stuff happens, so... idk.

[–] HumanPerson 1 points 5 months ago* (last edited 5 months ago) (2 children)

It says connection closed. There is no message beyond that. I think it is likely that it is failing to boot. I might video call my sister and have her try to boot it so I can see any errors.

Edit: Also, thanks very much for your response. It was very detailed and informative.

[–] [email protected] 3 points 5 months ago

Connection closed means somebody is listening to the port and failing/not willing to reply.

Unless some network middleman is closing your connection (ssh should be on port > 1024 to be safe from ISP throttling), your ssh server is severely strained (oom, disk full...) or your F2B is kicking in.

[–] [email protected] 1 points 5 months ago
[–] [email protected] 3 points 5 months ago

If it's working again all of a sudden I would lean towards f2b. I don't know what your "timeout" is, but if f2b got tripped it would explain why you couldn't get in yesterday but today it works (assuming your ban expires in 24 hours or so).

[–] [email protected] 3 points 5 months ago (1 children)

That sucks dude. Not much you can do about it remotely.

[–] HumanPerson 1 points 5 months ago

My sister is there, but I can't do much diagnosis. It is weird that SSH would go down with it though, so I thought someone might have an idea.

[–] [email protected] 2 points 5 months ago (1 children)
[–] HumanPerson 1 points 5 months ago (1 children)

I know it's bad gateway. I just don't know what caused it, or why it happened when SSH went down. Thanks, though.

[–] [email protected] -3 points 5 months ago (1 children)

If ssh is down, and your proxy can't talk to that same machine, then................

[–] HumanPerson 1 points 5 months ago (1 children)

Proxy is on the same machine though. I just use it for subdomains and rate limiting.

[–] [email protected] -3 points 5 months ago (1 children)

So then....................

[–] HumanPerson 1 points 5 months ago (1 children)

So then.................... maybe try being direct with your answer.

[–] [email protected] -2 points 5 months ago* (last edited 5 months ago) (1 children)

I'm being direct. If your host isn't answering it is.......down

[–] HumanPerson 0 points 5 months ago* (last edited 5 months ago) (1 children)

But it isn't. It sends me an nginx error. The nginx is on that server, so that server isn't completely down.

[–] [email protected] -5 points 5 months ago

So.........................?

[–] [email protected] 2 points 5 months ago (1 children)

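502 means nginx couldn't get a valid response from the app behind it, so the app is broken or not running. For example, if it were a Flask (Python) app, the process may have crashed on startup or stopped listening. (An exception raised while handling a single request, e.g. divide by zero, would normally show up as a 500 instead.) If this is happening to many services or apps simultaneously, it is concerning. Turning it off sounds wise at this point.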
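To put the relevant 5xx codes side by side, here is a minimal sketch using Python's standard `http.HTTPStatus`. The hint texts and the names `PROXY_HINTS` and `explain` are my own summaries for illustration, not wording from any spec or from nginx.

```python
from http import HTTPStatus

# Hypothetical one-line hints for a setup with a reverse proxy in front of apps.
PROXY_HINTS = {
    HTTPStatus.INTERNAL_SERVER_ERROR:
        "the app answered, but failed while handling the request",
    HTTPStatus.BAD_GATEWAY:
        "the proxy got no valid response from the app behind it",
    HTTPStatus.SERVICE_UNAVAILABLE:
        "the server is temporarily unable to handle the request",
    HTTPStatus.GATEWAY_TIMEOUT:
        "the proxy gave up waiting for the app behind it",
}


def explain(code: int) -> str:
    """Return the standard reason phrase plus a short troubleshooting hint."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    hint = PROXY_HINTS.get(status, "no hint for this status")
    return f"{status.value} {status.phrase}: {hint}"


if __name__ == "__main__":
    for code in (500, 502, 503, 504):
        print(explain(code))
```

Seen this way, a 502 on every subdomain at once points at the backends (or whatever runs them) rather than at any single app.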

[–] HumanPerson 1 points 5 months ago

Yeah, I would think docker is broken, but that wouldn't explain the SSH, which is bare metal and doesn't go through nginx.

[–] [email protected] 2 points 5 months ago* (last edited 5 months ago) (1 children)

Does your router have an app or way of letting you remotely see if the server is even showing on your home network?? It could be a physical disconnect or Ethernet port failure, or NIC failure maybe? A reboot wouldn't help if the issue was related to something like that.

Edit: Actually, after re-reading your post and thinking about this again, what I said wouldn't make sense...

You could have some sort of corruption causing an error in the appdata, preventing it from running. Might be a RAM issue.

[–] HumanPerson 2 points 5 months ago (1 children)

It has a network connection: I am able to get to the nginx error, but the services themselves are down. What's really weird is everything is down, even SSH.

[–] [email protected] 1 points 5 months ago* (last edited 5 months ago) (1 children)

I edited my original post right when you replied, my bad.

I dunno if you can do that much remotely, honestly. I kinda feel like something might have corrupted? What kinda system are you using? Any more details you can provide?

[–] HumanPerson 1 points 5 months ago

I don't think it is a hardware issue. I have decent hardware that's fairly new. I unfortunately can't say much, though another commenter let me know the SSH failure message is relevant. I see "connection closed", which means that it is probably failing to boot. I think an update or something may have broken it, though it is Debian stable, so idk. I'm going to try to call my sister and see if I can get a picture of an error message or something.

[–] [email protected] 1 points 5 months ago* (last edited 5 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters   More Letters
DNS             Domain Name Service/System
HTTP            Hypertext Transfer Protocol, the Web
IP              Internet Protocol
NAS             Network-Attached Storage
SSH             Secure Shell for remote terminal access
VPN             Virtual Private Network
nginx           Popular HTTP server

6 acronyms in this thread; the most compressed thread commented on today has 11 acronyms.

[Thread #866 for this sub, first seen 12th Jul 2024, 22:25]

[–] [email protected] 1 points 5 months ago* (last edited 5 months ago) (1 children)

Are you by chance using something like Cloudflare? It may be possible that during the reboot the static IP changed, so your “gateway” cannot reach your router on your old IP any more?

In other words: it’s always the DNS?
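A quick way to rule this out is to compare what the hostname resolves to with the address the router currently has. A minimal sketch, assuming IPv4 only; the function name `dns_points_at` is made up, and the hostname and address in the usage line are placeholders, not real values from this thread.

```python
import socket
from typing import Callable


def dns_points_at(
    hostname: str,
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if `hostname` currently resolves to `expected_ip` (IPv4)."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # Resolution failed entirely (socket.gaierror is an OSError subclass).
        return False


if __name__ == "__main__":
    # Placeholder values: substitute your own domain and current public IP.
    print(dns_points_at("home.example.org", "203.0.113.7"))
```

The `resolve` parameter exists only so the check can be tested without real DNS; in normal use the default resolver is fine. If this returns False after a router reboot, the DNS record is stale.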

[–] HumanPerson 1 points 5 months ago

It's not the DNS. That was the first thing I checked. Also, I don't use cloudflare.

[–] [email protected] 1 points 5 months ago* (last edited 5 months ago)

If your sister's by your server in person, maybe you could guide her to graphically install something like Rustdesk (edit: graphical remote access, Wayland isn't well supported so make sure it's running over Xorg), give you the access code and have her manually accept the connection so you can get back in.

You'll be stuck streaming your terminal window and sending laggy keystrokes through whatever connection you have now (until you can get ssh running), but it's better than nothing.