this post was submitted on 15 Jun 2023

Selfhosted


I've been having this issue very sporadically (sometimes a couple times a week, sometimes once a month). I'm curious as to how the more veteran folk here would try and narrow down the cause of this issue.

I can provide more info if needed!

Edit: More Info:

  • Using a static IP (no DHCP) through Netplan.
top 31 comments
[–] [email protected] 3 points 1 year ago (1 children)

Check whether there's some weirdness in IP allocation. A reboot can cause DHCP to hand out a new address that works for a while, then fails on some weird collision.

[–] [email protected] 1 points 1 year ago (2 children)

I forgot to mention that I'm not using DHCP; I'm statically assigning a local IP.

[–] [email protected] 4 points 1 year ago (1 children)

There might still be another device on the network that is using DHCP and is getting an IP that conflicts. Do you have any visibility on the rest of the network, IP addressing, DHCP leases etc? I would check there first for a potential easy fix.

[–] [email protected] 1 points 1 year ago

Hey, thanks for the suggestion. It's the only client on this modem/router. I have a dedicated internet connection for it, so there shouldn't be any DHCP conflicts. I will double-check whether anything else is on that network and what the range of assignable IPs is.

[–] [email protected] 3 points 1 year ago (1 children)

Is there a DHCP server at play? Is the static IP outside of the DHCP range? This does sound like a typical IP collision.

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago) (1 children)

DHCP is enabled on the router, but I believe the IP address is outside the designated DHCP range.

I'll double check when I'm home!

Edit: I will also say that this modem/router is dedicated only to the server, so there shouldn't be any other clients on it at all.
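
For anyone wanting to sanity-check the "static IP outside the DHCP range" question, here is a minimal sketch using Python's standard `ipaddress` module. The addresses are hypothetical placeholders, not values from this thread; substitute the static IP from your Netplan config and the pool shown on the router's DHCP page.

```python
import ipaddress


def in_dhcp_pool(static_ip: str, pool_start: str, pool_end: str) -> bool:
    """Return True if static_ip lies inside the inclusive DHCP pool."""
    ip = ipaddress.ip_address(static_ip)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    if start > end:
        raise ValueError("pool_start must not be greater than pool_end")
    return start <= ip <= end


# Hypothetical values; replace with your own static IP and router pool.
print(in_dhcp_pool("192.168.1.10", "192.168.1.100", "192.168.1.200"))   # False
print(in_dhcp_pool("192.168.1.150", "192.168.1.100", "192.168.1.200"))  # True
```

If this returns True, the router could lease the same address to another client, which is the collision scenario described above.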

[–] [email protected] 3 points 1 year ago (1 children)

This might not be applicable to your use case, but maybe it helps.

A couple of years ago I had a problem where ONE Windows laptop was unable to access the internet. Sometimes it would work right away, sometimes it took 1 or 2 reboots, sometimes the damn thing wouldn't budge.

Lo and behold, it turned out the Windows laptop had been assigned a DHCP address that one Linksys router had as a static IP. Why that resulted in a sporadic error and not a constant one I'll never know.

So next time you have this issue, pull the network cable out of the server and try to ping the IP the server is supposed to have. If something still answers, another device is using that address.

Other than that, check the journal for anything that starts to pop up around the time you experience the problem.

[–] [email protected] 2 points 1 year ago (1 children)

Thanks for the suggestion. I have the static IP assigned and DHCP disabled, both through Netplan rather than through the router.

I'll remember to check the Netplan (?) journal/logs around that time, or are you referring to dmesg?

[–] [email protected] 2 points 1 year ago (1 children)

Since you're not really sure what the issue is, check all the log files around the time the problem starts. Maybe you'll see a service stopping or starting.

[–] [email protected] 2 points 1 year ago (3 children)

Thank you I'll do that! It's hard to catch exactly when it happens. I think I need to get some monitoring and alert services up and running

[–] [email protected] 1 points 1 year ago

The easiest route is to set up a systemd timer that runs every 5 minutes, pings an IP, and writes the result to a log file. That way you have a timestamp for when the problem starts without going all out with monitoring.

Good luck!
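
A rough sketch of the script such a timer (or a cron job) could run, assuming a Linux `ping` that accepts `-c` and `-W`. The function names and the command-line arguments are my own; the default target 8.8.8.8 is just the Google DNS address mentioned elsewhere in this thread.

```python
import datetime
import subprocess
import sys


def ping_once(target: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system ping; True if a reply came back."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 3,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing, not executable, or hung: count it as a failure.
        return False
    return result.returncode == 0


def format_line(when: datetime.datetime, target: str, ok: bool) -> str:
    """Build one log line: ISO timestamp, target, and ok/FAIL."""
    status = "ok" if ok else "FAIL"
    return f"{when.isoformat(timespec='seconds')} {target} {status}"


def append_line(path: str, line: str) -> None:
    """Append a single line to the log file at path."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")


if __name__ == "__main__":
    # Usage: script.py [target] [logfile]; both arguments are optional.
    target = sys.argv[1] if len(sys.argv) > 1 else "8.8.8.8"
    line = format_line(datetime.datetime.now(), target, ping_once(target))
    if len(sys.argv) > 2:
        append_line(sys.argv[2], line)
    else:
        print(line)
```

The first FAIL line in the log then gives a timestamp to compare against the system logs.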

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

Changedetection.io can send you an email or message when your server fails to ping it, so you will then have the times. It's a 5-minute job to set up: make an account, add your email or number or whatever, and make a curl request to the specific endpoint in a cron job.

[–] [email protected] 1 points 1 year ago

Thanks I'll give that a shot! I was thinking about using a solution with my VPS, but I may go this route.

[–] [email protected] 1 points 1 year ago (1 children)

You don't need to catch that moment live, it was already recorded.

Take a look at journalctl -b -1 (previous boot).
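
If you know roughly when the connection dropped, the previous boot's journal can be narrowed to a window around that time with journalctl's `--since` and `--until` options. Below is a small sketch that only builds and prints the command; the helper name and the example timestamp are hypothetical.

```python
import datetime
import shlex


def journal_window_cmd(around, minutes=10, boot=None):
    """Build a journalctl command covering +/- minutes around a timestamp."""
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.timedelta(minutes=minutes)
    cmd = [
        "journalctl", "--no-pager",
        "--since", (around - delta).strftime(fmt),
        "--until", (around + delta).strftime(fmt),
    ]
    if boot is not None:
        # -b -1 restricts the output to the previous boot.
        cmd += ["-b", str(boot)]
    return cmd


if __name__ == "__main__":
    # Hypothetical outage time; replace with the moment the connection dropped.
    outage = datetime.datetime(2023, 6, 15, 3, 20, 0)
    print(shlex.join(journal_window_cmd(outage, minutes=10, boot=-1)))
```

Copy the printed command into a shell on the server to see what was logged around the outage.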

[–] [email protected] 2 points 1 year ago

Thank you for this sweet tip! I'll definitely be using this.

[–] [email protected] 2 points 1 year ago (1 children)

Are you able to access the server's Linux shell via KVM or out-of-band management during the Internet outage? If so, I would first check the kernel log for any errors. I'd then follow up with a ping to the local gateway within the same IP subnet to validate Ethernet connectivity. After that I'd start running pings and traceroutes to subnets beyond the local gateway to see what's going on, with tcpdump capturing all interface traffic in the background.

[–] [email protected] 2 points 1 year ago (1 children)

I am able to access the Linux shell locally or on the local area network. From within I can ping the gateway, but nothing beyond that.

Is there a tracert equivalent for Ubuntu?

I will say I get this error when trying to ping www.google.com:

ping: unknown host

[–] [email protected] 2 points 1 year ago (1 children)

The tracert equivalent that I use would be traceroute or mtr. If you can get a response from the gateway and access the server locally, it would indicate an IP routing issue as local subnet traffic is working.

Instead of pinging a domain name, what happens if you ping a public IP like Google DNS (8.8.8.8)? The error you're seeing could be related to a DNS resolution issue although that should not affect access to your server.
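
The order of checks described here (gateway, then a public IP, then a name) can be sketched as a small decision helper. This is only an illustration: the function names and the wording of the results are mine, and it assumes a Linux `ping` that accepts `-c` and `-W`.

```python
import socket
import subprocess


def can_ping(ip: str, timeout_s: int = 2) -> bool:
    """True if one ICMP echo to ip gets a reply via the system ping."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 3,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def can_resolve(name: str) -> bool:
    """True if the system resolver can turn name into an address."""
    try:
        socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def diagnose(gateway_ok: bool, public_ip_ok: bool, dns_ok: bool) -> str:
    """Map the three check results to the most likely problem area."""
    if not gateway_ok:
        return "local link or gateway problem"
    if not public_ip_ok:
        return "routing or upstream problem beyond the gateway"
    if not dns_ok:
        return "DNS resolution problem"
    return "no fault detected"


# Example call during an outage (192.168.1.1 is a hypothetical gateway):
# print(diagnose(can_ping("192.168.1.1"), can_ping("8.8.8.8"),
#                can_resolve("www.google.com")))
```

A reachable gateway plus a failing 8.8.8.8 points at routing or the ISP side; a reachable 8.8.8.8 with failing name lookups points at DNS.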

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

I'll try that next time it hiccups. Right after pasting the unknown host error, I realized I should have pinged an IP address 🙃.

In further hindsight, I shouldn't have restarted the server; then I could have tried these things during our conversation. 🙁

[–] [email protected] 1 points 1 year ago (1 children)

Can you pinpoint the exact moment when the internet cuts out? Have you checked your logs (dmesg, /var/log/syslog, journalctl, etc) around that time for anything weird?

[–] [email protected] 2 points 1 year ago

Unfortunately not. It's really exposing my need for a health monitoring service like TIG stack or something equivalent.

[–] [email protected] 1 points 1 year ago (1 children)

check that nameservers are specified in /etc/resolv.conf
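
A quick way to list what is actually configured, as a sketch: the parser below only looks at `nameserver` lines and skips comment lines. On Ubuntu releases that use systemd-resolved, /etc/resolv.conf often shows only the local stub 127.0.0.53, in which case `resolvectl status` is the place to see the upstream servers.

```python
from pathlib import Path


def parse_nameservers(text: str) -> list[str]:
    """Return the addresses from all 'nameserver' lines in resolv.conf text."""
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    path = Path("/etc/resolv.conf")
    try:
        found = parse_nameservers(path.read_text(encoding="utf-8"))
    except OSError:
        found = []
    print(found if found else "no nameserver entries found")
```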

[–] [email protected] 1 points 1 year ago (1 children)

Good idea. I'll check that

[–] [email protected] 1 points 1 year ago (1 children)

If ping fails with the name but works with 8.8.8.8, then you're losing DNS. Set the nameservers locally on the server if you're using a static IP.

[–] [email protected] 1 points 1 year ago (1 children)

I do have nameservers (8.8.8.8 being one of them) set in my Netplan config, if that counts.

[–] [email protected] 2 points 1 year ago (1 children)

It should, I think (I'm not familiar with Netplan). From what I can tell, it's a utility that simplifies the local config tasks, and when you apply it, I think it should put the nameservers in /etc/resolv.conf. But before you go down this rabbit hole, check whether 8.8.8.8 or 1.1.1.1 respond to a ping when pings to named servers don't. If so, it's definitely name resolution (which would be my first guess).

[–] [email protected] 1 points 1 year ago

Thank you, I'll follow this advice next time it decides to cut the internet off.

[–] [email protected] 1 points 1 year ago (1 children)

Honestly I would replace Ubuntu with an actual operating system designed for servers.

[–] [email protected] 1 points 1 year ago

tbf I'm using Ubuntu Server because I've been using it on and off for a decade. Overall I've had no major issues, and its long-term support is one of the longest.

There's the thought of going RHEL-based, which I have experience with from work too, so that is an option in the long run.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

I have a TP-Link HS300 that connects to an AP running OpenWrt. At any moment I can WireGuard into the AP and use the Kasa app to turn any machine on or off (not the AP though).

There is a Python package for Kasa, so one can use the OpenWrt machine to detect whether a machine is down and automatically switch it off and on if needed.

[–] [email protected] 1 points 1 year ago

I was thinking of something like a local script to reboot the PC once internet isn't reachable after 5 min or so, but only as a stopgap/workaround until I figure out the issue.
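
The timing part of such a stopgap can be sketched as below. The class name is hypothetical, the 300-second default mirrors the "5 min or so" above, and the actual reachability check and reboot call are left as comments so that nothing here restarts a machine by accident.

```python
import time


class OutageTracker:
    """Track how long connectivity has been down across repeated checks."""

    def __init__(self, threshold_s: float = 300.0):
        self.threshold_s = threshold_s
        self.down_since = None  # time of the first failed check, if any

    def update(self, reachable: bool, now: float) -> bool:
        """Record one check; True once the outage has lasted threshold_s."""
        if reachable:
            self.down_since = None
            return False
        if self.down_since is None:
            self.down_since = now
        return now - self.down_since >= self.threshold_s


# Sketch of a loop around it (check_internet is a hypothetical function):
# tracker = OutageTracker(300)
# while True:
#     if tracker.update(check_internet(), time.monotonic()):
#         subprocess.run(["systemctl", "reboot"])  # needs root and import subprocess
#     time.sleep(30)
```

Resetting on the first successful check means short blips never add up to a reboot.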
