this post was submitted on 15 Sep 2023

I assembled my desktop PC about 2 years ago. It's fairly beefy (AMD Ryzen 9 3950X 16-Core Processor, 128 GB RAM, NVIDIA RTX 3080 Ti). It's running Debian stable.

Once in a while (roughly every 2 weeks), seemingly at random and not especially under heavy load, the system crashes and freezes, unresponsive even to the Linux magic SysRq keys. I have never managed to find the cause. One interesting fact is that when it happens, it also seems to "freeze my network", i.e. other (Ethernet) devices on my local network lose connectivity. They're all connected to the same router, but not through this crashing PC. Connectivity comes back as soon as I force a shutdown of the crashing PC.

What can cause this and how could I fix these freezes?

[–] [email protected] 3 points 1 year ago

I can only offer some additional troubleshooting steps.

  1. Your network connection is fairly simple, so I would suggest taking NM (NetworkManager) out of the equation and setting up your network device manually to see if that eliminates the issue. This goes back to the comment from @despotic_machine and the log listing the p2p and wireless interfaces: it looks like NM may be trying to set up your Wi-Fi interfaces. Then again, in the log you provided, NM sees the wireless interface, identifies that it is not connected, and sets it to inactive, so there may not be an issue. I had issues with NM many years ago on a laptop and preferred wicd; however, development on wicd seems to have stalled. Regardless, I do not run NetworkManager at all on my desktop (just isc-dhcp-client and an entry in /etc/network/interfaces) since it is not roaming (it is plugged into a switch). It seems you don't even need to uninstall anything: just set up the network manually and NM should leave the interface alone. If you want it to be clean, make sure NM is not running, or purge it from the system, and set up your networking manually. The assumption about manual setup is based on the Debian wiki:

https://wiki.debian.org/NetworkManager#Wired_Networks_are_Unmanaged

NOTE: Unless you know networking, this is probably going to take you down a networking rabbit hole, so glhf.

Some Debian references regarding networking and different configurations:
https://www.debian.org/doc/manuals/debian-reference/ch05.en.html
https://www.debian.org/doc/manuals/debian-handbook/sect.network-config

  2. If you want to stick with NM, it seems you can raise the logging level to get more details. I would check the NM man page or documentation for debugging instructions. I would also expect that you can disable interfaces in NM to reduce the likelihood of some fringe case plaguing your setup. Since I don't run NM, I can't provide any detailed suggestions.

  3. More of a question: have the switch and router also been the same devices for the last 2 years? Is it possible that the network device is misbehaving and causing the desktop to lock up? This ties into @0v0's request to run wireshark/tcpdump from a laptop or other device connected to the router/switch to see what's going on traffic-wise.
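
For the manual setup in the first item, a minimal sketch of what the /etc/network/interfaces entry could look like, assuming plain DHCP on the wired port. The interface name `enp5s0` is a placeholder (check `ip -br link` for the real one), and `print_stanza` is just a helper name made up for this example. It only prints the stanza so it can be reviewed before being added by hand:

```shell
# Hypothetical helper: print an ifupdown stanza for a DHCP-configured interface.
print_stanza() {
    iface="$1"
    # "auto" brings the interface up at boot; "inet dhcp" requests a lease via DHCP.
    printf 'auto %s\niface %s inet dhcp\n' "$iface" "$iface"
}

# enp5s0 is an assumed name, not taken from the original post.
print_stanza enp5s0
```

Per the Debian wiki page linked above, NetworkManager is expected to leave interfaces listed in that file unmanaged by default.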

[–] [email protected] 3 points 1 year ago (1 children)

Have you tried running tcpdump / wireshark on another device in the network when this happened?
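
In case it helps, a rough sketch of a capture that could be started on another wired machine while the freeze is ongoing. The interface name is a placeholder, the function names are made up for this example, and tcpdump needs to be installed and run with root privileges:

```shell
# Hypothetical helper: build a timestamped capture file name.
pcap_name() {
    printf 'freeze-%s.pcap' "$(date +%Y%m%d-%H%M%S)"
}

# Hypothetical helper: capture on the given interface until interrupted with Ctrl-C.
capture_lan() {
    iface="$1"                 # e.g. a name listed by `ip -br link`
    # -n: skip name resolution; -w: write raw packets to a file
    sudo tcpdump -i "$iface" -n -w "$(pcap_name)"
}

# Example (not run here): capture_lan eth0
```

The resulting .pcap file can be opened in Wireshark afterwards, or skimmed with `tcpdump -n -r <file>`.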

[–] nicocool84 2 points 1 year ago

Nope, I don't know the first thing about these tools, but now I'm kind of impatient and hope that the next freeze happens soon so I can try. :-)

[–] [email protected] 1 points 1 year ago (1 children)

Do you by any chance use flameshot?

[–] nicocool84 2 points 1 year ago (1 children)

Nope. May I ask what would be the connection with my issues?

[–] [email protected] 2 points 1 year ago

I had a similar issue which seemed to pop up anytime from an hour to a couple days after I used flameshot. It took me a long time to figure out what was triggering it. I stopped using flameshot and the freezes stopped. I've mentioned this a couple other times to people who ended up having the same problems and fix. But if you aren't using it, I don't have anything else to suggest.

[–] [email protected] 1 points 1 year ago (1 children)

Uninstall NetworkManager (I don't know how on Debian) and reinstall it (better to have a .deb at hand).

Then sudo systemctl enable NetworkManager.service

Reboot and hope for the best.

[–] nicocool84 1 points 1 year ago (2 children)

This has been happening for 2 years, with the previous debian version too, so I doubt this would do anything?

[–] [email protected] 1 points 1 year ago (1 children)

If it's been happening across multiple years and OS versions, maybe your network card is dead or dying? Buy a new network card and see if that helps?

[–] nicocool84 2 points 1 year ago

Everything is 2 yo, so this would mean the mobo (well, the onboard ethernet thing) was malfunctioning from the start. Maybe!

I might try disabling and using the onboard wifi chip temporarily instead, just to see if I notice a new freeze. The issue is, I've never understood what triggers it, and it's quite rare (less than once a week), so it's really annoying to debug…

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

Have you been updating or reinstalling?

Because if it's been update upon update, that could be where it comes from. In that case, maybe reinstall?

[–] nicocool84 1 points 1 year ago* (last edited 1 year ago) (1 children)

Updating. I'm willing to try your solution but I am a little bit worried about not being able to reinstall anything after I sudo apt remove network-manager. Why would a package reinstallation help? Wouldn't resetting the config files be more efficient btw?

EDIT: It's not update upon update: there was just bullseye (first testing, then stable), and then I recently moved to bookworm. But the problem has been there from the start. It's not too much of a pain because it's rare, but it still bugs me.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

Thing is, I really haven't used Debian-based distros for the better part of the last two years, so I'm not sure how to reinstall it if something goes south. With Arch you just have to do a pacstrap from a live USB.

So... it seems kinda dangerous if you don't have a backup .deb. I'm not sure I would advise you to go this way.

I looked at your journalctl output. The error might come from your wireless card. If that is the case, and since you don't use it at all, there is a simple trick: sudo systemctl disable wpa_supplicant, then reboot.

It won't have any effect on the Ethernet connection but will more or less disable your Wi-Fi card. (Not exactly, but you get the gist of it.)

If I'm right it should make all of your problems go away. It might be worth a try. And if it doesn't work a simple sudo systemctl enable wpa_supplicant will reverse it back to the way it was.

It's still a pain, even if it doesn't happen daily.

[–] [email protected] 1 points 1 year ago

I have no idea, but it seems like an interesting problem. Good luck finding a solution. (Just commenting to get notified if someone has a solution.)

[–] [email protected] 1 points 1 year ago (1 children)

Is it possible that the freeze you're seeing on that machine is actually caused by a network failure, rather than the other way around?

I have many times encountered what appears to be a system freeze but is actually the result of background processes trying to access a network resource that no longer exists (e.g. a disk mounted over a VPN connection after the VPN has dropped out).

[–] nicocool84 1 points 1 year ago

I think this is unlikely because it's only this specific device that crashes, and the others are fine?

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

Check your system logs, such as dmesg and journalctl, right after the freeze (if it's still occurring). You could filter the journal to show, say, messages of priority error or worse from the previous boot that were logged within the last 5 minutes, like this:

journalctl --boot=-1 --since="5 min ago" --priority=0..3
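
A variation, as a sketch: because `--since` is relative to the current time, the filter above only matches if you reboot and look within a few minutes of the freeze. Asking for the tail of the previous boot avoids that. This assumes the journal is stored persistently (otherwise there is no previous boot to read), and the function names are made up for this example:

```shell
# Hypothetical helper: last N messages of priority "err" or worse from the previous boot.
prev_boot_errors() {
    journalctl --boot=-1 --priority=0..3 --no-pager --lines="${1:-50}"
}

# Hypothetical helper: last N kernel messages (the dmesg equivalent) from the previous boot.
prev_boot_kernel() {
    journalctl --boot=-1 --dmesg --no-pager --lines="${1:-50}"
}

# Example (not run here): prev_boot_errors 100
```

`journalctl --list-boots` shows which boots the journal still holds.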

[–] nicocool84 3 points 1 year ago* (last edited 1 year ago)

It happened yesterday, and here are the latest log lines before the freeze:

Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1207] device (wlp4s0): set-hw-addr: set MAC address to CA:D0:86:5F:F9:85 (scanning)
Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1478] device (wlp4s0): supplicant interface state: inactive -> disconnected
Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1478] device (p2p-dev-wlp4s0): supplicant management interface state: inactive -> disconnected
Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1530] device (wlp4s0): supplicant interface state: disconnected -> inactive
Sep 14 23:30:30 licorne NetworkManager[1291]:   [1694727030.1530] device (p2p-dev-wlp4s0): supplicant management interface state: disconnected -> inactive
Sep 14 23:30:58 licorne syncthing[3169286]: [VY2L4] INFO: Established secure connection to REDACTED1 at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20
Sep 14 23:30:58 licorne syncthing[3169286]: [VY2L4] INFO: Device REDACTED1 client is "syncthing v1.23.4" named "REDACTED2.lan" at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20
Sep 14 23:31:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:31:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Established secure connection to REDACTED1 at 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10
Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Replacing old connection [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20 with 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10 for REDACTED1
Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Connection to REDACTED1 at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20 closed: replacing connection
Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Device REDACTED1 client is "syncthing v1.23.4" named "REDACTED2.lan" at 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10
Sep 14 23:32:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:32:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:33:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:33:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:33:28 licorne systemd[1]: Started anacron.service - Run anacron jobs.
Sep 14 23:33:28 licorne anacron[4171587]: Anacron 2.3 started on 2023-09-14
Sep 14 23:33:28 licorne anacron[4171587]: Normal exit (0 jobs run)
Sep 14 23:33:28 licorne systemd[1]: anacron.service: Deactivated successfully.
Sep 14 23:34:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:34:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:35:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:35:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:36:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:36:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:37:04 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:37:04 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:37:25 licorne NetworkManager[1291]:   [1694727445.1045] device (wlp4s0): set-hw-addr: set MAC address to EE:65:E2:6E:73:D1 (scanning)
Sep 14 23:38:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:38:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
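
A small sketch for getting an overview of an excerpt like this one: it counts lines per logging unit, so anything unusual in the last minutes stands out. It assumes the default short journalctl format shown above (month, day, time, host, then `unit[pid]:`), and `count_by_unit` is a name made up for this example:

```shell
# Hypothetical helper: read journal lines on stdin, print "count unit" sorted by count.
count_by_unit() {
    awk '{
        unit = $5                       # fifth field looks like "unit[pid]:"
        sub(/\[[0-9]+\]:$/, "", unit)   # strip the "[pid]:" suffix
        n[unit]++
    }
    END { for (u in n) print n[u], u }' | sort -rn
}

# Example (not run here): journalctl --boot=-1 --no-pager --lines=200 | count_by_unit
```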

[–] [email protected] 1 points 1 year ago (1 children)

Anything interesting in your logs?