this post was submitted on 06 Nov 2023
29 points (93.9% liked)

Selfhosted


I've been running my most recent server build for quite some time now. I think uptime was somewhere around 5 months. Absolutely flawless. A few days ago I started to have issues. Hard locks, freezing... but absolutely zero log entries. Nothing. The server was built with "off the shelf" hardware and no ECC (even though the Ryzen CPU technically supports it, at the time ECC 3200 MHz memory was still a lot more expensive than it is now) and is running ZFS. Risky business, but it's "just" a home server. I would never build a server running mission-critical stuff like that (and I've been doing that for over 10 years now as my main job). Over the last few weeks, I've been trying some stuff and had a pretty high memory load.

In any case, I also like astrophysics and subscribe to some newsletters about auroras and so on. Auroras are extremely rare here in southern Germany. Yesterday we had one of the biggest and brightest I've ever seen.

But it got me thinking about my hard locks and crashes, and I remembered I had an account for ESA's SSCC (SSA Space Weather Coordination Centre). They have something called "Post-Event Analysis", where you can correlate certain timestamps with real-time data, for example from DSCOVR ("THE" space weather satellite).

For auroras to occur, the so-called "Bz value" is important. Basically, it is the north-south component of the interplanetary magnetic field. If it points north (positive), most of the charged particles the sun throws at us get deflected. If it points south (negative), it couples with Earth's magnetic field, and the particles "come in" and produce auroras, because the charged particles excite other particles. They generally excite oxygen, which results in green auroras. They can also do all sorts of other stuff (and that's why spaceships, sats and other stuff floating around in space need shielding). The value is measured in nanotesla (nT).

There's also the Kp-Index...which was 7-8, out of 9.
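To make the "correlate timestamps with Bz" idea concrete, here is a minimal sketch in Python. It is not the SSCC tool or its API: the function names, the 30-minute window, and all sample values are assumptions made up for illustration, and the readings are placeholders, not real DSCOVR measurements.

```python
from datetime import datetime, timedelta


def readings_near(samples, event_time, window=timedelta(minutes=30)):
    """Return the (timestamp, bz_nt) samples within +/- window of event_time."""
    return [(t, bz) for t, bz in samples if abs(t - event_time) <= window]


def most_southward(samples):
    """Return the sample with the lowest (most negative) Bz, or None if empty."""
    return min(samples, key=lambda s: s[1], default=None)


# Made-up placeholder values, NOT real DSCOVR measurements.
samples = [
    (datetime(2023, 1, 1, 12, 0), 2.0),
    (datetime(2023, 1, 1, 12, 20), -5.0),
    (datetime(2023, 1, 1, 12, 40), -15.0),
    (datetime(2023, 1, 1, 14, 0), 1.0),
]
freeze_time = datetime(2023, 1, 1, 12, 30)  # hypothetical freeze timestamp

nearby = readings_near(samples, freeze_time)
worst = most_southward(nearby)
```

A strongly negative Bz close to a freeze would only show that the two coincided in time. It would not prove that the storm caused the freeze.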

So yeah - I'm pretty sure I experienced a Single-Event Upset/bit flip. Amazing stuff!

Edit: Picture of the Aurora https://i.imgur.com/TIxketJ.jpg

top 10 comments
[–] [email protected] 11 points 9 months ago

That is amazing! Now, I need to see about using weather satellites to explain the bugs in my code at work...

[–] [email protected] 4 points 9 months ago (2 children)

Shouldn't ZFS have detected the bad data and repaired itself from redundancy though?
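For anyone wondering what "detect and repair from redundancy" means in general, here is a toy Python sketch of a checksum-verified read over two mirrored copies. It is not ZFS code: SHA-256 from `hashlib` is a stand-in checksum, and `checksum` and `read_with_repair` are hypothetical names used only for illustration.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Stand-in checksum for a data block (assumption: SHA-256)."""
    return hashlib.sha256(block).hexdigest()


def read_with_repair(copies: list, expected: str) -> bytes:
    """Return a copy matching the stored checksum and overwrite bad copies."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise IOError("no copy matches the stored checksum")
    for i, c in enumerate(copies):
        if checksum(bytes(c)) != expected:
            copies[i] = bytearray(good)  # heal the damaged copy from the good one
    return good


original = b"home server data"
stored = checksum(original)          # checksum recorded when the data was written
mirror = [bytearray(original), bytearray(original)]
mirror[0][3] ^= 0b00000100           # simulate a single bit flip in the first copy

data = read_with_repair(mirror, stored)
```

This only protects data that was already checksummed when the flip happened. If a bit flips in RAM before the checksum is computed, the damaged data gets a matching checksum and looks valid.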

[–] [email protected] 3 points 9 months ago (2 children)
[–] [email protected] 1 points 9 months ago

Oh! I thought OP was referencing OS files from the drive.

[–] [email protected] 1 points 9 months ago (1 children)

It also wouldn't cause hard locks and freezes without any errors.

[–] [email protected] 5 points 9 months ago (1 children)

It certainly could. A bit-flip in a core part of the kernel could easily cause it to lock up, if an address is corrupted and it starts writing garbage over its code, or execution jumps to somewhere unexpected, or an instruction is changed from something reasonable to a halt.

Yes, most of those should trigger a blue screen or kernel panic, but that's not guaranteed when you're making completely random changes.
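As a rough illustration of how much damage one flipped bit can do to an address, here is a small Python sketch. The address is a made-up value, and `flip_bit` is a hypothetical helper, not anything from a real kernel.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit position inverted."""
    return value ^ (1 << bit)


address = 0x0000_7F00_0000_1000      # hypothetical 64-bit address
corrupted = flip_bit(address, 40)    # a single upset in bit 40
distance = abs(corrupted - address)  # 2**40 bytes, i.e. 1 TiB away

print(hex(address), "->", hex(corrupted), "distance:", distance)
```

A flip in a low bit would move the address by only a few bytes, while a flip in a high bit sends it somewhere completely unrelated. Whether that ends in a clean panic or a silent lock-up depends on what happens to live at the new location.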

[–] [email protected] 1 points 9 months ago (1 children)

Sure - I should have mentioned that the system itself doesn't run on the ZFS pool but from its own SSD. So a "ZFS cache in memory bit flip" should (theoretically...) not cause a hard lock/freeze. It would probably trigger a complete garbage collection though.

And yes - that's what was so confusing to me, no kernel panic, no log entry...nothing, just a sudden, random freeze.

[–] [email protected] 3 points 9 months ago (1 children)

Right, a bit flip in ZFS cache shouldn't cause that. But a bit flip in active memory could.

[–] [email protected] 1 points 9 months ago

Absolutely! And I think that's actually what happened :)

[–] [email protected] 2 points 9 months ago

It probably did - but that's not why the server crashed :)