this post was submitted on 28 Dec 2023
35 points (94.9% liked)

Selfhosted

40394 readers
382 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago
MODERATORS
35
submitted 11 months ago* (last edited 11 months ago) by [email protected] to c/[email protected]
 

Hey fellow Selfhosters! I need some help, I think, and searching isn't yielding what I'm hoping for.

I recently built a new NAS for my network with 4x 18TB drives in a ZFS raidz1 pool. I previously have been using an external USB 12TB harddrive attached to a different machine.

I've been attempting to use rsync to get the 12TB drive copied over to the new pool and things go great for the first 30-45 minutes. At that point, the current copy speed diminishes and 4 current files in progress sit at 100% done. Eventually, I've had to reboot the machine, because the zpool doesn't appear accessible any longer. After reboot, the pool appears fine, no faults, and I can resume rsync for a while.

EDIT: Of note, the rsync process seems to stall and I can't get it to respect SIGINT or Ctrl+C. I can SSH in separately and running zpool status hangs with no output.

While the workaround seems to be partially successful, the point of using rsync is to make it fairly hands-free and it's been a week long process to copy the 3TB that I have now. I don't think my zpool should be disappearing like that! Makes me nervous about the long-term viability. I don't think I'm ready to drop down on Unraid.

rsync is being initiated from the NAS to copy from the old server, am I better off "pushing" than "pulling"? I can't imagine it'd make much difference.

Could my drives be bad? How could I tell? They're attached to a 10 port SATA card, could that be defective? How would I tell?

Thanks for any help! I've dabbled in linux for a long time, but I'm far from proficient, so I don't really know the intricacies of dmesg et al.

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 2 points 11 months ago (1 children)

When things lock up, will a kill -9 kill rsync or not? If it doesn't, and the zpool status lockup is suspicious, it means things are stuck inside a system call. I've seen all sorts of horrible things with usb timeouts. Check your syslog.

[–] [email protected] 1 points 10 months ago

kill -9

Just tested, thanks for the suggestion! It killed a few instances of rsync, but there are two apparently stuck open. I issued reboot and the system seemed to hang while waiting for rsync to be killed and failed to unmount the zpool.

Syslog errors:

Dec 31 16:53:34 halnas kernel: [54537.789982] #PF: error_code(0x0002) - not-present page
Jan  1 12:57:19 halnas systemd[1]: Condition check resulted in Process error reports when automatic reporting is enabled (file watch) being skipped.
Jan  1 12:57:19 halnas systemd[1]: Condition check resulted in Process error reports when automatic reporting is enabled (timer based) being skipped.
Jan  1 12:57:19 halnas kernel: [    1.119609] pcieport 0000:00:1b.0: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Jan  1 12:57:19 halnas kernel: [    1.120020] pcieport 0000:00:1d.2: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Jan  1 12:57:19 halnas kernel: [    1.120315] pcieport 0000:00:1d.3: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Jan  1 22:59:08 halnas kernel: [    1.119415] pcieport 0000:00:1b.0: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Jan  1 22:59:08 halnas kernel: [    1.119814] pcieport 0000:00:1d.2: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Jan  1 22:59:08 halnas kernel: [    1.120112] pcieport 0000:00:1d.3: DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
Jan  1 22:59:08 halnas systemd[1]: Condition check resulted in Process error reports when automatic reporting is enabled (file watch) being skipped.
Jan  1 22:59:08 halnas systemd[1]: Condition check resulted in Process error reports when automatic reporting is enabled (timer based) being skipped.
Jan  2 02:23:18 halnas kernel: [12293.792282] gdbus[2809399]: segfault at 7ff71a8272e8 ip 00007ff7186f8045 sp 00007fffd5088de0 error 4 in libgio-2.0.so.0.7200.4[7ff718688000+111000]
Jan  2 02:23:22 halnas kernel: [12297.315463] unattended-upgr[2810494]: segfault at 7f4c1e8552e8 ip 00007f4c1c726045 sp 00007ffd1b866230 error 4 in libgio-2.0.so.0.7200.4[7f4c1c6b6000+111000]
Jan  2 03:46:29 halnas kernel: [17284.221594] #PF: error_code(0x0002) - not-present page
Jan  2 06:09:50 halnas kernel: [25885.115060] unattended-upgr[4109474]: segfault at 7faa356252e8 ip 00007faa334f6045 sp 00007ffefed011a0 error 4 in libgio-2.0.so.0.7200.4[7faa33486000+111000]
Jan  2 07:07:53 halnas kernel: [29368.241593] unattended-upgr[4109637]: segfault at 7f73f756c2e8 ip 00007f73f543d045 sp 00007ffc61f04ea0 error 4 in libgio-2.0.so.0.7200.4[7f73f53cd000+111000]
Jan  2 09:12:52 halnas kernel: [36867.632220] pool-fwupdmgr[4109819]: segfault at 7fcf244832e8 ip 00007fcf22354045 sp 00007fcf1dc00770 error 4 in libgio-2.0.so.0.7200.4[7fcf222e4000+111000]
Jan  2 12:37:50 halnas kernel: [49165.218100] #PF: error_code(0x0002) - not-present page
Jan  2 19:57:53 halnas kernel: [75568.443218] unattended-upgr[4110958]: segfault at 7fc4cab112e8 ip 00007fc4c89e2045 sp 00007fffb4ae2d90 error 4 in libgio-2.0.so.0.7200.4[7fc4c8972000+111000]
Jan  3 00:54:51 halnas snapd[1367]: stateengine.go:149: state ensure error: Post "https://api.snapcraft.io/v2/snaps/refresh": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)