justpassingby

joined 2 years ago
[–] justpassingby 0 points 6 months ago (1 children)

Hi,

I’ve done some research on this myself and the answer is the USB controller. Specifically the way the USB controller “shares” bandwidth. It is not the way a sata controller or a pci lane deals with this. ZFS expects direct control of the disk to operate correctly and anything that gets in between the file system and the disk is a problem.

Thanks for sharing. I agree with you 100% and I think everybody commenting here does. The whole point of the thread, however, was to understand if/how you can identify the location of the problem without guessing. The reality is that I came to the conclusion that people... don't. Like you said, people know ZFS is fussy about how it talks to the disks, and at the slightest issue it throws a tantrum. So people just switch things until they work (or buy expensive motherboards with many ports). I don't like the idea of not knowing "why", so I will just add to my notes that for my specific use case I cannot trust ZFS + OS (TrueNAS SCALE) to use the USB disk for backups via zfs send/receive.

If you want a stable system give zfs direct access to your disks and accept it will damage zfs operations over time if you do not.

I would like to add that I am not trying to mirror my main disk with a USB one. I just wanted to copy the ZFS snapshots to the USB drive once a day at midnight. ZFS is just (don't throw stones at me for this, it is just my opinion) too brittle to use this way too. I mean, when I am trying to clean/recover the pool it just refuses (and nothing is writing to it).

A better but still bad solution would be something like a USB to SATA enclosure. In this situation if you installed a couple disks in a mirror on the enclosure… They would be using a single USB port and the controller would at least keep the data on one lane instead of constantly switching.

In my case there was no switching, however. It was a single NVMe drive in an enclosure on a single USB line. It was a separate stripe, there just to receive data once a day.

Regardless if you want to dive deeper you will need to do reading on USB controllers and bandwidth sharing.

Not without good logs or debugging tools.

I decided I cannot trust it, so unfortunately I will take the USB enclosure with the NVMe, format it with ext4 and use Kopia to back up the datasets there once a day. It is not what I wanted but it is the best I can get for now.

About better solutions for my play-NAS in general: I am constrained by the ports I have. I (again, personal choice, I understand people disagree with this) don't want to go SATA. Unfortunately, since I could not find any PCIe switch with the ASM2812I (https://www.asmedia.com.tw/product/866yq74SPBqRdtgC/7c5YQ79xz8urEGr1), I am unable to get more out of my M.2 NVMe slot (PCIe 3.0 x4); speed loss is not an issue for me, since my main bottleneck is the network. It is interesting how you can find many more attempts at this in the PI ecosystem than for mini PCs.

[–] justpassingby 1 points 6 months ago (1 children)

Hi.

There is one USB drive, in an NVMe enclosure without its own power supply. I know the brand and I can find the chipset; however, what I need is to understand the issue from the logs.

The error usb 2-4: Enable of device-initiated U1 failed. seems common for USB devices not working.

What does it point to and what to look for to understand it?

Thanks.

PS: Just out of curiosity I swapped the enclosure and the cable a few days ago but had the same issue, so the error message is not specific to them. Also, I was using this enclosure as the main disk for one of my PIs with no issue, so power via USB or the cable should not be the problem. Not that I want to use that as a metric; I need data/logs from the OS.

[–] justpassingby 1 points 6 months ago (3 children)

Hi! Thanks for the pointers. Unfortunately dmesg and the system logs were the first places I looked, but I found nothing at the time. I tried again now to give you the output after a zpool clear; you can obviously ignore the failed email attempt. journalctl:

Jun 07 08:06:24 truenas kernel: WARNING: Pool 'tank-02' has encountered an uncorrectable I/O failure and has been suspended.
Jun 07 08:06:24 truenas zed[799040]: eid=309 class=statechange pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=ONLINE
Jun 07 08:06:24 truenas zed[799049]: eid=310 class=statechange pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=FAULTED
Jun 07 08:06:24 truenas zed[799057]: eid=313 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:0:1
Jun 07 08:06:24 truenas zed[799058]: eid=311 class=vdev_clear pool='tank-02' vdev=xxx-xxx-xxx-xxx-xxx vdev_state=FAULTED
Jun 07 08:06:24 truenas zed[799067]: eid=312 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:62:0:0
Jun 07 08:06:24 truenas zed[799081]: eid=316 class=io_failure pool='tank-02'
Jun 07 08:06:24 truenas zed[799082]: eid=315 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:-1:0
Jun 07 08:06:24 truenas zed[799090]: eid=314 class=data pool='tank-02' priority=3 err=28 flags=0x20004000 bookmark=0:0:1:0
Jun 07 08:06:24 truenas find_alias_for_smtplib.py[799114]: sending mail to 
                                                           To: root
                                                           Subject: ZFS device fault for pool tank-02 on truenas
                                                           MIME-Version: 1.0
                                                           Content-Type: text/plain; charset="ANSI_X3.4-1968"
                                                           Content-
Jun 07 08:06:24 truenas find_alias_for_smtplib.py[799114]: No aliases found to send email to root
Jun 07 08:06:24 truenas zed[799144]: error: statechange-notify.sh: eid=310: mail exit=1
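
Since the zed lines above arrive out of order (eid=313 is logged before eid=311), a small parser makes the sequence easier to read. This is only a rough sketch, assuming the journal format shown above (key=value pairs after the `zed[pid]:` prefix); the function names are made up for illustration and are not part of ZFS or zed.

```python
import re
from collections import Counter

# Matches key=value pairs as zed prints them, with or without single quotes.
PAIR = re.compile(r"(\w+)=('[^']*'|\S+)")


def parse_zed_line(line):
    """Return the key=value fields of one zed event line, or None.

    Lines that are not zed events (kernel messages, the notify script
    error, mail output) have no class= field and are skipped.
    """
    if " zed[" not in line:
        return None
    _, _, payload = line.partition("]: ")
    fields = {key: value.strip("'") for key, value in PAIR.findall(payload)}
    return fields if "class" in fields and "eid" in fields else None


def summarize(lines):
    """Sort events by event id and count them per class."""
    events = [f for f in map(parse_zed_line, lines) if f]
    events.sort(key=lambda f: int(f["eid"]))
    classes = Counter(f["class"] for f in events)
    # vdev_state values in event-id order show how the vdev changed state.
    states = [f["vdev_state"] for f in events if f["class"] == "statechange"]
    return events, classes, states
```

Fed with the output of `journalctl`, `summarize()` returns the events in eid order, a count per class and the list of vdev state changes. It does not explain the failure; it only makes the order of events visible.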

dmesg says even less.

I also tried to reboot the machine with the drive detached and then attach it at runtime while tailing dmesg and journalctl. Now, they are pretty verbose, so I will only add the interesting parts here (I didn't notice anything new, however):

[...]
[  221.952569] usb 2-4: Enable of device-initiated U1 failed.
[  221.954164] usb 2-4: Enable of device-initiated U2 failed.
[  221.965756] usbcore: registered new interface driver usb-storage
[  221.983528] usb 2-4: Enable of device-initiated U1 failed.
[  221.983997] usb 2-4: Enable of device-initiated U2 failed.
[  221.987603] scsi host2: uas
[  221.987831] usbcore: registered new interface driver uas
[...]
[  222.040564] sd 2:0:0:0: Attached scsi generic sg1 type 0
[  222.049860] sd 2:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[  222.051867] sd 2:0:0:0: [sdb] Write Protect is off
[  222.051879] sd 2:0:0:0: [sdb] Mode Sense: 37 00 00 08
[  222.056719] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  222.058407] sd 2:0:0:0: [sdb] Preferred minimum I/O size 512 bytes
[  222.058413] sd 2:0:0:0: [sdb] Optimal transfer size 33553920 bytes
[  222.252607]  sdb: sdb1
[  222.253015] sd 2:0:0:0: [sdb] Attached SCSI disk
[  234.935926] usb 2-4: USB disconnect, device number 2
[  234.983962] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
[  235.227936] sd 2:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[...]
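
One thing that can be measured from the excerpt above is how long the disk stays attached before the USB disconnect. A minimal sketch, assuming the default dmesg format with seconds-since-boot in square brackets; the helper names are hypothetical.

```python
import re

# "[  222.253015] message" -> ("222.253015", "message")
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(lines):
    """Yield (seconds_since_boot, message) for every timestamped line."""
    for line in lines:
        match = STAMP.match(line.strip())
        if match:
            yield float(match.group(1)), match.group(2)


def seconds_until_disconnect(lines):
    """Seconds between 'Attached SCSI disk' and the next 'USB disconnect'.

    Returns None when the log does not contain both events in that order.
    """
    attached = None
    for stamp, message in parse_dmesg(lines):
        if "Attached SCSI disk" in message:
            attached = stamp
        elif "USB disconnect" in message and attached is not None:
            return stamp - attached
    return None
```

On the lines above this gives roughly 12.7 seconds between the disk being attached and the disconnect. Tracking that number across attempts (different port, cable, enclosure) would at least turn "it drops" into something comparable.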

Thanks for the advice, it was worth another try. Anything more that comes to mind?

[–] justpassingby 3 points 6 months ago

Thank you! A new path to check :) I had not found this in my searches until now, so I added it to my documentation.


Unfortunately it doesn't tell me much, but I am really happy there is some more new info here. I can see some FAILED steps but it may be just connected to the fact it is a striped volume?

1717612906   spa.c:6623:spa_import(): spa_import: importing tank-02
1717612906   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config trusted): LOADING
1717612906   vdev.c:161:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/xxx-xxx-xxx-xxx-xxxx': best uberblock found for spa tank-02. txg 6462
1717612906   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config untrusted): using uberblock with txg=6462
1717612906   spa.c:8925:spa_async_request(): spa=tank-02 async request task=4
1717612906   spa_misc.c:404:spa_load_failed(): spa_load(tank-02, config trusted): FAILED: cannot open vdev tree after invalidating some vdevs
1717612906   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config trusted): UNLOADING
1717612906   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config trusted): spa_load_retry: rewind, max txg: 6461
1717612906   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config trusted): LOADING
1717612907   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config untrusted): vdev tree has 1 missing top-level vdevs.
1717612907   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config untrusted): current settings allow for maximum 0 missing top-level vdevs at this stage.
1717612907   spa_misc.c:404:spa_load_failed(): spa_load(tank-02, config untrusted): FAILED: unable to open vdev tree [error=2]
1717612907   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config untrusted): UNLOADING

It goes on and after a while:

1717614235   spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'tank-02' Finished importing
1717614235   spa.c:8925:spa_async_request(): spa=tank-02 async request task=2048
1717614235   spa_misc.c:418:spa_load_note(): spa_load(tank-02, config trusted): LOADED
1717614235   metaslab.c:2445:metaslab_load_impl(): metaslab_load: txg 6464, spa tank-02, vdev_id 0, ms_id 95, smp_length 0, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 1362018 ms, loading_time 0 ms, ms_max_size 8589934592, max size error 8589934592, old_weight 840000000000001, new_weight 840000000000001

But I see no other issue otherwise. Any other new path/logs/ways I can query the system?
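
The debug lines above are easier to scan with readable timestamps and only the failures kept. A rough sketch, assuming the layout shown above (epoch seconds, then file:line:function(), then the message) and assuming the lines came from the OpenZFS debug message buffer, which on Linux is typically /proc/spl/kstat/zfs/dbgmsg (the path is not named in this thread). Function names are made up for illustration.

```python
import re
from datetime import datetime, timezone

# "1717612906   spa.c:6623:spa_import(): message"
LINE = re.compile(r"^(\d+)\s+(\S+?):(\d+):(\w+)\(\):\s+(.*)$")


def parse_dbgmsg(line):
    """Split one debug line into time, source file, line, function, message."""
    match = LINE.match(line.strip())
    if not match:
        return None
    stamp, source, lineno, func, message = match.groups()
    return {
        "time": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        "source": source,
        "line": int(lineno),
        "func": func,
        "message": message,
    }


def failures(lines):
    """Return only the parsed entries whose message contains FAILED."""
    entries = [e for e in map(parse_dbgmsg, lines) if e]
    return [e for e in entries if "FAILED" in e["message"]]


def elapsed_seconds(first_line, last_line):
    """Seconds between two debug lines, e.g. import start and finish."""
    first, last = parse_dbgmsg(first_line), parse_dbgmsg(last_line)
    return (last["time"] - first["time"]).total_seconds()
```

For the excerpt above, the gap between "importing tank-02" and "Finished importing" is 1329 seconds, which is a long time for a single-disk pool and may be worth recording next to the USB disconnects in dmesg.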

[–] justpassingby 1 points 6 months ago

Thanks. I am ok with accepting the fact that USB storage with ZFS is unreliable. I am ok with not using it in real-world scenarios. My point stands, however: I want to understand what broke so I know what to look for and, should I be crazy enough to try something similar again in some use cases, know what to alert on. Call me curious. Everybody tells me it breaks, nobody tells me "look, it breaks here, and this is how you can see it". I will try for another day or two and then I will write it down in my notes as "unusable due to bad logging/debugging options", not just because "it is USB", if that makes sense.

[–] justpassingby 0 points 6 months ago (6 children)

Thanks, I understand the point of view. So maybe let me rephrase it. ZFS is not giving me more info than what I posted above (maybe this is all it sees, like you said). Do you know of any other way to make ZFS more verbose about the issue or give me more info? If not, that is ok, but I have a second question: where would you look to find which is the culprit among "bad USB controller, firmware, cable, or driver" without swapping them out one by one? Thank you for your advice.

16
submitted 6 months ago* (last edited 6 months ago) by justpassingby to c/[email protected]
 

Hi all,


UPDATE: I closed the post (the timebox I gave myself to understand the issue is now over). Thank you all for the help ^^


DISCLAIMER: The objective of this post is to understand how people would debug issues like these when real data is involved and get to the bottom of the problem. The objective is NOT to "restore service" but to understand what failed. The tone of the post is deliberately not serious, to keep it light.


I am playing a little with TrueNAS SCALE and ZFS. I was trying to use a second NVMe disk via USB to replicate the main pool once a day; however, I had issues with this secondary pool being SUSPENDED for "too many errors". This pool is not directly written to or read by users/apps; it is just there to be "replicated on" once a day.

Now, please, I know that using disks via USB is not advised. Also I am not interested in recovering the data, since there is nothing real on it. What I am doing is testing to see if the system is brittle, and if it is, how to debug if there is a real issue.

Now to the point. The pool is SUSPENDED. Good. Why? I mean, the real reason why. To see if the system can be used in real life it needs to be debuggable.

Let's start. The pool is SUSPENDED:

pool: tank-02
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-JQ
config:

    	NAME                                	STATE 	READ WRITE CKSUM
    	tank-02                             	UNAVAIL  	0 	0 	0  insufficient replicas
      	xxx-xxx-xxx-xxx-xxx                FAULTED  	3 	0 	0  too many errors

errors: 4 data errors, use '-v' for a list

To which you may ask: why? Too many errors (the -v says nothing more). Well that doesn't help, does it. When you run zpool clear:

# zpool clear tank-02   	 
cannot clear errors for tank-02: I/O error

Incredibly useful as you can see. dmesg to the rescue?

WARNING: Pool 'tank-02' has encountered an uncorrectable I/O failure and has been suspended.

Thanks? I guess. I know it is trying to safeguard data but again... why?
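
For alerting purposes, the per-device counters in the zpool status output shown earlier can at least be pulled out mechanically. A minimal sketch, assuming the column layout shown above (NAME, STATE, READ, WRITE, CKSUM, then an optional note); the function names are hypothetical and the list of states is my assumption of the common ones.

```python
import re

# One config row: name, state, three error counters, optional trailing note.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s*(.*)$"
)


def parse_status_rows(text):
    """Return one dict per pool/vdev row found in zpool status output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, state, read, write, cksum, note = match.groups()
            rows.append({
                "name": name,
                "state": state,
                # Counters are kept as text: large values may be abbreviated.
                "read": read,
                "write": write,
                "cksum": cksum,
                "note": note.strip(),
            })
    return rows


def unhealthy(rows):
    """Rows that are not ONLINE or that report any non-zero counter."""
    return [
        row for row in rows
        if row["state"] != "ONLINE"
        or any(row[key] != "0" for key in ("read", "write", "cksum"))
    ]
```

This only tells you that something failed and on which device (here, 3 read errors on the single vdev), not why, which is exactly the gap this post is about.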

Before you ask:

  • SMART checks are good
  • Yes, I restarted the device. As soon as you try to use/mount/import you get to the same issues.
  • Nothing else peculiar in dmesg. I mean, there was a usb 2-4: USB disconnect, device number 12, whatever the reason. Beats me why TrueNAS SCALE decided that setting /sys/module/usbcore/parameters/autosuspend to 2 is a good idea, but again, that is not the point. I need ZFS to tell me what the issue is from its point of view.
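
To check the autosuspend setting mentioned in the last bullet, a small sketch that reads the relevant sysfs files. It assumes a Linux host; the port name "2-4" is taken from the dmesg lines in this post, and the function names are made up. As far as I know, the usbcore value is the default idle delay in seconds (-1 disables autosuspend) and the per-device power/control file reads "auto" when autosuspend is allowed and "on" when the device is kept powered.

```python
from pathlib import Path


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def usb_power_report(port, sysfs_root="/sys"):
    """Collect the autosuspend-related settings for one USB port.

    `port` is the bus-port name as printed by the kernel, e.g. "2-4".
    `sysfs_root` is a parameter only so the function can be tested
    against a fake directory tree.
    """
    root = Path(sysfs_root)
    power = root / "bus/usb/devices" / port / "power"
    return {
        # Global default idle delay in seconds for newly attached devices.
        "global_autosuspend_s": read_sysfs(
            root / "module/usbcore/parameters/autosuspend"
        ),
        # "auto" allows runtime suspend, "on" keeps the device awake.
        "control": read_sysfs(power / "control"),
        # Per-device idle delay in milliseconds.
        "autosuspend_delay_ms": read_sysfs(power / "autosuspend_delay_ms"),
    }
```

Calling `usb_power_report("2-4")` before and after a disconnect would show whether the enclosure is allowed to suspend at all. It does not prove that autosuspend caused the drop; it only rules it in or out as a candidate.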

I have read a lot online. Maybe it is the temperature (USB enclosure heating up), maybe it is the cable, power, "it is the USB controller", or the chipset doing the USB -> NVMe bridging... However, they are not saying what to check. People are guessing. I have seen more tech behind reading tea leaves.

My question for you all is this: ZFS SUSPENDED one of my pools. It (seems to me) is refusing to fix it. Refusing to do anything with it and to tell me why. So, in a real world case, how to debug it? If I have to trust my data to it, I don’t want the only option to be “use many disks and just replace one and the cable when ZFS poo-poo”.

How to know the cause?

Thank you for the help.

PS: I am sure I am missing some very basic ZFS knowledge on the topic, so please let me know what else I can do to make ZFS talk to me.

[–] justpassingby 2 points 1 year ago (1 children)

Hi, just writing it here in case you were curious of the cause of the problem.

I finally had some time today to work on it. What I tried was to copy the content of the system boot partition of another PI, one that I configured at the same time/day, and compare it file by file with the content of the partition in the broken one. Now, there were some diffs in some binary files as expected, but what surprised me was that one file was ONLY present in the working one... initrd.img XD Now, don't ask me why the hell the file was not there. Maybe it got corrupted somehow, since as far as I am aware nothing touches this partition except at boot time. Luckily, there was a .bak file present in the partition, which I renamed and... it worked.
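
The file-by-file comparison described above can be sketched with the standard library. This assumes both boot partitions are mounted or copied to local directories; the function name is hypothetical. Note that `filecmp.dircmp` compares shallowly by default, so two files with the same size and modification time are treated as equal without reading them.

```python
import filecmp


def compare_trees(good, broken):
    """Recursively compare two directory trees.

    Returns three sorted lists of relative paths: entries only in `good`,
    entries only in `broken`, and files present in both that differ.
    """
    only_good, only_broken, differing = [], [], []

    def walk(cmp, prefix=""):
        only_good.extend(prefix + name for name in cmp.left_only)
        only_broken.extend(prefix + name for name in cmp.right_only)
        differing.extend(prefix + name for name in cmp.diff_files)
        # Descend into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix + name + "/")

    walk(filecmp.dircmp(good, broken))
    return sorted(only_good), sorted(only_broken), sorted(differing)
```

In the case above, the first list would have contained initrd.img and the second the leftover .bak file, which is the interesting part; the differing binaries are expected noise.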

Lesson learned: I now have a copy of the boot partition of every PI I manage (it is only 167MB pre tar-zip, so it does not cost much) and I have it backed up safely on another system. Should another file get corrupted in the future (maybe this time without a .bak), I have an older working copy of it and I can restore the service without needing to format everything.
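
A minimal sketch of such a boot partition backup, assuming the partition is mounted at a known directory (on Ubuntu for the PI this is usually /boot/firmware, but that path is my assumption, as are the function name and the archive naming scheme).

```python
import tarfile
from datetime import datetime
from pathlib import Path


def backup_boot(boot_dir, dest_dir, host):
    """Write a timestamped .tar.gz of `boot_dir` into `dest_dir`.

    Returns the path of the archive that was created.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{host}-boot-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store everything under a fixed top-level name so that restores
        # do not depend on where the partition was mounted.
        tar.add(boot_dir, arcname="boot")
    return archive
```

For example, `backup_boot("/boot/firmware", "/mnt/backups", "pi-01")` (all three values hypothetical) would produce one archive per run; copying it to another machine is left to whatever sync tool is already in place.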

Thank you so much for your help and advice!

[–] justpassingby 2 points 1 year ago

I could format it and try, but the moment I do I lose the ability to debug the issue and learn from the problem. So I may temporarily solve it, but it may happen again. I wonder if someone knows what I can check/test/run to identify what broke.

[–] justpassingby 2 points 1 year ago (3 children)

Thanks for the quick reply 👍

The next step I would try would be to boot an other install, like a liveusb or a raspbian, on the same usb port, to completely eliminate a hardware problem if it boots properly.

Good advice. I moved the usb drive from another (working) PI and attached it to the same USB port. It boots correctly. It is not the USB port nor power.

If it is a software problem, it seems to happen very early in the boot process, so my bet would be a corrupted initramfs/initrd (or what is equivalent on a Pi). No idea how you could debug and fix that on Ubuntu, though (especially on a Pi where /boot is… different).

I believe it is something like that. Either it is not mounting the drive correctly and so not finding it, or it is something else. I just wish there was a better (or any) error printed on the console. I tried to attach a keyboard to get to a shell, with no success. I honestly could just reformat the drive and use a clean install, but that is the last resort. I would like to understand what happened so I can learn from it and avoid it in the future (or learn a path to fix it).

 

Hi all,

I am having a strange new issue with one of the Raspberry Pi 4Bs I have running at home. It failed/restarted for some reason and it is now stuck at boot with the line:

Waiting for root device LABEL=writable...

I am booting this PI from USB. From what I can see the disk is ok. I can mount it on my laptop and access it correctly. The partition is labelled correctly. I tried to move it to another PI I have and I have the same error (I did this to remove the possibility it was the PI/USB port). I am pretty sure it is not the power that is the issue (since I am giving it more than enough).

All of this was working correctly until now (for months). Ubuntu may have updated something (my fault, I may not have disabled the auto-update) or something else could have broken.

I can try to point to the partition via UUID instead of the label, but something tells me that is not the issue. Did anybody encounter such an issue in the past or has any advice on how to debug it?
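
To see what the boot is actually waiting for, it can help to extract the root= argument from the kernel command line and map it to the /dev/disk symlink it should resolve to. A rough sketch; the function names and the example command line are hypothetical, only the LABEL=writable part comes from the message above. As far as I know, LABEL= and UUID= are resolved by the initramfs rather than by the kernel itself, so a missing or broken initrd is one plausible way to end up waiting forever.

```python
# Maps root= prefixes to the udev-managed directories under /dev/disk.
BY_DIRS = {
    "LABEL": "by-label",
    "UUID": "by-uuid",
    "PARTUUID": "by-partuuid",
    "PARTLABEL": "by-partlabel",
}


def parse_root(cmdline):
    """Return (kind, value) for the root= argument, or None if absent.

    kind is LABEL, UUID, PARTUUID or PARTLABEL, or DEVICE for anything
    else (for example a plain /dev/sda2 path).
    """
    for token in cmdline.split():
        if token.startswith("root="):
            spec = token[len("root="):]
            kind, sep, value = spec.partition("=")
            if sep and kind in BY_DIRS:
                return kind, value
            return "DEVICE", spec
    return None


def expected_device_path(kind, value):
    """Path that should exist once the root device has been detected."""
    if kind in BY_DIRS:
        return f"/dev/disk/{BY_DIRS[kind]}/{value}"
    return value
```

With a command line such as `console=tty1 root=LABEL=writable rootwait` (made up here), this yields /dev/disk/by-label/writable, which is the path to check from a rescue shell or from another machine with the disk attached.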

Thank you for your help and time.

===

Solution: https://sh.itjust.works/comment/1646333

[–] justpassingby 1 points 1 year ago (1 children)

Sorry to hear. If you had NO connection with kubectl, I would have advised you to check the ports; but if it replies sometimes and most of the time not, it must be something else. Good luck with the debugging, and if you have any specific problem you could also try to create a post on any of the self-hosted communities here on Lemmy. In my experience people are more friendly and more technical than what we used to have on reddit.

[–] justpassingby 1 points 1 year ago (3 children)

Please do not let my bad experience stop you! Longhorn is a nice tool and, as you can read online and in other posts, it works very well. I may have been unlucky, have had a bad configuration, or had my PIs under too much pressure. Who knows! My advice is to try something new: k3s, Longhorn, etc. That is what I use the PIs for. I would not use Longhorn at work :D

I’m really not sure what I even want the distributed FS for. I guess I wanted to have redundancy on the pod long term storage, but I have other ways to achieve that.

I am not using replicas :) I use longhorn for the clean/integrated backup mechanism instead of using something external. Maybe one day when I have the same-ish disk speed on all 3 PIs I will enable replicas but for now I am good like this.

For backups of important stuff maybe use something else, or ALSO something else. I was personally thinking of using a second backup tool for Longhorn devices, like https://github.com/backube/volsync or velero, to have a secondary source in case something happens. Also, Longhorn is always getting better. This is hot off the press: https://github.com/longhorn/longhorn/releases/tag/v1.5.0

My advice? Try it out! If not, it will still be a source of learning and fun (but I am strange, I like to debug stuff).

[–] justpassingby 1 points 1 year ago* (last edited 1 year ago) (5 children)

Eh, I will have to find my notes on the issue with the pihole, I can see if I can dig them out this weekend and send it to you (I wonder if you can send PM in Lemmy ^^).

To stay on the point of this discussion: just this afternoon, and I am not joking, I got hit by this: https://longhorn.io/kb/troubleshooting-volume-with-multipath/ The pod (in this case wireguard) was crashing because it could not mount the drive, and the error was something like "already mounted or mount point busy". I had to dig and dig, but I found out the problem was the one above and I fixed it. I will now add that setting to my ansible and configure all three PIs. However, this should not happen for a mature-ish system like Longhorn, which may cater to a userbase that does not know enough to dig into /dev. I think there should be a better way to alert users to such an issue. Just to be clear, the Longhorn UI and logs were nice and dandy, all quiet on the western front, but everything was broken. The Longhorn reconciler could have a check: if something should be mounted and is not, and the error is "already mounted" but it is not actually mounted, check for known bugs. However, I think the issue is what I said above. It is too fragmented and works with a myriad of other microservices, so Longhorn is like "I gave the order, now whatever". I will share what is in my longhorn-system ns; there is no secret in here but I want to give an idea (PS: I do nothing fancy with Longhorn at home; obvs some are ds, so you see 3 pods because I have 3 nodes):

k get pods -n longhorn-system | cut -d' ' -f1
NAME
engine-image-ei-f9e7c473-5pdjx
engine-image-ei-f9e7c473-xq4hn
instance-manager-e-fa08a5ebf4663f1e9fb894f865362d65
engine-image-ei-f9e7c473-gdp6n
instance-manager-e-567b6ba176274fe20a001eec63ce3564
instance-manager-r-567b6ba176274fe20a001eec63ce3564
instance-manager-r-b1d285dd9205d1ba992836073c48db8a
instance-manager-e-b1d285dd9205d1ba992836073c48db8a
daily-keep-for-a-week-28144800-pppw8
longhorn-manager-xqwld
longhorn-ui-f574474c8-n847h
longhorn-manager-cgqvm
longhorn-driver-deployer-6c7bd5bd9b-8skh4
longhorn-manager-tjzvz
instance-manager-d3c9343a8637e4ef197ad6da68b3ed2d
instance-manager-cf746b18d51f6426b74d6c6652f01afc
engine-image-ei-d911131c-wwfwz
engine-image-ei-d911131c-qcn26
instance-manager-e7d92f3ca0455cde2158bebdbb33ea16
engine-image-ei-d911131c-mgb2k
csi-attacher-785fd6545b-bn9lp
csi-attacher-785fd6545b-4nfxz
csi-provisioner-8658f9bd9c-2bq7v
csi-provisioner-8658f9bd9c-q6ctq
csi-attacher-785fd6545b-rx7r9
csi-resizer-68c4c75bf5-tmw2f
csi-resizer-68c4c75bf5-n9dxm
csi-snapshotter-7c466dd68f-7r2x6
csi-snapshotter-7c466dd68f-cd8pm
longhorn-csi-plugin-vgqh5
longhorn-csi-plugin-mnskk
csi-provisioner-8658f9bd9c-kcb8f
csi-resizer-68c4c75bf5-gccfg
csi-snapshotter-7c466dd68f-wsltq
longhorn-csi-plugin-9q9kj

Dependency on the csi-* ecosystem sort of allows the errors to get lost in translation.
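
The reconciler check suggested above (the error says "already mounted", but nothing is actually mounted) can be sketched as a comparison between the error text and the mount table. This is only an illustration of the idea, not how Longhorn works; the function names, the returned labels and the example device path are hypothetical.

```python
def mounted_at(mounts_text, device):
    """Return the mount points listed for `device` in /proc/mounts-style text.

    Each line is "device mountpoint fstype options dump pass"; spaces in
    paths are escaped there, so a plain split() is enough.
    """
    points = []
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == device:
            points.append(parts[1])
    return points


def classify_mount_error(error_message, mounts_text, device):
    """Flag the case where mount reports 'busy' but the device is not mounted."""
    text = error_message.lower()
    reported_busy = "already mounted" in text or "busy" in text
    if not reported_busy:
        return "other-error"
    if mounted_at(mounts_text, device):
        return "really-mounted"
    # Nothing in the mount table: something else is holding the device,
    # which is the situation worth surfacing to the user.
    return "busy-but-not-mounted"
```

In practice `mounts_text` would be the contents of /proc/mounts on the node, and a "busy-but-not-mounted" result is the point at which a hint towards known causes (such as the multipath KB article above) would have saved the digging.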

 

Tried it out in the past couple of days to manage k8s volumes and backups on s3 and it works surprisingly well out of the box. Context: k3s running on multiple raspberry pi
