this post was submitted on 13 Aug 2023
Plz give me an hour long lecture on filesystems
I can give you the summary:
NTFS: what the fuck is this still doing in the wild. Malware maybe?
FAT: legacy FS for the lowest common denominator. Not good in any objective sense, but sometimes you are forced to use it.
ReiserFS: No one knows what ReiserFS actually does; all we know is that the developer killed his wife and went to prison.
Ext4: Old, stable, solid, and fast (even faster if you turn off journaling at the expense of reliability), but it's missing some really nice modern features, like Copy-on-Write, snapshotting, and transparent compression
XFS: Ext4 for hipsters. Old, stable, fast, etc. There's a recent push to try to make this a cool FS again with e.g. Stratis and reflinking support, but I don't think it's going to be the future. Still a great FS, but if you want XFS you'll already know.
F2FS: Designed for optimal performance on flash storage, but generally it's not good enough at that job to warrant using it over something else that has more features or is more reliable. Supports transparent compression, but oddly enough it doesn't actually give you any extra free space when you compress. The compression is done only to reduce wear and tear on the cells.
ZFS: Gold standard FS, but incompatible licensing means it can't be in the Linux kernel by default. It's very stable, very fast, and likes to eat your RAM if you don't tell it not to. Transparent ZSTD compression can be enabled to compress and decompress your files as you write/read, saving a lot of space. Its Copy-on-Write nature is safer than traditional in-place editing because a power loss mid-write can't leave a half-overwritten file behind: the old blocks stay intact until the new ones are fully written. As of OpenZFS 2.2 it also includes the reflinking feature, which makes file copies within the same filesystem instant, reducing wear and tear. Notably, if you're making home-use RAID5/6 arrays it can feel very inflexible because it won't let you add HDDs to your array on the fly (a fix was merged on November 10th, 2023, but will not be released until OpenZFS 2.3 comes out). You'll need to put some extra money into your home NAS setup to begin to see this filesystem shine, e.g. sacrificing half your disks for fast mirrored vdevs, which also allow reasonable growth. Buying a ton of RAM and maybe even an L2ARC SSD cache will also prevent wear and tear on your HDDs because most of your file reads will come from caches instead of the slow disks. This FS is designed for large data centers, and it's really good at that.
BTRFS: ZFS for the rest of us. Contains a similar featureset to ZFS (Copy-on-Write, snapshots, transparent compression), but it's better integrated into the Linux ecosystem. It's generally worse than ZFS in every other respect, and it's not as stable or time-tested either. It famously contains a "write-hole" bug that means RAID5/6 have the potential to eat your data, so those modes are not recommended. If you're only considering a filesystem for a single drive (i.e. not a RAID), most of BTRFS's downsides don't matter. Many Linux distros are shipping with this FS as the default for the OS drive now, and it's more or less being championed as "the future".
BcacheFS: "The future v2". Not even stable yet, that's how you know it's good. Contains unlimited features and zero bugs, 100% reliability, etc. All the best software is theoretical. We'll see what happens when it's fully mainlined into the Linux kernel.
MergerFS: fuck it, what if filesystems weren't even real. Transparently combine any other filesystems into a single logical filesystem. Usually used together with SnapRAID to make a more flexible home NAS RAID5/6 that can add or remove disks on the fly (ZFS can't grow a RAID5/6-style array until OpenZFS 2.3, and BTRFS shouldn't be using RAID5/6 at all). Notably this configuration does not stripe data across disks, so even with an unmitigated disk failure you won't lose the whole array. Disk performance won't be as good as ZFS, but flexibility and storage efficiency are probably a lot more important to a casual NAS owner.
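To make the "no striping" point concrete, here's a rough toy sketch in Python. It assumes plain single XOR parity, which is only the general idea behind a one-parity setup and not SnapRAID's actual on-disk format; the disk names and contents are made up. Real SnapRAID also only computes parity when you run a sync, which this ignores.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (single-parity math)."""
    return bytes(x ^ y for x, y in zip(a, b))


# Each data disk holds whole files on its own filesystem; nothing is striped.
# (Hypothetical disk names and contents, padded to the same length.)
disks = {
    "disk1": b"movie---",
    "disk2": b"photos--",
    "disk3": b"backups-",
}

# A separate parity disk stores the XOR of all data disks.
parity = reduce(xor_bytes, disks.values())

# One disk dies: XOR the survivors with parity to rebuild it.
lost = disks.pop("disk2")
rebuilt = reduce(xor_bytes, disks.values(), parity)
assert rebuilt == lost

# A second disk dies before the rebuild finishes. Single parity can't fix that,
# but the remaining disk is still a normal, readable filesystem.
disks.pop("disk3")
print(disks["disk1"])  # b'movie---'
```

In a striped RAID5 the same double failure would take the whole array with it, because every file has pieces on every disk.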
So basically TLDR: BTRFS for home/OS use, MergerFS+SnapRAID backed by BTRFS disks for a budget home NAS, ZFS for a more expensive home NAS/server, and ZFS for enterprise.
I have not tried to use ZFS for home/OS but I'm guessing it's not a good idea - most of ZFS's advantages over BTRFS are regarding RAIDs, and poor Linux integration may cause headaches. Ext4/XFS are good for home/OS use if you have a real usecase, but otherwise BTRFS is generally just a better drop-in for most people these days. On HDDs you're probably going to want to use ZFS or BTRFS because transparent compression and copy-on-write mean less data actually hits the disk, so the slow transfer speed matters less. On super-fast SSDs (e.g. for home/OS or a games drive), BTRFS might end up being slightly slower than Ext4/XFS, but transparent compression and copy-on-write will also prevent a ton of wear and tear on the physical SSD, so it really depends on your usecase.
Has the bugginess of BTRFS gotten any better? I'm afraid to actually use it on my server, as I've heard bad things about its stability.
If BTRFS is messing disks up in the wild today, I haven't heard about it. You'll find a few people saying that BTRFS messed up their disk in the past and they won't touch it anymore, but Facebook runs all their stuff off BTRFS, and OpenSUSE+Fedora are shipping BTRFS by default to their users as well. We'd be hearing about it if there were real problems. Unfortunately, it's easier to just say "BTRFS isn't 100% stable because in the past it had bugs" and never touch it again, and it's very hard to 'disprove' that claim because the people who have had problems are very few and far between - maybe you just haven't hit their edge case yet. BTRFS has been around for a long time now, and at this point it's probably going to be easier to just champion BcacheFS than try to remove the stigma around BTRFS.
If your server is a personal server I would feel free to use BTRFS if you feel the rest of its featureset fits your usecase. If you're dealing with other people's data/enterprise then I would try to use ZFS or something else, "just in case".
Looks like I'm gonna give btrfs another try.
I'm building an unRAID server and have my main array as 3 14TB HDDs and my cache pool is a 2TB NVMe, all 4 are set to ZFS. Are you saying I should swap my NVMe to BTRFS to reduce wear and tear?
ZFS will do the same thing as BTRFS in regards to wear and tear, so swapping to BTRFS is almost certainly not the answer in your case. Since you're using it as a cache pool I'm not quite sure what the performance implications of these features are, but the two main things related to wear and tear are:
Transparent compression: This basically means that when you write a file to the drive, ZFS will first compress it using your CPU, then write the compressed file to the drive instead. This file will be smaller than your original file, so it won't take as much space on the drive. Because it doesn't take as much space on the disk, you limit the number of bits being written and subsequently limit the wear and tear. When you read the file again, it decompresses it back to its normal state, which also limits the number of bits being read from the drive. The standard compression algorithm for this task is usually ZSTD, which is really good at compressing while still being very fast. One of the most interesting parts about ZSTD is that no matter how much effort you spend compressing, decompression speed will always be the same. I tend to refer to this benchmark as an approximation for ZSTD effort levels with regards to transfer speeds - the benchmark is done with BTRFS but the results will translate when enabling this feature on ZFS. Personally I usually run ZSTD level 1 as a default, though level 2 is also enticing. If you're not picking 1 or 2, you probably have a strong usecase to be going somewhere closer to ~7. This is probably not enabled by default, so just check if it's enabled for you and enable it if you want. Keep in mind that it will decrease transfer speed if the drive itself is not the bottleneck - HDDs love this feature because compressing/decompressing with your CPU is much faster than reading/writing to spinning rust. Also keep in mind that ZSTD is not magic, and it's not going to do much to already compressed files - photos, music, video, archives, etc.
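If you want to see the "not magic" part for yourself, here's a quick Python sketch. Big assumption: it uses zlib from the standard library as a stand-in, since ZSTD isn't available in every Python install, so the absolute numbers won't match ZSTD. The sample data is made up too: a repeated log line for "text-like" files and random bytes standing in for already-compressed media.

```python
import os
import zlib

# Highly repetitive data, similar to logs or plain text (made-up sample).
text_like = b"2023-08-13 12:00:00 INFO request served in 12ms\n" * 2000
# Random bytes stand in for already-compressed photos, music, video, archives.
media_like = os.urandom(len(text_like))


def ratio(data: bytes, level: int) -> float:
    """Compressed size as a fraction of the original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# zlib levels run 1-9; they are NOT the same scale as ZSTD levels.
for level in (1, 6, 9):
    print(
        f"level {level}: "
        f"text {ratio(text_like, level):.3f}, "
        f"media {ratio(media_like, level):.3f}"
    )

# What gets read back is byte-for-byte what was written.
assert zlib.decompress(zlib.compress(text_like, 1)) == text_like
```

The text-like data should shrink to a tiny fraction of its size even at the lowest level, while the random data stays at roughly 100%, which is why higher levels rarely pay off for a media library.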
Copy on write: Among other things, this mostly means that when you copy a file to the same drive, it will be instant - logical file markers will just be created to point at the previous physical file blocks. We don't need to read the full file, and we don't need to re-write the full file. In your usecase of cache pool I don't think this will affect much, since any duplicate hits are probably just getting redirected to the previous copy anyway? This should be the default behavior of ZFS tmk, so I don't think you need to check that it's enabled or anything.
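Here's a toy model of the "logical file markers pointing at the same physical blocks" idea, in Python. To be clear, this is just a sketch to show the bookkeeping: the `CowStore` class and its method names are invented for illustration and have nothing to do with how ZFS or BTRFS actually lay things out.

```python
class CowStore:
    """Toy copy-on-write store: files are lists of references to shared blocks."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}     # block id -> data
        self.refs: dict[int, int] = {}         # block id -> reference count
        self.files: dict[str, list[int]] = {}  # file name -> block ids
        self._next_id = 0

    def _alloc(self, data: bytes) -> int:
        """Store data in a brand-new block and return its id."""
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.refs[block_id] = 1
        return block_id

    def write_file(self, name: str, chunks: list[bytes]) -> None:
        self.files[name] = [self._alloc(chunk) for chunk in chunks]

    def reflink_copy(self, src: str, dst: str) -> None:
        """Copy by sharing blocks: no file data is read or rewritten."""
        self.files[dst] = list(self.files[src])
        for block_id in self.files[dst]:
            self.refs[block_id] += 1

    def overwrite_chunk(self, name: str, index: int, data: bytes) -> None:
        """Write to a new block instead of modifying the shared one in place."""
        old = self.files[name][index]
        self.refs[old] -= 1
        if self.refs[old] == 0:
            del self.blocks[old], self.refs[old]
        self.files[name][index] = self._alloc(data)

    def read_file(self, name: str) -> bytes:
        return b"".join(self.blocks[block_id] for block_id in self.files[name])


store = CowStore()
store.write_file("a.bin", [b"AAAA", b"BBBB"])
store.reflink_copy("a.bin", "b.bin")
print(len(store.blocks))          # 2: the copy added no new data blocks

store.overwrite_chunk("b.bin", 0, b"CCCC")
print(store.read_file("a.bin"))   # b'AAAABBBB' (original untouched)
print(store.read_file("b.bin"))   # b'CCCCBBBB'
print(len(store.blocks))          # 3: only the changed chunk cost a new block
```

The copy only touches metadata, and a later edit only pays for the chunks that actually changed, which is where the wear-and-tear savings come from.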
i like level 3 because you enable it by putting :3 in the mount options :3
Apologies, this is the only correct answer from now on. Thank you for helping me to see the light.
Why not lz4?
LZ4 is faster but ZSTD compresses better. They're both viable on ZFS, but it depends on your usecase. Notably BTRFS does not offer LZ4, and in the case of BTRFS I would really only recommend ZSTD, as LZO is just not worth it.
Edit: Found some benchmarks you can look through: https://docs.google.com/spreadsheets/d/1TvCAIDzFsjuLuea7124q-1UtMd0C9amTgnXm2yPtiUQ/edit#gid=0
Amazing. Thank you. It's awesome that zstd has caught up in encode/decode time. There was a solid 5-10 years where nothing came close to lz4 in both.