this post was submitted on 09 Nov 2023

I am archiving a vast amount of media files that are rarely accessed. I'm writing large sequential files, at peaks of about 100MB/s.

I want to maximise storage space primarily; I have 20x 18TB HDDs.

I've been told that large (e.g. 20 disk) vdevs are bad because resilvers will take a very long time, which creates higher risk of pool failure. How bad of an idea is this?

[–] [email protected] 1 points 10 months ago (1 children)

I always understood it as a balancing act in your vdev sizing. Too big = long rebuild times, with every disk in the vdev spinning for the whole resilver. Too small = wasted $, with TB lost to redundancy (raidz2 with 3 20TB disks only utilizes 33% of the TB purchased).
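To put rough numbers on that tradeoff, here is a quick sketch of the capacity math for the 20x 18TB case from the original post. The function names are made up for illustration, and it only counts parity disks; it ignores raidz padding, metadata, and the free space you'd want to leave, so real usable space will come out somewhat lower.

```python
def raidz_usable_fraction(disks_per_vdev: int, parity: int) -> float:
    """Fraction of raw capacity left for data in one raidz vdev.

    Only subtracts parity disks; padding and metadata overhead are ignored.
    """
    if disks_per_vdev <= parity:
        raise ValueError("a raidz vdev needs more disks than parity devices")
    return (disks_per_vdev - parity) / disks_per_vdev


def pool_usable_tb(total_disks: int, disk_tb: float,
                   disks_per_vdev: int, parity: int) -> float:
    """Approximate usable TB when total_disks are split into equal raidz vdevs."""
    if total_disks % disks_per_vdev != 0:
        raise ValueError("total_disks must divide evenly into vdevs")
    vdev_count = total_disks // disks_per_vdev
    return vdev_count * (disks_per_vdev - parity) * disk_tb


if __name__ == "__main__":
    # The 3-disk raidz2 case from the comment above: 1 data disk out of 3.
    print(f"3-wide raidz2: {raidz_usable_fraction(3, 2):.0%} usable")

    # A few ways to carve up 20x 18TB disks with raidz2.
    for width in (20, 10, 5):
        usable = pool_usable_tb(20, 18, width, parity=2)
        print(f"{20 // width} x {width}-wide raidz2: ~{usable:.0f} TB usable")
```

Going from one 20-wide raidz2 to two 10-wide raidz2 vdevs costs two more disks of parity (about 36TB here) in exchange for narrower vdevs.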

I've always felt you should calculate how many disks you need to saturate your connection and go from there. If you have a 10gig trunk on your network, you'll want 1 vdev to be able to saturate a 10gig line. Any larger than that and you don't get any throughput benefit.
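That calculation can be sketched like this. The 200 MB/s per-disk figure is an assumption for illustration (measure your own drives), and it assumes sequential throughput scales roughly with the number of data disks in the vdev, which is a simplification.

```python
import math


def data_disks_to_saturate(link_gbit: float, disk_mb_per_s: float) -> int:
    """Data disks needed for sequential throughput to match the network link.

    Assumes throughput scales linearly with data disks and ignores protocol
    overhead, so treat the result as a rough starting point.
    """
    link_mb_per_s = link_gbit * 1000 / 8  # 10 Gbit/s is 1250 MB/s on the wire
    return math.ceil(link_mb_per_s / disk_mb_per_s)


if __name__ == "__main__":
    assumed_disk_speed = 200  # MB/s sequential per HDD, assumed value
    parity = 2                # raidz2

    for link in (1, 10):
        data_disks = data_disks_to_saturate(link, assumed_disk_speed)
        print(f"{link} Gbit link: {data_disks} data disks "
              f"+ {parity} parity = {data_disks + parity}-wide raidz2")
```

Under those assumptions, a 10gig line works out to about 7 data disks, so a 9-wide raidz2. For the 100MB/s peaks mentioned in the original post, a single disk's worth of sequential throughput would already keep up.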

[–] [email protected] 1 points 10 months ago

Hold on a sec... don't more vdevs translate to more IOPS and speed?

A few years back I was on this sub asking about the best way to carve my 16x2tb zfs pool, and was advised that multiple vdevs would be better for performance.

I wound up going for 2 vdevs of 8x2tb raidz2, as it seemed to be the right redundancy to performance ratio for me.
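For reference, the rule of thumb I was given is that each raidz vdev delivers roughly the random IOPS of a single member disk, so pool IOPS grows with the number of vdevs rather than the number of disks. A quick sketch of that estimate, where the per-disk IOPS figure is an assumed ballpark for a spinning disk and not a measurement:

```python
def estimated_pool_random_iops(vdev_count: int, per_disk_iops: float) -> float:
    """Rule-of-thumb random IOPS for a pool of raidz vdevs.

    Treats each raidz vdev as roughly one disk's worth of random IOPS.
    This is an approximation; caching and workload mix will change real results.
    """
    if vdev_count < 1:
        raise ValueError("a pool needs at least one vdev")
    return vdev_count * per_disk_iops


if __name__ == "__main__":
    assumed_hdd_iops = 100  # assumed ballpark for one HDD, not measured

    # Ways to split 16 disks into equal raidz2 vdevs.
    for vdevs, width in ((1, 16), (2, 8), (4, 4)):
        iops = estimated_pool_random_iops(vdevs, assumed_hdd_iops)
        print(f"{vdevs} x {width}-wide raidz2: ~{iops:.0f} random IOPS")
```

By that estimate my 2-vdev layout has about twice the random IOPS of a single 16-wide vdev, but it is still only about two disks' worth.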

I do have to say though, while the pool itself has been rock solid, the performance is pretty bad... I had to turn off sync completely because it was unusable even just browsing files on the share. Even with sync turned off the performance is still pretty bad, and this is on a system with 16 cores and 64GB RAM. I'd love to get some ideas on what a better-performing config might be... I'm even open to ditching ZFS altogether and reverting to mdadm if ZFS is not there yet in terms of performance.