this post was submitted on 10 Nov 2023
3 points (100.0% liked)

Data Hoarder

top 28 comments
[–] [email protected] 2 points 1 year ago (1 children)

There seem to be a few of these on GitHub. I found this one recently and it seems to do a reasonable job. The real problem is that it's a bash script, so you need WSL2 to run it on Windows.

What I like is the output is CSV files that are spreadsheet friendly and can be used to analyze and remove files in bulk.

https://github.com/Jim-JMCD/Duplicate-File-Finder

Czkawka's (CLI version) output file can be used to delete stuff in bulk, but it doesn't list directories separately, so you have to go through files individually.
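For anyone curious what a tool like this does for exact duplicates, here is a minimal Python sketch. It is not the linked script's code or Czkawka's, just a common approach under my own assumptions: group files by size first, hash only the files that share a size, then write the matches to a CSV. The function names and the CSV columns (`group`, `size_bytes`, `path`) are made up for illustration.

```python
import csv
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root whose contents hash identically."""
    # Pass 1: bucket regular files by size (cheap, no file reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


def write_report(groups, csv_path):
    """Write one CSV row per duplicate file: group id, size in bytes, path."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["group", "size_bytes", "path"])
        for group_id, paths in enumerate(groups, start=1):
            for path in paths:
                writer.writerow([group_id, os.path.getsize(path), path])
```

Something like `write_report(find_duplicates("/path/to/library"), "duplicates.csv")` would give you a sheet you can sort and filter before deleting anything. Note that it only reports; it never removes files on its own.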

[–] [email protected] 2 points 1 year ago (1 children)

The real problem is that it's a bash script, so you need WSL2 to run it on Windows.

Eh?

WSL2 is one way to run a Linux kernel (and thus native Linux executable binaries) in Windows.

And while bash is definitely very common on Linux, it has never by any means been a strictly Linux program.

It can be used on all kinds of operating systems -- mostly unix-like operating systems, but also including Windows using a POSIX compatibility shim like Cygwin.

People were using bash in 1989, years before Linux became the beginning of a thing. And folks have been using it on Windows since at least 1995, or maybe even earlier -- decades before WSL.

[–] [email protected] 1 points 1 year ago

WSL2 is still the easiest way to get bash imo.

[–] [email protected] 1 points 1 year ago (1 children)

This utility helps you find duplicate files, empty folders, images, videos, and music that are similar to one another, and much more. And it's free. Here's a link to the repository.

[–] [email protected] 1 points 1 year ago

I used to use a program called NoClone, but it appears that somebody may have purchased it a while back and stopped development and made it kind of shady.

My old licensed version doesn't work very well, so I was looking for a replacement and as I have learned more and more about this program, I've been able to use it more and more effectively.

Edit: I should mention I'm a visual person, so I do use the GUI.

[–] [email protected] 1 points 1 year ago

I was just thinking about this tool the other day while I was driving. "duplicate file finder with polish name"

[–] [email protected] 1 points 1 year ago

Any suggestions for a tool like this I could run natively on my QNAP NAS?

[–] [email protected] 1 points 1 year ago

For basic duplicates, I find rmlint to be a superior alternative.

[–] [email protected] 1 points 1 year ago (2 children)

I use VisiPics. It's very old, but the algorithm is very good.

[–] [email protected] 1 points 1 year ago

Yep, it's ancient but the algorithm is still good in 2023.

I thought it was open source but it seems it's just freeware? I was hoping someone would build a new front-end...unfortunately I lack the skills to do so.

[–] [email protected] 1 points 1 year ago

Just tried this! It's either so efficient my CPU isn't working too hard or it focuses most of its work onto one CPU core. That said, I appreciate being able to identify the highest res version of a file, that helps a lot!

[–] [email protected] 1 points 1 year ago (1 children)
[–] [email protected] 1 points 1 year ago (2 children)

The latest WiFi standards provide faster theoretical throughput than what most motherboards have onboard for ethernet.

[–] [email protected] 1 points 1 year ago (1 children)

Lol.

This was /r/apple's excuse for why iPhones still have USB 2.

[–] [email protected] 1 points 1 year ago

Completely different scenario.

Most motherboards only come with a single 1Gbps port, maybe 2.5Gbps if you're lucky. The latest WiFi standards far exceed 1Gbps, so I'm not sure why people are up in arms about my comment when it's factual.

[–] [email protected] 1 points 1 year ago

lol. lmao, even

[–] [email protected] 1 points 1 year ago

I am looking for a tool to find similar videos I downloaded before. The collection is big, around 10 TB across two drives.

[–] [email protected] 1 points 1 year ago

Always loved Picasa, but unfortunately its development stopped. And it crashed if you compared over 100k images.

[–] [email protected] 1 points 1 year ago

I love it, but can never remember how to pronounce it so I can't recommend it to people very easily.

[–] [email protected] 1 points 1 year ago (2 children)

I need a similar tool, but for music. The file names might be different. The sizes might be slightly different as well, so I'm not sure those count as duplicates.

[–] [email protected] 1 points 1 year ago

Czkawka has something for finding duplicate music, though I've never used it.

[–] [email protected] 1 points 1 year ago

This tool works for audio, too! I haven't tried it for audio but the options are there.

[–] [email protected] 1 points 1 year ago

I've removed at least 100GB of duplicates using this tool. Running it every few months is a good way to clean things up when my download queues get unruly.

[–] [email protected] 1 points 1 year ago

Czkawka, dupeGuru, and VisiPics are my go-to tools for non-exact photo duplicates. I typically run all three, since they don't all find the same duplicates. VisiPics runs fine under Wine.

But Czkawka's real strength is non-exact video duplicates. The only other tool I've found that does that is videoduplicatefinder.
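To give a rough idea of why different tools disagree on non-exact matches, here is a small Python sketch of one common idea, an "average hash" compared by Hamming distance. This is an illustration only, not the algorithm of Czkawka, dupeGuru, VisiPics, or videoduplicatefinder. It assumes the image has already been decoded and downscaled to a small grayscale grid (a list of rows of ints), and the function names and the `max_distance` threshold are hypothetical.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: one bit per pixel, set if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def looks_similar(pixels_a, pixels_b, max_distance=5):
    """Treat two grids as near-duplicates if their hashes differ in few bits."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Tiny demo grids: a 4x4 gradient, a uniformly brighter copy, and a negative.
gradient = [[(row * 4 + col) * 16 for col in range(4)] for row in range(4)]
brighter = [[value + 10 for value in row] for row in gradient]
negative = [[255 - value for value in row] for row in gradient]
```

Here `looks_similar(gradient, brighter)` comes out true because a uniform brightness shift moves the mean along with the pixels, while `looks_similar(gradient, negative)` comes out false. Change the hash or the threshold and the set of "duplicates" changes, which fits with the tools not all finding the same ones.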

[–] [email protected] 1 points 1 year ago (1 children)

Revolutionized the way I organize my files. There is no better tool.

[–] [email protected] 1 points 1 year ago

Anti-Twin (in pixel recognition mode) takes a long time, but finds more dupes.