this post was submitted on 10 Nov 2023
3 points (100.0% liked)

Data Hoarder

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time (tm) ). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

founded 10 months ago
top 28 comments
[–] [email protected] 2 points 10 months ago (1 children)

There seem to be a few of these on GitHub. I found this one recently, and it seems to do a reasonable job. The real problem is that it's a bash script, so you need WSL2 to run it on Windows.

What I like is that the output is CSV files, which are spreadsheet-friendly and can be used to analyze and remove files in bulk.

https://github.com/Jim-JMCD/Duplicate-File-Finder

Czkawka's (CLI version) output file can be used to delete stuff in bulk, but it doesn't list directories separately, so you have to go through files individually.
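
For anyone who would rather script the CSV step than do it in a spreadsheet, here is a rough sketch of the idea. It assumes a CSV with `hash` and `path` columns; those column names and the sample rows are hypothetical and are not the actual output format of either tool mentioned above. It only prints what it would remove and never deletes anything.

```python
import csv
import io
from collections import defaultdict


def plan_removals(csv_text):
    """Group rows by checksum; keep one path per group and list the rest."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "hash" and "path" are hypothetical column names for this sketch.
        groups[row["hash"]].append(row["path"])
    to_remove = []
    for paths in groups.values():
        # Keep the alphabetically first copy, mark the others for removal.
        to_remove.extend(sorted(paths)[1:])
    return to_remove


# Made-up sample data, only to show the shape of the input.
sample = """hash,path
abc123,/data/photos/cat.jpg
abc123,/data/backup/cat.jpg
def456,/data/docs/notes.txt
"""

for path in plan_removals(sample):
    print("would remove:", path)
```

Which copy to keep is a policy choice. Alphabetical order is just a placeholder here, and you would probably want to review the list before deleting anything.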

[–] [email protected] 2 points 10 months ago (1 children)

The real problem is that it's a bash script, so you need WSL2 to run it on Windows.

Eh?

WSL2 is one way to run a Linux kernel (and thus native Linux executable binaries) in Windows.

And while bash is definitely very common on Linux, it has never by any means been a strictly Linux program.

It can be used on all kinds of operating systems -- mostly unix-like operating systems, but also including Windows using a POSIX compatibility shim like Cygwin.

People were using bash in 1989, years before Linux became the beginning of a thing. And folks have been using it on Windows since at least 1995, or maybe even earlier -- decades before WSL.

[–] [email protected] 1 points 10 months ago

WSL2 is still the easiest way to get bash imo.

[–] [email protected] 1 points 10 months ago (1 children)

This utility helps you find duplicate files, empty folders, images, videos, and music that are similar to one another, and much more. And it's free. Here's a link to the repository.

[–] [email protected] 1 points 10 months ago

I used to use a program called NoClone, but it appears that somebody may have purchased it a while back and stopped development and made it kind of shady.

My old licensed version doesn't work very well, so I was looking for a replacement. As I have learned more about this program, I've been able to use it more and more effectively.

Edit: I should mention I'm a visual person, so I do use the GUI.

[–] [email protected] 1 points 10 months ago

I was just thinking about this tool the other day while I was driving. "duplicate file finder with polish name"

[–] [email protected] 1 points 10 months ago

Any suggestions for a tool like this I could run natively on my QNAP NAS?

[–] [email protected] 1 points 10 months ago

For basic duplicates, I find rmlint to be a superior alternative.
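
By "basic duplicates" I mean byte-identical files. As a minimal sketch of that idea (this is not rmlint's actual implementation, and the function names are made up for illustration): group files by size first, then hash only the ones that share a size.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    # Pass 1: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_sha256(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

The size pass is cheap and skips hashing for most files, which matters on multi-terabyte collections. Dedicated tools go a lot further than this.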

[–] [email protected] 1 points 10 months ago (2 children)

I use VisiPics. It's very old, but the algorithm is very good.

[–] [email protected] 1 points 10 months ago

Yep, it's ancient but the algorithm is still good in 2023.

I thought it was open source but it seems it's just freeware? I was hoping someone would build a new front-end...unfortunately I lack the skills to do so.

[–] [email protected] 1 points 10 months ago

Just tried this! It's either so efficient my CPU isn't working too hard or it focuses most of its work onto one CPU core. That said, I appreciate being able to identify the highest res version of a file, that helps a lot!

[–] [email protected] 1 points 10 months ago (1 children)
[–] [email protected] 1 points 10 months ago (2 children)

The latest WiFi standards provide faster theoretical throughput than what most motherboards have onboard for ethernet.

[–] [email protected] 1 points 10 months ago (1 children)

Lol.

This was /r/apple's excuse for why iPhones still have USB 2.

[–] [email protected] 1 points 10 months ago

Completely different scenario.

Most motherboards only come with a single 1Gbps port, maybe 2.5Gbps if you're lucky. The latest WiFi standards far exceed 1Gbps, so I'm not sure why people are up in arms about my comment when it's factual.

[–] [email protected] 1 points 10 months ago

lol. lmao, even

[–] [email protected] 1 points 10 months ago

I am looking for a tool to find similar videos I downloaded before. The collection is big, around 10 TB across two drives.

[–] [email protected] 1 points 10 months ago

Always loved Picasa, but unfortunately development stopped. And it crashed if you compared over 100k images.

[–] [email protected] 1 points 10 months ago

I love it, but can never remember how to pronounce it so I can't recommend it to people very easily.

[–] [email protected] 1 points 10 months ago (2 children)

I need a similar tool, but for music. The file names might be different. The size might be slightly different as well, though I'm not sure those count as duplicates.

[–] [email protected] 1 points 10 months ago

Czkawka has something for finding duplicate music, though I've never used it.

[–] [email protected] 1 points 10 months ago

This tool works for audio, too! I haven't tried it for audio but the options are there.

[–] [email protected] 1 points 10 months ago

I've removed at least 100GB of duplicates using this tool. Running it every few months is a good way to clean things up when my download queues get unruly.

[–] [email protected] 1 points 10 months ago

Czkawka, dupeGuru, and VisiPics are my go-to for non-exact photo duplicates. I typically run all three, since they don't all find the same duplicates. VisiPics runs fine under Wine.

But Czkawka's real strength is non-exact video duplicates. The only other tool I've found that does that is videoduplicatefinder.
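
For anyone wondering what "non-exact" matching means in practice, here is a toy sketch of one common approach, an average hash compared by Hamming distance. This is a generic illustration, not necessarily what Czkawka, dupeGuru, or VisiPics do internally. The tiny 2x2 "images" and the `max_distance` threshold are made up for the example.

```python
def average_hash(pixels):
    """Build a bit tuple: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def looks_similar(pixels_a, pixels_b, max_distance=1):
    """Treat two images as near-duplicates if their hashes differ by few bits."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Grayscale values (0-255) for three tiny made-up images.
original = [[10, 200], [220, 30]]
brighter = [[25, 215], [235, 45]]  # same picture, slightly brighter
inverted = [[200, 10], [30, 220]]  # bright and dark areas swapped

print(looks_similar(original, brighter))
print(looks_similar(original, inverted))
```

Different hash functions and thresholds will flag different pairs, which may be part of why running more than one tool turns up different results.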

[–] [email protected] 1 points 10 months ago (1 children)

Revolutionized the way I organize my files. There is no better tool.

[–] [email protected] 1 points 10 months ago

Anti-Twin (in pixel recognition mode) takes a long time, but finds more dupes.