this post was submitted on 27 Mar 2025
The original post: /r/datahoarder by /u/awolfwearingabanana on 2025-03-27 03:14:40.

The title says it all. I was originally trying to use wget to download this specific collection, https://catalog.archives.gov/search-within/530707, but it just won't download. I want to archive it not only because I find it cool and want to keep a copy on my drive, but also because I want to do my part to combat the purges. I would also like to know how to filter the download so that it grabs only the images and documents and none of the site assets, for example only the .tiff, .jpg/.jpeg, .png, and .pdf files in the catalog.

Wget command I was running: wget --mirror --page-requisites --convert-links --no-clobber -e robots=off --no-parent --user-agent=Mozilla --random-wait --recursive --domains archives.gov https://catalog.archives.gov/search-within/530707
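
A minimal sketch of the kind of extension filter described above, not a confirmed fix for this catalog. The function names `filter_media_urls` and `fetch_media` are hypothetical. The second function assumes the file links are reachable as plain HTML links; if the catalog pages are rendered by JavaScript, wget's recursion will not discover them, and a list of direct file URLs would have to come from somewhere else first.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical.

# Read URLs on stdin, keep only those ending in a wanted extension
# (case-insensitive; a trailing ?query string is tolerated).
filter_media_urls() {
  grep -iE '\.(tiff?|jpe?g|png|pdf)(\?.*)?$'
}

# Recursive fetch restricted to the same extensions through wget's
# accept list. Assumption: links to the files appear in static HTML.
# wget still fetches HTML pages to follow their links, then deletes
# the ones that do not match the accept list.
fetch_media() {
  wget --recursive --no-parent --wait=1 --random-wait \
       --accept 'tif,tiff,jpg,jpeg,png,pdf' --ignore-case \
       -e robots=off "$1"
}
```

If a plain-text list of direct file URLs is available (say a hypothetical `urls.txt`), something like `filter_media_urls < urls.txt > media.txt` followed by `wget --input-file=media.txt` would download only the matching files.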
