this post was submitted on 13 Nov 2023

I was recently trying to download Lockpick_RCM when I found out that Nintendo had filed a DMCA notice against it and the repo had been taken down. This got me thinking about how I might build a system that automatically and periodically clones/pulls repos. On the surface, this seems super straightforward with a crappy script and a systemd service, but one thought adds complexity. In theory, the maintainers of a repo could rebase and squash all commits back to some point in time. I would like to make sure that I don't lose commits that have been squashed on the main branch, while also not losing the ability to continue updating.

As I'm writing this, I realize that one solution could be to have some logic in the script that catches when a pull fails due to a history conflict, moves the current working directory off to a timestamped directory, and then clones a new copy.
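
To make that idea concrete, here is a rough Python sketch of the rotate-on-conflict logic. It assumes `git` is on the PATH and treats any failed `git pull --ff-only` as a possible history rewrite; the names `git`, `sync_repo`, the `.new` staging suffix, and the timestamp format are all made up for illustration, so treat it as a starting point rather than a finished tool.

```python
import subprocess
import time
from pathlib import Path


def git(*args, cwd=None):
    """Run a git command and return the CompletedProcess without raising."""
    return subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)


def sync_repo(url, dest):
    """Clone `url` into `dest`, or fast-forward an existing clone.

    If the fast-forward fails (for example because upstream rewrote history),
    a fresh clone is made first, then the old clone is renamed to a
    timestamped directory and the fresh clone takes its place.
    Returns "cloned", "updated", or "rotated".
    """
    dest = Path(dest)
    if not dest.exists():
        git("clone", url, str(dest)).check_returncode()
        return "cloned"

    # --ff-only refuses to merge or rebase, so diverged history is an error.
    if git("pull", "--ff-only", cwd=dest).returncode == 0:
        return "updated"

    # Clone to a staging path first, so a network outage (which also makes
    # the pull fail) raises here and leaves the existing clone untouched.
    fresh = dest.with_name(dest.name + ".new")
    git("clone", url, str(fresh)).check_returncode()

    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest.rename(dest.with_name(f"{dest.name}.{stamp}"))
    fresh.rename(dest)
    return "rotated"
```

Calling `sync_repo(url, path)` from a systemd timer would keep one live clone per repo plus a timestamped directory for each detected rewrite. This only watches the checked-out branch, so rewrites on other branches would need extra handling.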

I have thought about hosting Gitea and setting up mirrors there. It sounds like a good idea, but if upstream squashes or otherwise rewrites history, the mirrored copy would get clobbered too.

Have any of you set up anything to create historical (vs functional) archives of Git repos you don't want to lose?

[–] [email protected] 1 points 10 months ago

Gitea with repo mirroring is what you're looking for. By default, a mirror pulls from the upstream repo every 8 hours and updates your copy to match.

You can also set it up to push to another external host, such as Bitbucket, for extra redundancy.
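
If you'd rather script the mirror setup than click through the web UI, something like the sketch below could work. It assumes Gitea's `POST /api/v1/repos/migrate` endpoint with the `clone_addr`, `repo_name`, `mirror`, and `mirror_interval` fields and a personal access token; the helper name `build_mirror_request`, the host, and the token are placeholders, so check the API docs on your own instance before relying on it.

```python
import json
import urllib.request


def build_mirror_request(base_url, token, clone_addr, repo_name, interval="8h0m0s"):
    """Build (but do not send) a request asking Gitea to create a pull mirror.

    Assumption: the instance exposes POST /api/v1/repos/migrate and accepts
    the fields used below. `interval` is a Go-style duration string.
    """
    payload = {
        "clone_addr": clone_addr,     # upstream repository to mirror
        "repo_name": repo_name,       # name of the mirror on your Gitea
        "mirror": True,               # keep pulling instead of a one-off import
        "mirror_interval": interval,  # how often Gitea re-syncs
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/repos/migrate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"token {token}",
        },
        method="POST",
    )


# Hypothetical usage; host, token, and upstream URL are placeholders:
# req = build_mirror_request("https://gitea.example.lan", "YOUR_TOKEN",
#                            "https://github.com/example/project.git", "project")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Keep in mind that a mirror follows upstream, so it does not by itself preserve history that gets force-pushed away.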