this post was submitted on 25 Nov 2023

Hello people from DataHoarder,

I just found this community while looking for a solution to my problem with HTTrack, and I've noticed that many people before me have had the same problem.

I am trying to download the full contents of a particular site in one go to add to my archives (saving every individual page as a PDF by hand would take ages), but I can't do it with HTTrack because it keeps displaying the Mirror Error.

I've already tried everything I could, scoured the forums, and read every post I could find on this issue, and nothing that has been suggested has worked so far.

Asking for help here is truly my last resort.

This is the site I'm trying to download the full contents of: https://www.gornahoor.net/

I'll be indescribably thankful if anyone could solve this for me.

Thank you in advance.

[–] [email protected] 1 points 9 months ago (1 children)

It shows this:

MIRROR ERROR! HTTrack has detected that the current mirror is empty. If it was an update, the previous mirror has been restored. Reason: the first page(s) either could not be found, or a problem occured. => Ensure that the website still exists and/or check your proxy settings! <=

I tried other tools but haven't had any luck. Cyotek's WebCopy doesn't go deep enough into the site's layers, or at least I couldn't make it do so.

[–] [email protected] 1 points 9 months ago (1 children)

The solution is here. The site may be wrongly redirecting HTTrack because of its default user agent, so change HTTrack's browser ID/identity. Did you check the WebCopy log for errors? The site may be blocking the app, and if that is the case, you will get the same issue with HTTrack.
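
One way to check whether the user agent really is the cause, before changing any crawler settings, is to request the front page twice with different identities and compare the results. This is only a rough sketch: the two user-agent strings are assumptions chosen for illustration (the crawler-style one is a stand-in and may not match the exact string your HTTrack version sends), and `build_request` and `probe` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Assumed browser-like identity; any current browser string should do.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0"
# Assumed crawler-style identity; may differ from what your HTTrack build sends.
CRAWLER_UA = "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"


def build_request(url, user_agent):
    """Return a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe(url, user_agent, timeout=15):
    """Fetch url once and return (HTTP status, final URL after redirects)."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; report the status code and
        # fall back to the requested URL.
        return err.code, url
```

Calling `probe("https://www.gornahoor.net/", BROWSER_UA)` and then the same with `CRAWLER_UA` should show whether the status code or the final URL differs between the two identities. If it does, the browser ID is worth changing in HTTrack's options, or with the `-F` (`--user-agent`) option on the command line. If both calls fail the same way, the block is probably based on something other than the user agent.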

[–] [email protected] 1 points 9 months ago

I already read that whole forum thread and nothing in there worked. If you can, please try to download the site and tell me whether it works and exactly what you did, because I tried everything I read in that thread and nothing worked. As I said above, I read many threads on many forums and tried everything I found.

I just tried Teleport VLX, but the free version is too limited to download the whole site (plus it dumps every file in the same folder, making it extremely disorganised).