this post was submitted on 28 Nov 2023

I love Aria2, but I'm building a web scraper/crawler and need to download hundreds of thousands of files. Aria2 locks up around the 20,000-file mark. Is there another download manager that could handle this, or a more recent fork of Aria2?

I believe I have a workaround: use the API to check how many files are in the queue and keep sleeping until there are fewer than 1,000. I'm not sure this is the most effective approach, though, since it significantly slows down the download pipeline.
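For reference, here is a rough sketch of that throttling workaround in Python. It assumes aria2c was started with `--enable-rpc` on the default `http://localhost:6800/jsonrpc` endpoint and without `--rpc-secret` (with a secret, `"token:<secret>"` would have to be the first entry in `params`). The helper names `rpc_call`, `pending_count`, and `enqueue_throttled` are made up for illustration; `aria2.getGlobalStat` and `aria2.addUri` are aria2's own RPC methods. To limit the slowdown, it asks for the queue size once per batch of free slots instead of once per URL.

```python
import json
import time
import urllib.request

# Assumption: aria2c runs locally with --enable-rpc on its default port.
RPC_URL = "http://localhost:6800/jsonrpc"


def rpc_call(method, params=None, url=RPC_URL):
    """Send one JSON-RPC 2.0 request to aria2 and return its 'result' field."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": "throttle",
        "method": method,
        "params": params or [],
    }).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["result"]


def pending_count(call=rpc_call):
    """Number of downloads that are active or waiting (aria2 reports strings)."""
    stats = call("aria2.getGlobalStat")
    return int(stats["numActive"]) + int(stats["numWaiting"])


def enqueue_throttled(urls, max_pending=1000, poll_seconds=1.0,
                      call=rpc_call, sleep=time.sleep):
    """Add URLs to aria2 while keeping at most max_pending downloads queued."""
    added = 0
    free_slots = 0
    for url in urls:
        # Only ask aria2 for its queue size when the last batch is used up.
        while free_slots <= 0:
            free_slots = max_pending - pending_count(call)
            if free_slots <= 0:
                sleep(poll_seconds)
        call("aria2.addUri", [[url]])
        free_slots -= 1
        added += 1
    return added
```

Calling `enqueue_throttled(url_iterable)` from the crawler would then block only when the queue is full. The `max_pending` and `poll_seconds` values are guesses to tune, not recommendations.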

The issue seems to be connections timing out in aria2, which leaves downloads locked up until they are manually cleared. I have my timeout set to 10 seconds, but that doesn't seem to matter. I've considered running a scheduled task to clean them up, but was going to try downloading with Python first.
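A scheduled cleanup along those lines might look something like the sketch below. It treats a download as stuck when aria2 reports zero download speed for several polls in a row, which is an assumed heuristic and not something aria2 defines. `call` stands for any function that sends a JSON-RPC request to aria2 and returns the result (hypothetical, supplied by the caller); `aria2.tellActive` and `aria2.forceRemove` are real aria2 RPC methods. The function names and the threshold are illustrative only.

```python
def find_stalled(active, zero_speed_polls, threshold=3):
    """Update per-GID zero-speed counters and return GIDs at or over threshold.

    active: list of dicts from aria2.tellActive (values are strings).
    zero_speed_polls: dict mapping GID -> consecutive polls with zero speed.
    """
    active_gids = set()
    stalled = []
    for item in active:
        gid = item["gid"]
        active_gids.add(gid)
        if int(item["downloadSpeed"]) == 0:
            zero_speed_polls[gid] = zero_speed_polls.get(gid, 0) + 1
        else:
            zero_speed_polls[gid] = 0
        if zero_speed_polls[gid] >= threshold:
            stalled.append(gid)
    # Forget downloads that are no longer active.
    for gid in list(zero_speed_polls):
        if gid not in active_gids:
            del zero_speed_polls[gid]
    return stalled


def cleanup_once(call, zero_speed_polls, threshold=3):
    """Run one cleanup pass; returns the GIDs that were force-removed."""
    active = call("aria2.tellActive", [["gid", "downloadSpeed"]])
    stalled = find_stalled(active, zero_speed_polls, threshold)
    for gid in stalled:
        call("aria2.forceRemove", [gid])
        zero_speed_polls.pop(gid, None)
    return stalled
```

Running `cleanup_once` every few seconds with the same `zero_speed_polls` dict would clear stuck entries automatically. Removed downloads are dropped, not retried, so the caller would need to re-queue their URLs if they still matter.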


Any suggestions would be appreciated.

[email protected] 1 points 11 months ago

Interesting. Do you have a source for the 20,000-file limit? It could just be that you need to increase the number of file descriptors your OS allows.
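One way to check the file descriptor theory from Python, as a small sketch: the standard-library `resource` module (Unix only) reports the soft and hard limits on open files. Note the assumption here: changing the limit this way only affects the current Python process and anything it launches afterwards, not an aria2c instance that is already running. The function names are made up for illustration.

```python
import resource


def nofile_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(target):
    """Raise the soft limit toward target, never above the hard limit.

    Returns the soft limit in effect afterwards. Child processes started
    later (for example via subprocess) inherit the new limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

If `nofile_limits()` shows a soft limit of something like 1024, that would be worth ruling out before switching download managers.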