this post was submitted on 28 Nov 2023

I love Aria2, but I'm building a web scraper/crawler and need to download hundreds of thousands of files. Aria2 locks up around the 20,000-file mark. Is there another download manager that can handle this, or a more recent fork of Aria2?

I believe I have a workaround: use the API to check how many files are in the queue and sleep until there are fewer than 1,000. I'm not sure this is the most effective approach, though, since it significantly slows down the download pipeline.
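For what it's worth, here is a rough sketch of that throttling idea in Python. It assumes aria2 was started with `--enable-rpc` and is listening on its default port 6800; `rpc_call`, `pending_count`, `feed`, and the `high_water`/`batch` values are names and numbers made up for illustration. It polls `aria2.getGlobalStat` and only tops the queue up in small batches, so the sleep is short rather than open-ended.

```python
import itertools
import json
import time
import urllib.request

# Assumption: aria2c runs locally with --enable-rpc on the default port.
RPC_URL = "http://localhost:6800/jsonrpc"


def rpc_call(method, params=None, url=RPC_URL, token=None):
    """Send one JSON-RPC request to aria2 and return its 'result' field."""
    params = list(params or [])
    if token:
        # aria2 expects the RPC secret as the first parameter.
        params.insert(0, f"token:{token}")
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": "feeder", "method": method, "params": params}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["result"]


def pending_count(call=rpc_call):
    """Active plus waiting downloads; aria2 reports these counts as strings."""
    stat = call("aria2.getGlobalStat")
    return int(stat["numActive"]) + int(stat["numWaiting"])


def feed(urls, high_water=1000, batch=200, poll_seconds=2.0,
         call=rpc_call, sleep=time.sleep):
    """Add URLs to aria2, pausing whenever the queue reaches high_water."""
    urls = iter(urls)
    added = 0
    while True:
        room = high_water - pending_count(call)
        if room <= 0:
            sleep(poll_seconds)  # queue is full, wait and poll again
            continue
        chunk = list(itertools.islice(urls, min(room, batch)))
        if not chunk:
            return added  # nothing left to submit
        for url in chunk:
            # addUri takes a list of mirrors for ONE file, hence [[url]].
            call("aria2.addUri", [[url]])
            added += 1
```

Something like `feed(open("urls.txt").read().split())` would then keep roughly `high_water` downloads queued at any time. The limit and poll interval are guesses and would need tuning against the real workload.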

The issue seems to lie with connections timing out in aria2, which causes the downloads to get locked up until they are manually cleared. I have my timeout set to 10 seconds, but that doesn't seem to matter. I've considered running a scheduled task to clean them up, but was going to try downloading with Python first.
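A scheduled cleanup along those lines could look roughly like the sketch below. It is untested against a real stuck instance, so treat it as an assumption-laden starting point: it calls `aria2.tellActive`, remembers each download's `completedLength`, and calls `aria2.forceRemove` on anything that has made no progress for `stall_seconds`. The helper names (`rpc_call`, `find_stalled`, `sweep`) and the 60-second threshold are invented for the example.

```python
import json
import time
import urllib.request

# Assumption: aria2c runs locally with --enable-rpc on the default port.
RPC_URL = "http://localhost:6800/jsonrpc"


def rpc_call(method, params=None, url=RPC_URL):
    """Send one JSON-RPC request to aria2 and return its 'result' field."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": "sweeper", "method": method,
         "params": list(params or [])}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["result"]


def find_stalled(active, last_progress, now, stall_seconds=60):
    """Return gids whose completedLength has not changed for stall_seconds.

    last_progress maps gid -> (completed_length, time_of_last_change)
    and is updated in place.
    """
    stalled = []
    for item in active:
        gid = item["gid"]
        done = int(item["completedLength"])
        prev = last_progress.get(gid)
        if prev is None or prev[0] != done:
            last_progress[gid] = (done, now)  # first sighting or progress made
        elif now - prev[1] >= stall_seconds:
            stalled.append(gid)
    # Forget downloads that are no longer active.
    live = {item["gid"] for item in active}
    for gid in list(last_progress):
        if gid not in live:
            del last_progress[gid]
    return stalled


def sweep(last_progress, stall_seconds=60, now=None, call=rpc_call):
    """Remove stalled downloads and purge finished results from memory."""
    now = time.monotonic() if now is None else now
    active = call("aria2.tellActive", [["gid", "completedLength"]])
    stalled = find_stalled(active, last_progress, now, stall_seconds)
    for gid in stalled:
        call("aria2.forceRemove", [gid])
        last_progress.pop(gid, None)
    # Drop completed/error/removed entries that aria2 keeps in memory.
    call("aria2.purgeDownloadResult")
    return stalled
```

Calling `sweep(seen)` every few seconds with one shared `seen = {}` dict would do the cleanup; removed URLs would have to be re-queued by the crawler itself. It may also be worth checking whether aria2's `--lowest-speed-limit` option already drops the hung connections before resorting to this.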

Any suggestions would be appreciated.

[email protected] 2 points 9 months ago

Pretty sure that's not a limitation of aria2. What system are you using? There are multiple options:

  1. Just build a queue.
  2. Check whether aria2 locks up because of system limits: max open files, max concurrent connections (there will be a practical limit no matter what you do).
  3. Build a queue with a cluster to overcome such limits.
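To make options 1 and 2 concrete, here is a minimal single-machine sketch, assuming a Unix-like system (the `resource` module is not available on Windows). `open_file_limit`, `safe_concurrency`, and `run_queue` are hypothetical helpers, and the "two descriptors per worker" budget is only a rough guess at one socket plus one output file per download.

```python
import queue
import resource  # Unix only
import threading

_STOP = object()  # sentinel that tells a worker to exit


def open_file_limit():
    """Return the (soft, hard) limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def safe_concurrency(requested, reserve=64):
    """Cap concurrency so every worker can hold a socket and an output file."""
    soft, _ = open_file_limit()
    if soft == resource.RLIM_INFINITY:
        return requested
    budget = max(1, (soft - reserve) // 2)  # assumed 2 descriptors per worker
    return max(1, min(requested, budget))


def run_queue(items, handler, workers=8, max_pending=1000):
    """Run handler(item) on a fixed thread pool fed by a bounded queue."""
    q = queue.Queue(maxsize=max_pending)  # put() blocks when the queue is full
    results, errors = [], []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is _STOP:
                return
            try:
                out = handler(item)
                with lock:
                    results.append(out)
            except Exception as exc:  # record the failure, keep the worker alive
                with lock:
                    errors.append((item, exc))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for _ in threads:
        q.put(_STOP)
    for t in threads:
        t.join()
    return results, errors
```

Here `handler` would be whatever downloads a single URL, and `workers=safe_concurrency(64)` keeps the pool under the descriptor limit. The bounded queue means the producer blocks instead of piling up hundreds of thousands of pending items.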