this post was submitted on 07 Oct 2024

Technology

purrtastic@lemmy.nz 48 points 2 months ago

It’s not fine. They are not archiving the internet.

I had to ban their user agent after very aggressive scraping that would have taken down our servers. Fuck this shitty behaviour.
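Here is a minimal sketch of a user-agent ban like the one described above, in Python. It assumes the crawler identifies itself with a substring such as "Bytespider". That is an assumption, because the comment doesn't name the agent. The helper names are hypothetical and not part of any web server's API. As the replies below point out, this only works while the scraper sends an honest User-Agent.

```python
from typing import Optional

# Hypothetical block list of crawler User-Agent substrings (lowercase).
# "bytespider" is an assumed example; the comment doesn't name the agent.
BLOCKED_AGENT_SUBSTRINGS = ("bytespider",)


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header matches a blocked crawler (case-insensitive)."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)


def status_for_request(user_agent: Optional[str]) -> int:
    """Pick an HTTP status: 403 Forbidden for blocked agents, 200 otherwise."""
    return 403 if is_blocked_user_agent(user_agent) else 200
```

In practice the same check would often live in the reverse proxy or web server config rather than in application code.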

Melvin_Ferd@lemmy.world 4 points 2 months ago

Isn't there a way to rate-limit requests so the traffic doesn't bring down your servers?

Mojave@lemmy.world 14 points 2 months ago

They obfuscate their traffic by randomizing user agents, so it's either adding a global rate limit or letting them ass fuck you.
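One way to read "a global rate limit" is a single token bucket shared by every request, so randomized user agents make no difference. Below is a minimal, single-threaded Python sketch. The class name, capacity and rate are illustrative assumptions. A request that `allow()` rejects would typically get an HTTP 429 Too Many Requests response.

```python
import time
from typing import Callable


class GlobalTokenBucket:
    """One token bucket shared by all clients (a global, not per-client, limit).

    capacity: maximum burst size, in requests.
    rate: tokens refilled per second (sustained requests per second).
    clock: injectable time source, defaulting to time.monotonic.
    """

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so a short burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; False means reject (e.g. HTTP 429)."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The trade-off is the one the comment implies: once the scraper drains the bucket, legitimate visitors get throttled too. The sketch is also not thread-safe. A multi-worker server would need a lock or a shared store.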

WhyJiffie 1 point 2 months ago

The article said all the source IPs can be traced back to ByteDance. Wouldn't it be possible to block them? Maybe even block all IPs of a specific ASN?
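In practice, blocking an ASN means blocking every IP prefix it announces. Here is a minimal Python sketch using the standard `ipaddress` module. The prefixes are RFC 5737 and RFC 3849 documentation ranges. They are hypothetical placeholders for a real ASN's announcements, not ByteDance's actual networks. Looking up which prefixes belong to an ASN is left out.

```python
import ipaddress

# Hypothetical prefixes standing in for the ranges one ASN announces.
# These are documentation ranges (RFC 5737 / RFC 3849), not real networks;
# a real list would come from routing data or an IP-to-ASN database.
BLOCKED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked_ip(addr: str) -> bool:
    """Return True if addr falls inside any blocked prefix."""
    ip = ipaddress.ip_address(addr)
    # Testing an IPv4 address against an IPv6 network (or vice versa)
    # returns False rather than raising, so mixed lists are fine.
    return any(ip in net for net in BLOCKED_PREFIXES)
```

As the reply below notes, this stops working once the traffic moves to residential proxies, whose addresses belong to ordinary ISPs.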

tempest@lemmy.ca 2 points 2 months ago (last edited 2 months ago)

They can be traced back one by one, but if you get any real amount of traffic, it's a constant game of cat and mouse.

You can block entire ASNs until they start using residential proxies provided by less ethical companies. Then you end up blocking all of France, or destroying the user experience by forcing a captcha on everyone.

Melvin_Ferd@lemmy.world 1 point 2 months ago

Why do they need to hit a website that hard? Wouldn't they just need to scrape the data once and frig off? What's the point of creating that much traffic?