this post was submitted on 11 Sep 2024
188 points (98.0% liked)

Fediverse

28724 readers
132 users here now

A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc.).

If you wanted to get help with moderating your own community then head over to [email protected]!

Rules

Learn more at these websites: Join The Fediverse Wiki, Fediverse.info, Wikipedia Page, The Federation Info (Stats), FediDB (Stats), Sub Rehab (Reddit Migration), Search Lemmy

founded 2 years ago
MODERATORS
Map of 2000+ lemmy communities (danterious.codeberg.page)
submitted 3 months ago* (last edited 3 months ago) by [email protected] to c/[email protected]
 

This is my first try at creating a map of Lemmy. I based it on the overlap of commenters who visited certain communities.

I only used communities from the top 35 most active instances over the past month, and limited comments to a maximum look-back of August 1, 2024 (sometimes shorter if I got an invalid response).

I scaled it based on the percentage of comments each commenter made in a given community.
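The linked repo below has the actual crawler and data; as a rough sketch of what this kind of commenter-overlap scaling could look like (the commenter names, community names, and the specific overlap formula here are all made up for illustration, not taken from the real code):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: (commenter, community) pairs standing in
# for the real crawl results.
comments = [
    ("alice", "fediverse"), ("alice", "fediverse"), ("alice", "linux"),
    ("bob", "fediverse"), ("bob", "linux"),
    ("carol", "linux"), ("carol", "gaming"),
]

# Count each commenter's comments per community.
per_user = defaultdict(lambda: defaultdict(int))
for user, community in comments:
    per_user[user][community] += 1

# Convert counts to each commenter's share of activity per community,
# so prolific commenters don't dominate the map.
shares = {
    user: {c: n / sum(counts.values()) for c, n in counts.items()}
    for user, counts in per_user.items()
}

# Edge weight between two communities: sum, over commenters active in
# both, of the product of their activity shares (one plausible measure).
edges = defaultdict(float)
for user_shares in shares.values():
    for a, b in combinations(sorted(user_shares), 2):
        edges[(a, b)] += user_shares[a] * user_shares[b]

print(dict(edges))
```

With this toy data, "fediverse" and "linux" end up with the strongest link because two commenters are active in both.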

Here is the code for the crawler and data that was used to make the map:

https://codeberg.org/danterious/Lemmy_map

[–] [email protected] 6 points 3 months ago* (last edited 3 months ago) (1 children)

Very cool!

Do you have any idea how taxing scraping this data is for the servers?

If this is something you want to keep working on, maybe it could be combined with a sort of Threadiverse fundraiser: we collectively gather funds to cover the cost of scraping (plus some for supporting the Threadiverse, ideally), and once we reach the target, you release the map based on the newest data and the money is distributed proportionally to the different instances.

Maybe it's a stupid idea, or maybe it would add too much pressure to the equation. But I think it could be fun! :)

[–] [email protected] 2 points 3 months ago

I had to try scraping the websites multiple times because of stupid bugs I had put in the code, so I might have put more strain on the instances than I meant to. If I did this again, it would hopefully be much less taxing on the servers.

As for the cost of scraping, it actually isn't that hard; I just had it running in the background most of the time.
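One common way to avoid re-hitting instances when a crawler has to be rerun after a bug is to cache responses on disk and rate-limit live requests. This is a generic sketch of that idea, not the approach from the linked repo (the function name, cache layout, and delay value are all assumptions):

```python
import time
from pathlib import Path


def fetch_cached(url, fetch, cache_dir=Path("cache"), delay=1.0):
    """Fetch a URL at most once; repeated runs read the on-disk copy
    instead of hitting the server again. `fetch` is any callable that
    takes a URL and returns bytes (e.g. a urllib or requests wrapper)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Derive a filesystem-safe cache filename from the URL.
    safe_name = "".join(c if c.isalnum() else "_" for c in url)
    cache_file = cache_dir / safe_name
    if cache_file.exists():
        return cache_file.read_bytes()
    time.sleep(delay)  # be polite: space out real requests
    data = fetch(url)
    cache_file.write_bytes(data)
    return data
```

With this in place, rerunning the crawler after fixing a bug only costs the instances requests for URLs that weren't already cached.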

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~