this post was submitted on 17 Jun 2023

I like the idea of Lemmy/kbin and the fediverse, but perhaps there's something I don't understand.

If in the future Lemmy is very popular and someone wants to add their own server and federate with everyone, then from that moment the new instance will get all new comments, posts, etc. from every instance it's federated with and must save them in its db. This means that if Lemmy gets popular, forget about the little guys helping to spread the “load”, because every instance must still take and save all new data. That's a lot of processing power and storage. How can this work? I see only a few instances surviving in the future.

If somehow each instance were a node that only took care of its own posts and comments and forwarded them to others upon request, I could understand how it scales, but this is not how it works AFAIK. Another way would be consensus algorithms, where a node saves more than its own data but still not all of it.

[–] [email protected] 21 points 1 year ago* (last edited 1 year ago) (9 children)

You've misunderstood. An instance does not contain all content from every other instance. It only holds the communities that at least one of its users has specifically requested by entering the id of the community in the [email protected] format in search.

This means that the Star Trek instance will mostly only ever need to host Star Trek content. It won't get flooded with everything else on the entire network as it grows. Some portion of it, yes, as users on the Star Trek instance will inevitably sub to at least some stuff outside it, too.

Additionally, pictures and media are cached, but not permanently federated. When you upload a picture, you may have noticed the link becoming one that points to the instance you're posting from. This doesn't change even when that post gets federated to other instances: they still fetch that image from the instance it was posted from (unless it's a recent post, in which case the image may well be cached, too).

This means that what gets federated is mostly just a bunch of text data, and even then, just the subset that is needed. A much lighter load.

At the smallest scale, you could have a node with just one user, perhaps that user creates a community or two. But this means that that instance will ONLY EVER store the subs of that one user, and the content of the communities they created. Not even close to the total content of the entire fediverse.
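
To make that concrete, here is a rough toy model in Python. It is not how Lemmy is actually implemented: the `Instance` class, the `publish` helper, and the community names are all made up for illustration. For simplicity every post is offered to every instance and filtered on arrival. The only point it shows is that an instance ends up storing posts just for the communities its own users subscribed to.

```python
from collections import defaultdict

class Instance:
    """Toy model of a federated instance (hypothetical, not Lemmy's data model)."""

    def __init__(self, name):
        self.name = name
        self.subscribed = set()          # communities at least one local user follows
        self.stored = defaultdict(list)  # community -> posts held locally

    def subscribe(self, community):
        self.subscribed.add(community)

    def receive(self, community, post):
        # Keep a post only if some local user subscribed to its community.
        if community in self.subscribed:
            self.stored[community].append(post)
            return True
        return False

def publish(instances, community, post):
    """Offer a post to each instance; return the names of those that stored it."""
    return [inst.name for inst in instances if inst.receive(community, post)]

startrek = Instance("startrek")
solo = Instance("solo")  # a single-user instance
startrek.subscribe("startrek")
startrek.subscribe("technology")
solo.subscribe("technology")

network = [startrek, solo]
publish(network, "startrek", "post about warp drives")
publish(network, "technology", "post about servers")
publish(network, "cooking", "post about bread")  # nobody subscribed, nobody stores it

startrek_posts = sum(len(posts) for posts in startrek.stored.values())
solo_posts = sum(len(posts) for posts in solo.stored.values())
print(startrek_posts, solo_posts)
```

Three posts were published network-wide, but the single-user instance only kept the one from the community its user follows.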

[–] HelloLemmySup 2 points 1 year ago (8 children)

OK, that's a bit better. I didn't know about that detail.

Still, that only moves the problem to the future. As I understand it, you pick an instance more or less at random to sign up on, and from there you access the rest. Then it's only a matter of time until enough users who signed up on StarTrek subscribe to enough big communities for the problem to appear, no?

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (2 children)

There are a lot of ways to mitigate this. The total activity of a single day is negligible, so what you're really describing is the inevitability of an ever-growing amount of data needing to be stored.

But that is the same issue any online service ever has had to deal with. And there are so many solutions. An instance admin might choose to delete inactive users or communities, or only choose to keep data for, say, 10 years.

You bring up the inevitability of there being enough users to eventually sub to everything. But that assumes infinite users. Any instance is only ever going to sub to a subset of the rest of the fediverse. Even if some instances grow so large that they sub to most of it, they won't need to host more than its text data. The entire text of Wikipedia fits on a thumb drive.

And you forget one more thing: the more users on an instance sub to the same thing, the more of the database is shared, since the instance stores each sub and its comments only once no matter how many of its users follow it.

Yes, storage and resource usage increase as the user count increases, but efficiency goes up along with it. That single-user instance would be using WAY more space per user than a multi-user instance would.
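
A quick back-of-the-envelope sketch of that sharing effect, in Python. The subscription lists are invented, and "cost" here just counts distinct communities stored, assuming every community takes the same space, which real ones obviously don't.

```python
def communities_stored(user_subs):
    """Distinct communities an instance must store to cover its users' subs."""
    stored = set()
    for subs in user_subs:
        stored |= set(subs)
    return stored

def per_user_cost(user_subs):
    """Stored communities divided by user count (a crude stand-in for space per user)."""
    return len(communities_stored(user_subs)) / len(user_subs)

# Hypothetical single-user instance: one user, three subs.
single = [["technology", "startrek", "linux"]]

# Hypothetical four-user instance with heavily overlapping subs.
shared = [
    ["technology", "startrek", "linux"],
    ["technology", "linux"],
    ["startrek", "technology"],
    ["linux", "cooking"],
]

single_cost = per_user_cost(single)  # 3 communities / 1 user
shared_cost = per_user_cost(shared)  # 4 communities / 4 users
print(single_cost, shared_cost)
```

In this made-up case the four-user instance stores only one more community than the single-user one, so its cost per user is a third as much.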

[–] HelloLemmySup 1 points 1 year ago (1 children)

I think it's not so much storage as requests that I'm worried about. If a small instance wants to join and a few users subscribe to a few big communities, then it potentially needs to process a lot of updates from the pubsub. I would imagine these messages are optimized so you get many updates within the same message, but still.

Can the small instance be federated only one way? Meaning big can see small and comment in small, but small can't see big's communities (only the comments made from big in its own small communities)?

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

In the future, we'll likely gain migration tools similar to Mastodon's. This means we'll be able to "split" any instances that get too large to function, if such a thing ever happens.

If half of the users move away but stay subscribed to a given sub on that overloaded server, this would still reduce the load significantly, because any interaction now goes to the new server once, when it is synced. That server then handles pushing it out to all of its users.
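
Here's a rough Python sketch of why that split helps. The cost model is my own simplification, not a measurement: it assumes the home server does one unit of work per local subscriber it serves directly, plus one delivery per remote instance regardless of how many users that instance has. The instance names and user counts are made up.

```python
def home_server_load(subscribers_by_instance, home):
    """Per-post work for the home server in this toy model:
    one unit per local subscriber, plus one delivery per remote
    instance that has at least one subscriber (not one per remote user)."""
    local = subscribers_by_instance.get(home, 0)
    remote_instances = sum(
        1
        for name, count in subscribers_by_instance.items()
        if name != home and count > 0
    )
    return local + remote_instances

# Everyone lives on the overloaded server.
before = home_server_load({"big": 10000}, home="big")

# Half the users migrate to a new server but keep their subscription.
after = home_server_load({"big": 5000, "new": 5000}, home="big")

print(before, after)
```

Under these assumptions the big server goes from 10000 units of work per post to 5001, and the new server takes over serving the 5000 users who moved.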

As for custom setups, I don't see why they wouldn't be possible. The server software would simply have to be made to work that way. AFAIK ActivityPub, the standard, doesn't have anything in it that would make federation all or nothing.
