They deserve it (i.imgflip.com)
submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]
 

For context: https://feddit.org/post/2374543

TL;DR: Aussie.zone has been experiencing a 7-day delay with LW for quite a long time, due to the way Lemmy manages federation and the distance between LW and aussie.zone.

Lemmy.world is the only instance with such a delay; even Lemm.ee, the second most active instance, is up to date (same for Lemmy.ml, which hosts a lot of active communities).

Graphs:

The issue has been fixed in 0.19.6 (https://github.com/LemmyNet/lemmy/pull/4623), which hasn't been released yet.

Explanation by @[email protected]

lemmy’s current federation implementation works with a sending queue, so it stores a list of activities to be sent in its database. there is a worker running for each linked instance checking if an activity should be sent to that instance, and if it should, then send it. due to how this is currently implemented, this is always only sending a single activity at a time, waiting for this activity to be successfully sent (or rejected), then sending the next one.
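A minimal sketch of that mechanism, with hypothetical names and an assumed 0.32 s round trip; this is not Lemmy's actual code:

```python
# One worker per linked instance, draining a queue strictly one activity
# at a time. Names and the 0.32 s round trip are illustrative assumptions.
import time
from collections import deque

def send_activity(activity, rtt_seconds=0.32):
    """Stand-in for one HTTP delivery; blocks for a full round trip."""
    time.sleep(rtt_seconds)

def run_send_worker(queue: deque):
    """Per-instance worker: send one activity, wait for the ack, repeat."""
    while queue:
        activity = queue.popleft()
        send_activity(activity)  # nothing else is sent until this returns

run_send_worker(deque(["post", "vote", "comment"]))  # drains in ~3 round trips
```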

an activity is any federation message when an instance informs another instance about something happening. this includes posts, comments, votes, reports, private messages, moderation actions, and a few others.

let’s assume an activity is generated on lemmy.world every second. now every second this worker will send this activity from helsinki to sydney and wait for the response, then wait for the next activity to be available. to simplify things, i’ll skip processing time in this example and just work with raw latency, based on the number you provided. now lemmy.world has to send an activity to sydney. this takes approximately 160ms. aussie.zone immediately responds, which takes 160ms for the response to get back to helsinki. in sum this means the entire process took 320ms. as long as only one activity is generated per second, this is easy to keep up with. still assuming there is no other time needed for any processing, this means about 3.125 activities per second can be transmitted from lemmy.world to aussie.zone on average.
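A quick back-of-the-envelope check of those numbers:

```python
# Back-of-the-envelope check of the numbers above.
one_way = 0.160          # seconds, Helsinki -> Sydney
rtt = 2 * one_way        # 0.320 s spent per activity, ignoring processing
print(1 / rtt)           # 3.125 -> the sequential ceiling, activities/second
```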

the real activity generation rate on lemmy.world is quite a bit higher than 3.125 activities per second, and in reality there are also other things that take up some time during this process. over the last 7 days, lemmy.world had an average activity generation rate of about 5.45 activities per second. it is important to note here that not all activities generated on an instance will be sent to all other linked instances, so this isn’t a reliable number of how many activities are actually supposed to be sent to aussie.zone every second, but rather an upper limit. for example, for content in a community, lemmy will only send these activities to instances that have at least one subscriber to that community. private messages, although only a fraction of all activities, are another example of an activity that is only sent to a single linked instance.

to answer the original question: the week of delay is simply built up over time, as the amount of lag just keeps growing.
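To put a rough number on that growth, treating the 5.45 activities/second figure as if it all had to reach aussie.zone (it is only an upper limit, per the explanation above):

```python
# Worst-case backlog growth, treating 5.45/s as if every activity were
# destined for aussie.zone (the text above notes it is only an upper limit).
generated = 5.45                    # activities/second produced on lemmy.world
delivered = 3.125                   # sequential sending ceiling at 0.32 s RTT
deficit = generated - delivered     # ~2.3 activities/second never catch up

week = 7 * 24 * 3600
backlog = deficit * week
print(f"{backlog:,.0f} activities behind after a week")                     # ~1,406,160
print(f"{backlog / delivered / 86400:.1f} days to drain at the old rate")   # ~5.2
```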

top 27 comments
[–] [email protected] 2 points 3 days ago (2 children)

Oh hey! That's us!
We've known about this issue since March. OP's post is a very good explanation of the problem. Lemmy wasn't designed with one huge instance like this in mind, and lemmy.world is the only instance I'm aware of that causes a problem. Our Kiwi sisters and brothers hacked around the problem five months ago: they set up a server in Finland, slurp up bulk lemmy.world content in batches, and then insert that content into their local database. They invited us to share their code to do this, but:

  1. It required us to spend money on a VM in Finland specifically for lemmy.world content - a precedent that made us hesitate.
  2. This is a temporary problem; a new version of Lemmy was meant to come out some months ago to remedy it. It simply hasn't been released yet.

Had we known in May that this would still be an issue in October, we might have chosen differently.

[–] [email protected] 2 points 3 days ago

Trying to fetch the missing comments from this alt. For Aussie.zone people reading this, the full thread can be found at https://feddit.org/post/3524876

[–] [email protected] 1 points 3 days ago

Hello, thank you for jumping in!

[–] AwesomeLowlander 12 points 1 week ago

That seems like a huge oopsie in the design...

[–] [email protected] 10 points 1 week ago
[–] [email protected] 9 points 1 week ago (1 children)

Slowly drifting further away with each moment… as we like it

[–] [email protected] 4 points 1 week ago

We try to keep you guys with us!

[–] [email protected] 8 points 1 week ago (1 children)

Well, there's another reason for not using l.w - it's now so big that you need to host your instance in Europe to reduce the lag.

[–] [email protected] 6 points 1 week ago (1 children)

Until v0.19.6 (the git link says "federation: parallel sending per instance #4623"). Although I'm surprised they haven't switched to a batched model yet - that must be absolute hell on the network bandwidth of anyone operating a Lemmy instance. Imagine an accidental downvote, pressing it again to cancel, then pressing upvote: altogether that counts as 3 separate network requests. In comparison, I would naively guess that e.g. sending a whole minute's worth of activity at once would be much preferable, or at least bundling together all the activities that occurred within the prior second to send at once?
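For illustration, here is what such batching might look like; this is purely hypothetical, since ActivityPub doesn't define batched delivery (as a reply below notes):

```python
# Purely hypothetical batching, as imagined above: collect a window's worth
# of activities and ship them as one request instead of one request each.
import time

def run_batch_sender(incoming, send_batch, window_seconds=1.0):
    """Group activities generated within each window into a single delivery."""
    batch, deadline = [], time.monotonic() + window_seconds
    for activity in incoming:
        batch.append(activity)
        if time.monotonic() >= deadline:
            send_batch(batch)   # e.g. downvote, undo, upvote: 1 request, not 3
            batch, deadline = [], time.monotonic() + window_seconds
    if batch:
        send_batch(batch)       # flush whatever is left in the final window

run_batch_sender(["downvote", "undo", "upvote"], send_batch=print)
```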

[–] [email protected] 4 points 1 week ago (1 children)

And that's with everything operating as expected; when things go sideways it can really hammer an instance's performance.

It's why having funded and dedicated developers is critical: they can respond to issues like this and fix them while we are still in the early days.

[–] [email protected] 4 points 1 week ago (1 children)

It's a good thing that nothing of consequence is planned to happen anywhere in the world, especially the USA, like related to politics or anything. Y-y-yeah it's a good thing there's nothing like that coming! 😔

[–] [email protected] 4 points 1 week ago (1 children)

I'm glad we have spare capacity here as you never know what bugs, runaway processes, surprise enshittification or real-world mayhem could be around the corner.

[–] [email protected] 7 points 1 week ago

FYI @[email protected] as you're the person I usually discuss this with

[–] [email protected] 6 points 1 week ago (3 children)
[–] [email protected] 11 points 1 week ago

They have to ship a thumb drive with all federated posts over there every day.

[–] [email protected] 7 points 1 week ago

It's only 7 days because, apparently, it just deletes things older than that.

[–] [email protected] 5 points 1 week ago (1 children)

There's the TCP sliding-window approach to solve this. It just needs doing, along with a whole host of other work. I'd donate, but are the original pair any better at sharing work yet?

[–] [email protected] 8 points 1 week ago

Parallel sending has been implemented in 0.19.6: https://github.com/LemmyNet/lemmy/pull/4623

The issue might still be around until

  • 0.19.6 is released
  • LW updates to it (they tend to be slower to update due to their size)
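For intuition, a rough sketch of what parallel sending buys, in the spirit of PR #4623 rather than its actual implementation:

```python
# Keep several activities in flight at once, so throughput becomes roughly
# concurrency / RTT instead of 1 / RTT. Names and numbers are illustrative.
import asyncio

async def send_activity(activity, rtt=0.32):
    await asyncio.sleep(rtt)               # stand-in for one network round trip

async def send_parallel(activities, concurrency=8):
    sem = asyncio.Semaphore(concurrency)   # cap on simultaneous requests
    async def one(activity):
        async with sem:
            await send_activity(activity)
    await asyncio.gather(*(one(a) for a in activities))

# 100 activities at a 0.32 s RTT: ~32 s sequentially, ~4 s with 8 in flight.
asyncio.run(send_parallel(range(100)))
```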
[–] [email protected] 1 points 1 week ago* (last edited 1 week ago) (2 children)

Might be worth considering batching activities instead of sending one by one and accumulating delay.

Anti Commercial-AI license

[–] [email protected] 3 points 1 week ago (1 children)

Batching is not part of the ActivityPub standard, so it would not be compatible with non-Lemmy services. The new approach should resolve it.

[–] [email protected] 3 points 1 week ago* (last edited 1 week ago) (1 children)

It might be worth proposing as an addition. I think we all know that batching is better for the network and for processing. Parallelization might work now, but as the network grows (and it probably will), large servers might have more than ~6 activities/second to send (rough numbers in the sketch below).

Anti Commercial-AI license
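Rough numbers for that scaling concern, assuming the same 0.32 s round trip as above; by Little's law, the concurrency a sender needs grows linearly with the send rate:

```python
# In-flight requests ~= send rate * RTT (Little's law), so parallel sending
# has to widen its window as activity grows; batching would instead cut the
# number of requests. The rates below are hypothetical.
rtt = 0.32
for rate in (6, 50, 500):                  # activities/second to one instance
    print(f"{rate:>4}/s needs ~{rate * rtt:.0f} requests in flight")
# 6/s -> ~2, 50/s -> ~16, 500/s -> ~160
```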

[–] [email protected] 4 points 1 week ago

I think if they make the number of activities you can send in parallel adjustable (automatically or manually), it can also scale. The problem is that it will also cause a lot more load, though I'm not quite certain by how much. But I will agree that ActivityPub needs to be extended to allow batching federation activities, as that's bound to become an issue eventually.

[–] [email protected] 2 points 1 week ago (1 children)

The issue has been solved in 0.19.6 (https://github.com/LemmyNet/lemmy/pull/4623) which hasn’t been released yet.

[–] [email protected] 1 points 1 week ago (1 children)

Yes, that's parallel sending with parallel processes, not batching and sending a single request.

Anti Commercial-AI license

[–] [email protected] 2 points 1 week ago (1 children)

Feel free to suggest this in a PR

[–] [email protected] 1 points 1 week ago

I don't have GitHub and am working on other open-source stuff at the moment.

Anti Commercial-AI license