If you’ve noticed content from Lemmy World lagging by a few hours the past few days, it’s not just you.
Long story short, it’s a problem with how Lemmy sends activities. Delivery is heavily affected by the latency between the sending and receiving servers, which creates a fixed upper limit on the number of activities that can be sent per second. Lemmy World is hosted in Finland, and DubVee is on the US east coast. There’s only so much I can do to work around the inherent latency of a trans-Atlantic link.
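To make the latency ceiling concrete, here is a rough sketch in Python. It assumes activities go out one at a time, each waiting for the previous one to be acknowledged; that is one reading of the behaviour described above, not a confirmed description of Lemmy's internals. The function name and the round-trip times are made up for illustration and are not measurements.

```python
def max_activities_per_second(round_trip_seconds: float,
                              processing_seconds: float = 0.0) -> float:
    """Upper bound on throughput when sends are strictly sequential.

    Each activity costs one network round trip plus any processing time
    on the receiving side, so the ceiling is the inverse of that cost.
    """
    per_activity = round_trip_seconds + processing_seconds
    if per_activity <= 0:
        raise ValueError("per-activity time must be positive")
    return 1.0 / per_activity


# Hypothetical round-trip times, for illustration only.
examples = [
    ("same region", 20),
    ("trans-Atlantic", 100),
    ("Europe to Oceania", 300),
]

for label, rtt_ms in examples:
    ceiling = max_activities_per_second(rtt_ms / 1000)
    print(f"{label}: at most {ceiling:.1f} activities/s")
```

Under that assumption, no amount of extra hardware on the receiving end raises the ceiling: only a shorter round trip or sending several activities at once would.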
We’re not alone in this. Some instances, particularly in Australia, New Zealand, and on the US west coast, are impacted more dramatically.
While there’s always some federation delay/backlog, it only escalates to noticeable levels every so often. I’m not sure why, but I’ve gone over our infrastructure top to bottom several times and cannot find any reason, on this end, for these events. The graph in the post shows the number of activities DubVee is lagging behind Lemmy World over the last 30 days.
Normally, it’s 300-500 activities, which usually corresponds to a minute or less of lag between LW sending an event and DubVee processing it. Occasionally, when LW gets busy, we see spikes into the 3000-5000 range (~5-7 minutes of lag). Every so often, though, there will be huge backlog events (the spikes from 15K to 35K), which often take 8-12 hours to catch up. A month or two ago, I think the largest spike was around 180,000 (though that was a separate issue).
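As a rough illustration of why a large backlog takes hours to clear, the sketch below estimates catch-up time from the gap between how fast activities can be received and how fast new ones are generated. The function name and the rates in the example are hypothetical, chosen only to show the arithmetic; they are not measured values for LW or DubVee.

```python
def catch_up_hours(backlog: int, receive_rate: float, send_rate: float) -> float:
    """Estimate hours needed to clear a backlog of activities.

    receive_rate: activities per second the receiver can take in.
    send_rate:    activities per second the sender newly generates.
    Only the surplus (receive_rate - send_rate) shrinks the backlog.
    """
    surplus = receive_rate - send_rate
    if surplus <= 0:
        # The backlog never shrinks if we cannot outpace new activity.
        return float("inf")
    return backlog / surplus / 3600


# Hypothetical rates: a receiver capped at 10/s while the sender produces 9/s.
hours = catch_up_hours(backlog=35_000, receive_rate=10.0, send_rate=9.0)
print(f"Estimated catch-up time: {hours:.1f} hours")
```

The point of the sketch is that when the receive ceiling sits only slightly above the incoming rate, even a modest backlog drains very slowly.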
I believe this is being addressed in Lemmy itself, but it’ll be a while before it’s ready (and I’m certainly going to let other instances kick the tires before upgrading).
In the meantime, I’ve talked with some other admins and have deployed a federation proxy. I won’t go into the nitty-gritty details, but it seems to be alleviating a lot of the congestion, though not all of it. We went from averaging around 10,000 backlogged events to about 2,500. So, definitely an improvement.
Update: Buffer has cleared, and things seem to be coming in pretty close to real time. Occasionally the buffer kicks in, but overall, it appears to be helping. Will continue to monitor. Usually 11-12 AM eastern time is when we start to see lag increasing in activities coming from Lemmy World.
Hopefully there’s some more optimization I can do in the meantime, and hopefully Lemmy addresses this limitation, but for now, this should make things less bad.