this post was submitted on 05 Jun 2023
8 points (100.0% liked)

Lemmy

523 readers

Everything about Lemmy; bugs, gripes, praises, and advocacy.

For discussion about the lemmy.ml instance, go to [email protected].

founded 4 years ago

With forewarning about a huge influx of users, you know Lemmy.ml will go down. Even if people go to https://join-lemmy.org/instances and disperse among the great instances there, the servers will go down.

Ruqqus had this issue too. Every time there was a mass exodus from Reddit, Ruqqus would go down and hardly reap any of the rewards.

Even if it's not sustainable, just for one month, I'd like to see Lemmy.ml drastically boost their server power. If we can raise money as a community, what kind of server could we get for $100? $500? $1,000?

all 48 comments
[–] [email protected] 2 points 1 year ago

It's a single server, Michael. What could it cost, $10?

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

Based on looking at the code and the relatively small size of the data, I think there may be fundamental scaling issues with the site architecture. Software development may be far more critical than hardware at this point.

[–] [email protected] 0 points 1 year ago (1 children)

What are you seeing in the code that makes it hard to scale horizontally? I've never looked at Lemmy before, but I've done the (monolithic app) -> Docker -> make app stateless -> Kubernetes migration before, and as a user I don't necessarily see the complexity (not saying it's not there, but I'm wondering what specifically in the site architecture prevents this transition).

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

Right now it looks to me like Lemmy is built entirely around live, real-time SQL queries against the database. That may work when there are 100 postings a day and an active posting gets 80 comments... but it likely doesn't scale very well. You tend to have to evolve toward a queue system where things like comments and votes are merged into the main database in more of a batch process (Reddit does this; you can see on their status page that comments and votes have separate uptime tracking from the main website).
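
Roughly what I mean, as a hedged sketch (the names, thresholds, and flush interval here are all made up, not anything from Lemmy's actual code): writers push votes onto an in-memory queue and a background task merges them into the database in batches.

```typescript
// Hypothetical sketch: buffer incoming votes and merge them into the
// database in batches instead of issuing one statement per vote.
type Vote = { postId: number; userId: number; score: 1 | -1 };

const pending: Vote[] = [];

// Called from the request handler; returns immediately instead of
// blocking on the database.
function enqueueVote(vote: Vote): void {
  pending.push(vote);
}

// Runs on a timer and merges everything queued since the last flush,
// e.g. as a single multi-row INSERT ... ON CONFLICT DO UPDATE.
async function flushVotes(writeBatch: (votes: Vote[]) => Promise<void>): Promise<void> {
  if (pending.length === 0) return;
  const batch = pending.splice(0, pending.length);
  await writeBatch(batch);
}

setInterval(() => {
  void flushVotes(async (batch) => {
    console.log(`flushing ${batch.length} queued votes in one statement`);
  });
}, 500);

// Simulate a burst of incoming votes.
for (let i = 0; i < 2500; i++) {
  enqueueVote({ postId: 42, userId: i, score: 1 });
}
```

The trade-off is that scores lag behind by up to the flush interval, which is usually acceptable for a link aggregator.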

On the output side, it seems ideal to have all data live and up to the very instant, but that can fall over under load surges (which may come from a popular topic, not just an influx from the decline of Twitter or Reddit). To scale, you tend to have to make some compromises and reuse output: some kind of intermediate layer that, say, only regenerates the output page every 10 seconds, and only if there has been a new write (a vote or comment change).
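
Something like this, purely as an illustration (none of these names come from Lemmy itself): a per-page cache that only re-renders when the cached copy is older than the interval and a write has marked it dirty.

```typescript
// Hypothetical sketch: reuse rendered output and rebuild it at most once
// per interval, and only if a write has happened since the last build.
type CacheEntry = { html: string; builtAt: number; dirty: boolean };

const cache = new Map<string, CacheEntry>();
const REBUILD_INTERVAL_MS = 10_000;

// Called whenever a vote or comment changes something on the page.
function markDirty(path: string): void {
  const entry = cache.get(path);
  if (entry) entry.dirty = true;
}

async function getPage(path: string, render: () => Promise<string>): Promise<string> {
  const entry = cache.get(path);
  const now = Date.now();
  // Serve the cached copy unless it is both stale and dirty.
  if (entry && (!entry.dirty || now - entry.builtAt < REBUILD_INTERVAL_MS)) {
    return entry.html;
  }
  const html = await render(); // the expensive SQL queries + templating
  cache.set(path, { html, builtAt: now, dirty: false });
  return html;
}

// Example: a new comment invalidates the cached post page.
markDirty("/post/123");
```

Readers then see output that is at most about 10 seconds old, which is exactly the compromise described above.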

don’t necessarily see the complexity (not saying it’s not there

It's the lack of complexity that's kind of the problem. Doing direct SQL queries gets you the latest data, but it becomes a big bottleneck. Again, what might have seemed to work fine when there were only 5000 postings and 100,000 total comments in the database can start to seriously fall over when you have reached the point of accumulating 1000 times that.

[–] [email protected] 1 points 1 year ago (1 children)

Out of curiosity, how would https://kbin.social/ (source: https://codeberg.org/Kbin/kbin-core) stand up to this kind of analysis? Is it better placed to scale?

[–] [email protected] 0 points 1 year ago (1 children)

The advantage kbin has is that it is built on a pretty well-known and tested PHP Symfony stack. In theory Lemmy is faster due to being built in Rust, but it is much more home-grown and not as optimized yet.

That said, kbin is also still a pretty new project that hasn't seen much actual load, so likely some dragons linger in its codebase as well.

[–] [email protected] 0 points 1 year ago (1 children)

I think it's probably undesirable to end up with big instances. I think the best situation might be one instance that's designed to scale. This could be lemmy.ml or another one. It can absorb these waves of new users.

However, it's also designed to expire accounts after six months.

After three months it sends users an email explaining that it's time to choose a server, and it nags them to do so for a further three months. After that, their ability to post is removed. They remain able to migrate their account to a new server.

After 12 months of not logging in the account is purged.

[–] [email protected] 1 points 1 year ago (1 children)

Thought about this a bit more, and I'm thinking that encouraging users not to silo (and making it easy to discover instances and new communities) will probably be the best bet for scaling the network long-term.

“Smart” rate limiting of new accounts and activity per-instance might help with this organically. If a user is told that the instance they’re posting on is too active to create a new post, or too active to accept new users, and then is given alternatives, they might not outright leave.
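
As a rough sketch of what I mean (the thresholds, names, and instance list here are all hypothetical), the instance could check its own recent activity before accepting a new post and hand back suggested alternatives instead of a bare error:

```typescript
// Hypothetical sketch: refuse new activity above a load threshold and
// point the user at other instances instead of just failing.
const ALTERNATIVE_INSTANCES = ["https://beehaw.org", "https://sopuli.xyz"];
const MAX_POSTS_PER_MINUTE = 120; // made-up threshold

let postsThisMinute = 0;
setInterval(() => { postsThisMinute = 0; }, 60_000);

type Decision =
  | { allowed: true }
  | { allowed: false; message: string; alternatives: string[] };

function checkNewPost(): Decision {
  if (postsThisMinute < MAX_POSTS_PER_MINUTE) {
    postsThisMinute += 1;
    return { allowed: true };
  }
  return {
    allowed: false,
    message: "This instance is too busy right now - try one of these instead:",
    alternatives: ALTERNATIVE_INSTANCES,
  };
}

console.log(checkNewPost());
```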

[–] [email protected] 1 points 1 year ago (1 children)

That might work. Is there some third-party email app that could capture their email and let them know when registrations are open again? I know of some corporate/not-privacy-respecting ones such as https://kickofflabs.com/campaign-types/waitlist/, but presumably there's a way to do that with some on-site tools?

[–] [email protected] 2 points 1 year ago (1 children)

If the instance in question has email support, I don't see why the instance couldn't notify them directly - but I think providing alternative instances first (with the option to get notified if this instance opens up) would be more reasonable.

[–] [email protected] 2 points 1 year ago

The idea would be to retain the ability to collect email addresses, beyond the point that the main app can't keep up. So you'd want something lightweight just for capturing the emails.

[–] [email protected] 1 points 1 year ago (6 children)

The site currently runs on the biggest VPS available on OVH. Upgrading further would probably require migrating to a dedicated server, which would mean some downtime. I'm not sure if it's worth the trouble; the site will go down sooner or later anyway if millions of Reddit users try to join.

[–] [email protected] 1 points 1 year ago (1 children)
8 vCore
32 GB RAM

😬

2 follow-ups:

  • Can we replace Lemmy.ml with Join-lemmy.org when Lemmy.ml is overloaded/down?
  • Does LemmyNet have any plans to become compatible with Kubernetes (or similar horizontal scaling techniques)?

[–] [email protected] 0 points 1 year ago (1 children)

Can we replace Lemmy.ml with Join-lemmy.org when Lemmy.ml is overloaded/down?

I don't think so; when the site is overloaded, clients can't reach it at all.

Does LemmyNet have any plans on being Kubernetes (or similar horizontal scaling techniques) compatible?

It should be compatible if someone sets it up.

[–] [email protected] 0 points 1 year ago (1 children)

You could configure something like a Cloudflare worker to throw up a page directing users elsewhere whenever healthchecks failed.
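
A minimal sketch of such a Worker, assuming /api/v3/site as the health-check endpoint and a runtime that supports AbortSignal.timeout; the fallback HTML is just an example:

```typescript
// Hypothetical Worker: proxy traffic while the origin answers, otherwise
// serve a static page pointing people at other instances.
const FALLBACK_HTML =
  '<h1>This instance is overloaded</h1>' +
  '<p>Try another one: <a href="https://join-lemmy.org/instances">join-lemmy.org/instances</a></p>';

export default {
  async fetch(request: Request): Promise<Response> {
    try {
      // Cheap health check against the origin before proxying.
      const health = await fetch("https://lemmy.ml/api/v3/site", {
        signal: AbortSignal.timeout(3_000),
      });
      if (health.ok) {
        return fetch(request); // origin is healthy: pass the request through
      }
    } catch {
      // timed out or unreachable; fall through to the static page
    }
    return new Response(FALLBACK_HTML, {
      status: 503,
      headers: { "content-type": "text/html" },
    });
  },
};
```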

[–] [email protected] 0 points 1 year ago (1 children)

Then Cloudflare would be able to spy on all the traffic, so that's not an option.

[–] [email protected] -2 points 1 year ago (1 children)

spy on all the traffic

That's...not how things work. Everyone has their philosophical opinions so I won't attempt to argue the point, but if you want to handle scale and distribution, you're going to have to start thinking differently, otherwise you're going to fail when load starts to really increase.

[–] [email protected] 1 points 1 year ago

There will either be an hour of downtime to migrate and grow or days of downtime to fizzle.

I love that there's an influx of volunteers, including SQL experts, to mitigate scaling issues for the entire fediverse, but those improvements won't be ready in time. Things are overloading already, and there's less than a week before things increase 1,000-fold, maybe more.

[–] [email protected] 1 points 1 year ago (1 children)

Do you have the frontend and DB serving from the same VPS? If so, it would be a great time to split them. Likewise, if your DB is running in a VPS, you're likely suffering from significant steal from the hypervisor, so you would benefit from switching to a dedicated box. My API calls saw a 10x speedup just from switching from a VPS DB to a dedicated-box DB.

I just checked OVH's VPS offers and they're shit! Even at 70 EUR for a dedicated server on Hetzner, you would get more than double those resources without steal. I would recommend switching your DB ASAP for immediate massive gains.

If you're wondering why you should listen to me, I built and run https://aihorde.net and am handling about 5K concurrent connections currently.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

Hetzner is very strict about piracy, so that's not an option. And it's almost the weekend, so I won't have time for a migration. Anyway, there are plenty of other instances in case lemmy.ml goes down.

Edit: I also wouldn't know which size of dedicated server to choose. No matter what I pick, it will get overloaded again after a week or two.

[–] [email protected] 1 points 1 year ago

Even if you choose Hetzner, they won't even know it has anything to do with piracy, because they will just be hosting the DB, and nobody will know where your DB is. That fear is overblown.

Likewise, believe me, a dedicated server is night and day compared to a VPS.

[–] [email protected] 0 points 1 year ago (1 children)

So, if I'm reading this correctly, it's currently a hosting bill of 30 euros a month?

[–] [email protected] 0 points 1 year ago* (last edited 1 year ago) (1 children)

No, that's the 8 GB memory option... if it's the biggest, it should be around €112. Meanwhile I keep wondering if I should let Lemmy stay on the current KVM (which is similarly specced but with dedicated cores and such) or if it is better to move it to one of my dedis just in case... well, we'll see xD

[–] [email protected] 0 points 1 year ago (2 children)

It's the one for 30 euros; I'm not seeing any VPS for 112. Maybe that's a different type of VPS?

[–] [email protected] 0 points 1 year ago (1 children)

In the vServers line, it depends on the memory… and storage option for the one starting at 30…

[–] [email protected] 0 points 1 year ago (1 children)

It currently has 8 GB and only uses 6 GB or so. CPU is the only limitation.

[–] [email protected] 0 points 1 year ago (1 children)

It does not sound like OVH's vServers offer dedicated cores, and shared CPUs commonly become a bottleneck quickly with VPS offerings across hosters; with the initial Mastodon hype, for example, I had to learn that shared-hardware lesson the hard way. For the price you are currently paying, maybe something like a used dedicated server (or one of the fancy AMD ones) at Hetzner is of interest: https://www.hetzner.com/sb

[–] [email protected] 0 points 1 year ago (1 children)

Hetzner is great, but they are very strict about piracy, so it's not an option for lemmy.ml. For now the load has gone down, so I will leave it like this, but a dedicated OVH server might be an option if load increases again.

[–] [email protected] 1 points 1 year ago

You should use this relatively quiet time to migrate to a larger server, because when the time comes where you need to do it, you're going to be in for a world of hurt. This is the calm before the storm--take advantage of it.

Ultimately, you need to scale horizontally. You need to shard your database and separate out your different functions (database, front end, whatever back end applications you use, etc) onto different servers, all fronted by load balancers. That's going to be the only way to even begin to handle increasing load. If you don't have a small team of experienced engineers with a deep understanding of how to build for scale, and you get a sudden mass exodus of users from Reddit, you're fucked. So if I were you, here's what I'd do:

  1. Scale up to the largest instance type you can. If possible, switch (at least temporarily) to AWS and use something in the c6i instance family, such as the c6id.32xlarge. Billing for AWS instances is done by the hour, so you wouldn't need to pay for an entire month up front if you only need that extra horsepower for a few days (such as when the blackouts are planned from the 12th through 14th).

  2. Because the above will do nothing but buy you time until you crash--and if you get a huge spike of users, without horizontal scaling, you WILL crash--migrate your DNS to something like Cloudflare. From there, configure workers to respond when health checks to your site fail, so that users attempting to access the site can be shown a static page directing them to something like http://join-lemmy.org or someplace, instead of simply getting 5xx errors.

  3. Once the hug of death is over, evaluate where you stand. Reduce your instance size, if you can, and start investigating what it's going to take to scale horizontally.

I'm not a SQL expert, but I am a principal network architect, and my day job for the last 15 years has been working on scale and automation for the world's largest companies, including 7 years spent at AWS. In my world, websites like Reddit, as large as they are, are still considered to be of 'average' size. I can't help you with database, but I'm happy to provide guidance around networking, DNS, scale, automation, security, etc.

[–] [email protected] 0 points 1 year ago (2 children)

What's the current bottleneck?

[–] [email protected] 0 points 1 year ago (1 children)

And maybe the bandwidth. Serving thousands and thousands of users needs at minimum 1 Gbps.

[–] [email protected] 1 points 1 year ago

It's mostly text, so bandwidth shouldn't be a problem.

[–] [email protected] 0 points 1 year ago (1 children)

SQL. We desperately need SQL experts. It's been just me for years, and my SQL skills are pretty terrible.

[–] [email protected] 1 points 1 year ago

Put the whole DB in RAM :-)

Makes me remember doing optimization, with lots of EXPLAIN and JOIN pain, on my old MySQL multiplayer game server lol. A shame I'm not an expert...

[–] [email protected] 0 points 1 year ago

Is it running in a single Docker container, or is it spread out across multiple containers? Maybe with docker-machine or Kubernetes with horizontal scaling, it could absorb users without issue - well, except maybe cost. OVH has managed Kubernetes.

[–] [email protected] 1 points 1 year ago

I'm in the process of setting up a Raspberry Pi to host an instance, but the documentation is not super helpful. I'll slog through it and issue a PR once complete so others won't have the same issues.

[–] [email protected] 1 points 1 year ago

what kind of server could we get for $100? $500? $1,000?

And for how long? Days, months...?

[–] [email protected] 0 points 1 year ago (1 children)

I'm willing to donate to other instances too - Beehaw, Sopuli, Lemmygrad, Lemmyone - anything so we can have better shock absorption. If you run one of those instances, please reply and let us know how much you think you need.

[–] [email protected] 0 points 1 year ago (1 children)

At the moment I run lemmy.world using the funding of mastodon.world. If lemmy.world grows and needs a dedicated server, I'll try to raise funds for it separately (or create a larger .world fundraiser, as I have other instances as well).

[–] [email protected] 0 points 1 year ago (1 children)

What kind of server, and with what specs, is lemmy.world running on? I'm planning on setting up my own instance for a small community, but I have no idea what to brace for.

[–] [email protected] 1 points 1 year ago (1 children)

Currently a 2-CPU, 4 GB VPS at Hetzner, costing 5 EUR per month, plus a storage volume for another 5 per month. I am monitoring this and will scale when needed. For mastodon.world we scaled to a dedicated server with 32 cores and 256 GB, so we can go a long way.

[–] [email protected] 0 points 1 year ago (1 children)

Can you provide any info about the number of pageviews/month or pageviews/hr that setup can support for lemmy?

[–] [email protected] 0 points 1 year ago (1 children)

No, I have no clue about that yet. I'll monitor how Lemmy behaves and try to scale. Maybe after a while I can say something about it.

[–] [email protected] 1 points 1 year ago

Thanks, appreciate the data point.