ticoombs

joined 1 year ago
[–] [email protected] 1 points 5 months ago

Possibly fixed, I tweaked the caching mechanism to specify whether the response was HTML or JSON. At least that's what it should do!
If you manage to see it again, let me know.
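(The actual fix lives in the nginx config, which isn't shown here. Purely to illustrate the idea, here's a minimal sketch of keying a cache on the requested content type so HTML and JSON responses never share an entry; every name below is made up.)

```python
# Conceptual sketch only -- the real change is in the reverse-proxy (nginx)
# cache configuration, not application code.
from typing import Callable, Dict, Tuple

_cache: Dict[Tuple[str, str], bytes] = {}

def _cache_key(path: str, accept_header: str) -> Tuple[str, str]:
    # Bucket requests into "json" vs "html" so an API response is never
    # served to a browser (or vice versa) from the same cache entry.
    kind = "json" if "application/json" in accept_header else "html"
    return (path, kind)

def cached_fetch(path: str, accept_header: str,
                 fetch: Callable[[str, str], bytes]) -> bytes:
    key = _cache_key(path, accept_header)
    if key not in _cache:
        _cache[key] = fetch(path, accept_header)  # cache miss: hit the backend
    return _cache[key]
```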

[–] [email protected] 3 points 5 months ago

Thanks for letting me know!

An interesting issue indeed! I'll have to check it out. I have a sneaking suspicion it might be due to the nginx caching... πŸ€”

[–] [email protected] 3 points 5 months ago

πŸ₯³πŸ₯³πŸ₯³

[–] [email protected] 4 points 5 months ago (1 children)

I have, I've got a couple of accounts on some merch stores that will allow everyone to buy a few things, such as stickers, mugs, t-shirts, etc. But we need some good graphics. The Reddthat logo by itself only works for the stickers.

My ideas were:

  • Sticker: logo
  • Mug: logo + text & BIG text
  • Tshirt: big graphic on front, or lots of random text on the front with "I've readthat" on the back
  • Hoodie: small text on front + big logo/graphic on back

But I still need that big graphic design and some time to create all the other logos

[–] [email protected] 4 points 5 months ago

You're not wrong.
I remember back when I deposited money into a random bank account and some internet person sent me internet money.

The internet really is an amazing place

[–] [email protected] 3 points 5 months ago (1 children)

Congratulations! 👏 Happy B-day, and here's to many more, to my across-the-river friends 🎉

Ps. The video works and is great!

[–] [email protected] 2 points 5 months ago

old save comment

[–] [email protected] 4 points 5 months ago (8 children)

No it doesn't (unfortunately). We have a proxy in the EU close to LemmyWorld which batch sends requests to our local queue, which then inserts them into Lemmy. We can sustain over 30/s and our db barely struggles. The DB isn't the problem as we have a stupid amount of resources at our disposal.
The majority of the time (in my experience with these issues), the cause is the developers making changes to how Lemmy works, inadvertently turning a query that performed super fast into one that is not optimised.
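
(For the curious, a rough sketch of the "remote proxy feeding a local queue" shape described above. It assumes a single in-process queue and made-up function names; the real proxy, its endpoints, and exact rates aren't shown in this comment.)

```python
# Illustrative only -- none of this is the actual Reddthat proxy code.
import queue
import time

incoming: "queue.Queue[dict]" = queue.Queue()

def enqueue(activity: dict) -> None:
    # Called once per federation activity the EU-side proxy accepts.
    incoming.put(activity)

def drain(insert_into_lemmy, batch_size: int = 30) -> None:
    # Runs locally next to the database: pull up to `batch_size` activities
    # per second off the queue and hand them to Lemmy one at a time, so the
    # backend sees a steady ~30/s stream instead of bursty remote traffic.
    while True:
        for _ in range(batch_size):
            try:
                activity = incoming.get_nowait()
            except queue.Empty:
                break
            insert_into_lemmy(activity)
        time.sleep(1)
```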

Because we are on the beta tags (for this release only!) we have to weather the storm, but as soon as 0.19.4 is released we'll go back to only using the production ready releases instead of the beta ones.

I have some time today to do a thorough investigation, so I'll find the bad query, make an issue, and if it's a simple fix it will be rolled out within 24-48 hours.

[–] [email protected] 4 points 5 months ago (10 children)

Yep, it is the same issue as: https://reddthat.com/post/19658926

There is some Lemmy issue that's been causing this. I've updated to rc-3 just now. So let's hope that fixes it. (It probably won't)

[–] [email protected] 2 points 5 months ago (6 children)

Instant* :( I hit the bug here. :( will have to do some debugging.

[–] [email protected] 1 points 5 months ago (8 children)

Is pretty instance

[–] [email protected] 1 points 5 months ago (9 children)

Replying directly

 

Now that is a Gaming Router

 

Just got the server back up into a working state, still investigating why it decided to kill itself.

Unfortunately I'm dealing with $job all day today, with things randomly breaking like this. I'm guessing leap year shenanigans, but have no way of knowing that yet.

Just wanted to let you all know it's back up!

103
submitted 8 months ago* (last edited 8 months ago) by [email protected] to c/[email protected]
 

Video link for those on clients who don't show links when they are videos: https://i.imgur.com/5jtvxPQ.mp4

9
submitted 8 months ago* (last edited 8 months ago) by [email protected] to c/[email protected]
 

Unfortunately we had a Valentine's Day outage of around 2 hours.

Incident Timeline: (times in UTC)

04:39 - Our monitoring sees a 50x error.
04:41 - I am alerted via email & phone.
04:48 - I acknowledge the incident and start investigating.
04:50 - I cannot access the VM via SSH. I issue a reboot via our control panel.
04:54 - Our server has a load of 12 and 57% of all IO operations are IOWait.
05:30 - I issue another reboot and still can't figure out what's wrong.
05:58 - I lodge a ticket with our provider to check the host and to power it off and on again, as we still have huge IOWait values and 100% memory usage.
06:30 - The hosting company hasn't got back to me, so I start investigating by rolling back the latest configuration changes I've made and rebooting.
06:35 - Sites are back online.

Resolution

The latest change included turning on huge pages with a value of 100MB to give Postgres some performance gains.
This change was made on Monday morning, and I had planned to do a power cycle this week to confirm everything was on the up-and-up. Turns out my host did that for me.
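
(For anyone wondering how a figure like 100MB maps onto Linux huge pages: a back-of-the-envelope sizing sketch, assuming the stock 2MB huge page size. This is not the actual Reddthat configuration; the relevant knobs are the kernel's vm.nr_hugepages and Postgres' huge_pages setting.)

```python
# Sizing sketch only -- 2 MB is the common x86_64 default huge page size and
# 100 MB is the figure from the post; neither is pulled from the real config.
HUGE_PAGE_MB = 2           # default huge page size on most x86_64 Linux hosts
target_mb = 100            # shared memory we want backed by huge pages

pages_needed = -(-target_mb // HUGE_PAGE_MB)   # ceiling division
print(f"vm.nr_hugepages = {pages_needed}")     # the value you'd set via sysctl
```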

The outage lasted longer than it should have due to some $job and $life.

Until next time,
Cheers,
Tiff

 

Happy leap year! February comes with a fun 29th day! What will we do with the extra 24 hours? Sorry I missed the January update, but I was too busy living the dream in cyberspace.

Community Funding

As always, I'd like to thank our dedicated community members who keep the lights on. Unfortunately our backup script went a little haywire over the last couple of months, which resulted in a huge object storage bill! The bug has been fixed and new billing alerts have been set up to ensure this doesn't happen again.

If you are new to Reddthat and have not seen our OpenCollective page it is available over here. All transactions and expenses are completely transparent. If you donate and leave your name (which is completely optional) you'll eventually find your way into our main funding post over here!

Upcoming Postgres Upgrade

In the admin Matrix channels there has been a lot of talk recently about database optimisations, as well as the issues relating to out-of-memory (OOM) errors. These OOM issues mostly come down to memory "leaks" and are what was plaguing Reddthat on a weekly basis. The good news is that other instance admins have confirmed that Postgres 16 (we are currently on Postgres 15) fixes those leaks and performs better all around!

We will be planning to upgrade Postgres from 15 to 16 later this month (February). I'm tentatively looking at performing it during the week of the 18th to the 24th. This means Reddthat will be down for the duration of the maintenance. I expect this to take around 30 minutes, but further testing on our development machines will produce a more concrete number.

This "forced" upgrade comes at a good time. As you may or may not be aware by our uptime monitoring we have been having occasional outages. This is because of our postgres server. When we do a deploy and make a change, the postgres container does not shutdown cleanly. So when it restarts it has to perform a recovery to ensure data consistency. This recovery process normally requires about 2-3 minutes where you will see an unfortunate error page.
This has been a sore spot with me as it is most likely an implementation failure on my part and I curse to myself whenever postgres decides to restart for whatever reason. Even though it should not restart because I made a change on a separate service. I feel like I am letting us down and want to do better! These issues leads us into the next section nicely.

Upcoming (February/March) "Dedicated" Servers

I've been playing with the concept of separating our services for a while now. We now have Pictrs being served from a different server, but still have Lemmy, the frontends, and the database all on our big single server. This single-server setup has served us nicely and would continue to do so for quite some time, but with the recent changes to Lemmy adding more pressure on the database, we need to find a solution before it becomes an even bigger problem.
The forced Postgres upgrade gives us the chance to make this decision and optimise our servers to support everyone and give a better experience as a whole.

Most likely starting next month (March), we will have two smaller front-end servers which will contain the Lemmy applications, a smallish Pictrs server, and a bigger backend server to power the growing database. At least that is the current plan. Further testing may make us re-evaluate, but I do not foresee any reason we would change the overarching aspects.

Lemmy v0.19 & Pictrs v0.5 features

We've made it through the changes to v0.19.x, and I'd like to say that even with the unfortunate downtimes relating to Pictrs' temp data, we came through with minor downtimes and (for me) a better appreciation of the database and its structure.

Hopefully everyone has been enjoying the new features. If you use the default web-ui, you should also be able to upload videos, but I would advise against it. There is still the 10-second request limit on uploads, so unless your video is small and short it won't complete in time.

Closing

The past few months have been challenging, especially with communities not as active as they once were. It almost seems that our little instance is being forgotten about while other instances have completely blown up! But it is concerning that without 1 or 2 people our communities dry up.

As an attempt to breathe life into our communities, I've started a little Community Spotlight initiative, where every 1-2 weeks I'll be pinning a community that you should go and check out!

The friendly atmosphere and communities are what make me want to continue doing what I do in my spare time. I'm glad people have found a home here, even if they might just lurk, and I'll continue trying to make Reddthat better!

Cheers,
Tiff
On behalf of the Reddthat Admin Team.

 

I wonder if this new system is why I can't make Slack remind me at weird times...

 

Practical attacks with a Raspberry Pi.

 

It is always interesting to read about other people's experiences with k8s.

Archive mirror for those who hate Medium: https://archive.is/sQcHH

Off topic:
The number of 'please log in to read the rest of the article' popup blocks is insane now. They must be really trying to make money...

 

An interesting talk
