this post was submitted on 19 Feb 2024
32 points (100.0% liked)

PostgreSQL

618 readers
1 users here now

The world's most advanced open source relational database

Project
Events
Podcasts
Related Fediverse communities

founded 1 year ago
MODERATORS
 

Lemmy currently uses distinct tables like post_like: (post_id, person_id, score) and post_saved. Unfortunately this causes performance issues when we have to join many of these tables to create views.

One suggestion in this PR, is to combine these into a single post_action table, with a lot of optional columns depending on the action. This solution scares me a little, because I'm afraid we might lose data integrity, and many of our constraints with so many optional columns.

Is there a better way of doing this in SQL?

you are viewing a single comment's thread
view the rest of the comments
[–] xmunk 13 points 6 months ago (1 children)

Have you considered keeping the data writes to their current tables and deploying an indexed materialized view over those tables for more efficient reads? You can normalize your data and eat your denormalized cake too!

Let me know if you have any technical questions about how to.

[–] [email protected] 3 points 6 months ago (1 children)

We stopped using materialized views in lemmy a few years ago, because of caching / stale data issues.

[–] xmunk 6 points 6 months ago (1 children)

That's fair - though you can use trigger based refreshes, there will always be some kind of inconsistency window. That was my only real silver bullet suggestion. Otherwise, you'll just need to modify the underlying structure.

From what I've read of the change, it doesn't look too dangerous. My only concern would be around concurrent writes where someone upvotes and saves a post in rapid succession. The logic for both actions needs to support a post_action record existing or not existing and potentially starting to exist mid-write. There will be some ugly edge cases with that approach, but it should be doable if carefully done.

Sorry if that isn't super helpful, but thanks for your work maintaining the platform. It's appreciated!

[–] [email protected] 2 points 6 months ago (1 children)

No probs, thx for your help! We should be okay with writes in close succession, as long as it only updates that specific column. I think I'm coming around to the idea that a post_action table would be fine.

[–] xmunk 4 points 6 months ago

Yea, I've worked as a data architect - I share your knee-jerk fear of denormalization but I read the proposal over and I agree: it's dangerous if done sloppily but as long as you're careful it's do-able.

I've been working for a while on a product for medical professionals. It's an absolute blessing to my hair-line that Doctors consider anything faster than two minutes to essentially be instantaneous. Unfortunately the lemmy user base hasn't been dissuaded that a better world could exist by decades of horribly written software. Locks and setting aside a few dozen milliseconds for mat view refreshes are perfectly acceptable in my day job... but these darn Lemmy users expect a performant and stable product.

Just again though, thanks for the good work!