this post was submitted on 28 Nov 2023

Is preloading/caching data before the actual method call an (anti)pattern? (programming.dev)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]


Short explanation of the title: imagine you have a legacy mudball codebase in which most service methods query the database (through EF), modify some data and then save it at the end of the method.

This code is hard to debug, impossible to write unit tests for and generally performs badly because developers often make unoptimized or redundant db hits in these methods.

What I've started doing is to make all the data loads before the method call, put the data in a generic cache class (it's mostly dictionaries internally), and then pass that as a parameter or a member variable to the method. Everything in the method then gets data from or saves data to that cache; it's not allowed to do db hits on its own anymore.

I can now also unit test this code as long as I manually fill the cache with test data beforehand. I just need to make sure that I actually preload everything in advance (which is not always possible) so I have it ready when I need it in the method.
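
To make the shape of this concrete, here is a minimal sketch, not the actual code: `User`, `PreloadedData` and `BillingService` are made-up names, the bundle only holds one entity type, and the database load and final save that would surround the call are left out.

```csharp
using System;
using System.Collections.Generic;

// A unit test only has to fill the bundle by hand; no database is involved.
var data = new PreloadedData();
data.Add(new User(1, "Ann", 100m));
BillingService.Charge(data, 1, 30m);
Console.WriteLine(data.GetUser(1).Balance); // 70

// Hypothetical entity; in the real codebase this would be an EF-mapped class.
public record User(int Id, string Name, decimal Balance);

// Hypothetical per-call bundle: a dictionary keyed by id plus the ids that changed.
public class PreloadedData
{
    private readonly Dictionary<int, User> _users = new();
    private readonly HashSet<int> _dirty = new();

    public void Add(User user) => _users[user.Id] = user;

    // Fails loudly when something was not preloaded instead of silently hitting the db.
    public User GetUser(int id) =>
        _users.TryGetValue(id, out var user)
            ? user
            : throw new InvalidOperationException($"User {id} was not preloaded.");

    public void Save(User user)
    {
        _users[user.Id] = user;
        _dirty.Add(user.Id);
    }

    // The caller would persist these after the method returns.
    public IReadOnlyCollection<int> DirtyUserIds => _dirty;
}

public static class BillingService
{
    // Only "actual logic": reads and writes go through the bundle.
    public static void Charge(PreloadedData data, int userId, decimal amount)
    {
        var user = data.GetUser(userId);
        data.Save(user with { Balance = user.Balance - amount });
    }
}
```

Throwing on a missing entry is one possible design choice here: it turns a forgotten preload into an immediate, visible failure.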

Is this good practice? Is there a name for it, whether it's a pattern or an anti-pattern? I'm tempted to say that this is just a janky repository pattern but it seems different since it's more about how you time and cache data loads for that method individually, rather than overall implementation of data access across the app.

In either case, I'd like to learn either how to improve it, or how to replace it.

top 10 comments

[–] [email protected] 2 points 1 year ago (2 children)

If testing this properly is your problem, you should invest time in integration testing; running the tests on an in-memory database is an option as well. I think retrieving all the data and “caching” it, as you call it, has some negative consequences. For example, what if the validation for some action fails and you didn’t need whatever you preloaded? That's a wasted call to the db.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

You're right that this could introduce regressions, but it sounds like it's making the code more testable.

My biggest concern would be introducing db contention with locks being held for too long, and introducing race conditions because the cached data isn't locking the records when they're cached.

Edit: your->you're

[–] [email protected] 1 points 1 year ago (1 children)

Validation is usually the first step so I only start preloading after it's done of course, but you are right - you can easily end up loading more data than is necessary.

However, it can also result in fewer overall queries - if I load all relevant entities at the beginning, then later I won't have to do 2+ separate calls to get the relevant data. For example, if I'm processing weather for 3 users, I know to preload all 3 users and the weather data for the 3 locations where they live. The old implementation could end up loading 3 users, then go into a loop and eventually into a method that processes their weather data and does 3 separate weather db hits, one per user (this is a simplified example but something that I've definitely seen happen in more subtle ways).
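
A rough sketch of the difference, with everything hypothetical: `FakeWeatherDb` is an in-memory stand-in that only counts round trips, and the cities and temperatures are invented. With EF, the batched lookup would typically be a single query filtered with `Contains` on the list of cities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var db = new FakeWeatherDb();
var users = new[] { new User(1, "Oslo"), new User(2, "Lima"), new User(3, "Oslo") };

// Old shape: one weather query per user inside the loop.
foreach (var user in users)
    db.GetWeather(user.City);
Console.WriteLine(db.QueryCount); // 3

// Preloading shape: collect the distinct cities, fetch them once, then work in memory.
db.QueryCount = 0;
var weatherByCity = db.GetWeatherFor(users.Select(u => u.City).Distinct());
var total = users.Sum(u => weatherByCity[u.City]);
Console.WriteLine(db.QueryCount); // 1

public record User(int Id, string City);

// Hypothetical stand-in for the database; each method call counts as one query.
public class FakeWeatherDb
{
    private readonly Dictionary<string, int> _temps = new() { ["Oslo"] = 4, ["Lima"] = 22 };

    public int QueryCount { get; set; }

    public int GetWeather(string city)
    {
        QueryCount++;
        return _temps[city];
    }

    public Dictionary<string, int> GetWeatherFor(IEnumerable<string> cities)
    {
        QueryCount++;
        return cities.ToDictionary(c => c, c => _temps[c]);
    }
}
```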

I guess I'm just trying to find a way to keep it a pure method with only "actual logic" in it, without depending on a database. Forcing developers to think ahead about what data they actually need in advance also seems like a good thing maybe.

[–] [email protected] 1 points 1 year ago

Forcing developers to think ahead about what data they actually need in advance also seems like a good thing maybe.

It does.

[–] [email protected] 2 points 1 year ago (1 children)

How does the caching work? If the method is called again with the same parameters, does it load from the cache or fetch the data from the database into the cache again?

[–] [email protected] 1 points 1 year ago

Fetches from the database again, it's just a temporary bundle of data rather than a persistent cache. We have caching for commonly-read/rarely-updated entities but it's not feasible for everything ofc.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

I'm not sure how you do it at the moment, or whether you already know this since you mention the repository pattern, but here's what I know.

A pattern I encountered at my workplace is a split between the repository and the data access (Dao) layer.

The repository implements an interface which other parts of your program use to get data. The repository asks the data access layer to make database calls.

For testing other parts of the program, we mock the repository interface and implement simple returns of data instead of relying on the database at all. Then we have full control of what goes in and out of your legacy code, assuming you are able to use this.
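
Roughly like this, as a sketch with invented names (`IUserRepository`, `FakeUserRepository`, `RenameService`); here the mock is a hand-written fake backed by a dictionary, but a mocking library would serve the same purpose.

```csharp
using System;
using System.Collections.Generic;

// Test setup: hand the service a fake repository instead of a database.
var repo = new FakeUserRepository();
repo.Save(new User(1, "ann"));
new RenameService(repo).Rename(1, "Ann");
Console.WriteLine(repo.Get(1).Name); // Ann

public record User(int Id, string Name);

// Hypothetical interface the rest of the program depends on.
public interface IUserRepository
{
    User Get(int id);
    void Save(User user);
}

// The production implementation would delegate to the DAO; a dictionary is enough for tests.
public class FakeUserRepository : IUserRepository
{
    private readonly Dictionary<int, User> _store = new();

    public User Get(int id) => _store[id];

    public void Save(User user) => _store[user.Id] = user;
}

// The code under test only ever sees the interface.
public class RenameService
{
    private readonly IUserRepository _users;

    public RenameService(IUserRepository users) => _users = users;

    public void Rename(int id, string newName) =>
        _users.Save(_users.Get(id) with { Name = newName });
}
```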

For testing the dao, I don't actually have much experience since that's not a good option for us at the moment, but as others mentioned you can use in-memory databases or manually mock the connection object provided to the dao to test that your save methods store the correct data. The latter is somewhat clunky in my experience, but it's the best option when you are trying to save something with 20 values and want to make sure they end up in the right order, or have the right values when converting enum values to strings for example.

I don't know much about cache, but if you want to keep it then it's possible to do it in the repository class.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

If you ignore the caching, the approach you're describing loosely aligns with the concept of Domain-Driven Design (DDD). In DDD, the model is loaded before any business logic is executed, and then any changes made to the model are persisted back to the database.

[–] [email protected] 1 points 1 year ago (1 children)

In DDD, the model is loaded before any business logic is executed

That's really not a DDD requirement. Having a domain model does not require you to preload data to run business logic. For example, you can easily have business logic that only takes as input a value object and triggers a usecase, and you do not need to preload anything to instantiate a value.

[–] [email protected] 1 points 1 year ago

Agree.

I’m just saying OP is loading stuff into a dictionary that perhaps functions as a Domain Model. Then they pass this Domain Model to a Use Case, where it gets modified and saved to a database.

OP was asking for an architecture name or design pattern, and while it's not a perfect match, it's kinda like a Domain Model, although an anemic one.

None of this is a DDD requirement.