I don't keep up with SSD benchmarks, but the mechanism behind this isn't mysterious. Most consumer SSDs ship with TLC or QLC NAND, which stores 3 or 4 bits per cell respectively. Writing the full 3 or 4 bits is slower than writing a single bit per cell, so the drive uses free NAND as an SLC write cache for as long as it has space to spare. When you write to a relatively empty drive, your data goes through the DRAM on the controller (if there is any and it is being used as a write buffer) and then gets written to free NAND blocks in SLC mode. Later on, typically while idle, the drive consolidates that data down into proper TLC/QLC, but you get the benefit of a fast write as long as there is enough free NAND for SLC caching.
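To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the drive uses all of its free space as SLC cache (real firmware often caps this) and the speeds are made-up placeholders, not measurements from any particular drive. The function names are mine, purely for illustration.

```python
def slc_cache_gb(free_native_gb: float, bits_per_cell: int) -> float:
    """Data that fits if free TLC/QLC space is written at one bit per cell.

    Assumption: the drive uses all free space as SLC cache.
    """
    return free_native_gb / bits_per_cell


def write_seconds(write_gb: float, cache_gb: float,
                  slc_gbps: float, native_gbps: float) -> float:
    """Time for one sustained write: fast until the cache fills, slow after.

    Assumption: no background consolidation happens during the write.
    """
    fast_gb = min(write_gb, cache_gb)   # portion absorbed by the SLC cache
    slow_gb = write_gb - fast_gb        # remainder written at native TLC/QLC speed
    return fast_gb / slc_gbps + slow_gb / native_gbps


# Hypothetical TLC drive with 300 GB free, 5 GB/s in SLC mode, 1 GB/s native.
cache = slc_cache_gb(300, bits_per_cell=3)
print(cache)                                                   # 100.0
print(write_seconds(150, cache, slc_gbps=5, native_gbps=1))    # 70.0
```

So in this toy model the first 100 GB of a 150 GB copy lands in 20 seconds and the last 50 GB takes another 50, which matches the "fast, then falls off a cliff" shape you see in sustained write tests.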
Obviously this falls apart once your drive gets close to full and there are few free NAND blocks left to use as SLC cache. This is also why write performance on budget drives tends to drop off harder than on higher-end drives: the nicer drives have faster NAND and usually have DRAM alongside the SSD controller to help performance in the worst case. Enterprise drives often sidestep the issue entirely by using SLC or MLC NAND directly, or by having additional overprovisioning (extra spare NAND capacity).
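If you want to see how the fill level plays into it, here is a crude model. Everything in it is an assumption for illustration: a 1 TB TLC drive, all free space usable as SLC cache, 5 GB/s in SLC mode and 1 GB/s native. Real firmware is a lot smarter than this, so treat it as a sketch of the trend rather than a prediction.

```python
def effective_write_gbps(capacity_gb: float, used_gb: float, burst_gb: float,
                         bits_per_cell: int = 3,
                         slc_gbps: float = 5.0,
                         native_gbps: float = 1.0) -> float:
    """Average speed of one burst write on a drive that is partly full.

    Assumptions: all free space acts as SLC cache, and the cache is
    empty (fully consolidated) when the burst starts.
    """
    free_gb = capacity_gb - used_gb
    if burst_gb > free_gb:
        raise ValueError("burst does not fit on the drive")
    cache_gb = free_gb / bits_per_cell      # free space at one bit per cell
    fast_gb = min(burst_gb, cache_gb)       # absorbed by the SLC cache
    slow_gb = burst_gb - fast_gb            # written directly as TLC/QLC
    seconds = fast_gb / slc_gbps + slow_gb / native_gbps
    return burst_gb / seconds


# Same 50 GB write at different fill levels of a hypothetical 1000 GB drive.
for fill in (0.10, 0.50, 0.80, 0.90, 0.95):
    speed = effective_write_gbps(1000, 1000 * fill, burst_gb=50)
    print(f"{fill:.0%} full: {speed:.2f} GB/s")
```

In this model the same 50 GB write runs at full SLC speed until the drive is fairly full, then the average drops quickly once the cache is smaller than the write.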
I can't speak to the details of the 990 Pro specifically, since I don't keep up with individual drives that closely. I would guess that your understanding is correct, but someone else on the sub can probably chime in with the details for the current crop of high-end drives.