[Tom's Hardware] AMD 3D V-Cache enables RAM disk to hit 182 GB/s speeds - over 12X faster than the fastest PCIe 5.0 SSDs : hardware

[–] [email protected] 2 points 11 months ago

No way, cache and memory are faster than storage???

[–] [email protected] 1 points 11 months ago (2 children)

Imagine 8GB 3D V-cache. That would be glorious.

[–] [email protected] 1 points 11 months ago (4 children)

What use cases would this be good for?

[–] [email protected] 1 points 11 months ago (3 children)

Your entire game could be in the next floor up from the CPU cores, instead of in a metaphorical different city.

[–] [email protected] 1 points 11 months ago

Reminds me of RAM drives, but people mostly moved on from that since SSDs have gotten so incredibly fast and cheap in the past couple of years.

[–] [email protected] 1 points 11 months ago

Looking at Star Citizen install size now I just need a 14-socket motherboard

[–] [email protected] 1 points 11 months ago (1 children)

I seriously doubt that main memory latency and bandwidth are the performance bottleneck for many games. Even for loading it wouldn't be particularly useful compared to storing the game in RAM, because you'd still be limited by PCIe bandwidth. Maybe with horribly optimized games that do a lot of random reads during load it would help, but that's pushing it. Now the GPU side, on the other hand, could be interesting.

[–] [email protected] 1 points 11 months ago

PCIe ReBAR currently relies on system RAM to VRAM communication. 3D V-Cache to VRAM transfers via DMA could be made possible without even accessing RAM, which would completely eliminate 50+ ns of RAM access latency (of course, the necessary data needs to already be in the 3D V-Cache, loaded from system RAM, before any of this fancy stuff happens).

[–] [email protected] 1 points 11 months ago

What use cases would this be good for?

Yes

[–] [email protected] 1 points 11 months ago

Game Dev in 2047: "why don't I just decompress all these textures I won't use for a while here"

[–] [email protected] 1 points 11 months ago (1 children)

Latency enters the chat

[–] [email protected] 1 points 11 months ago (1 children)

L4 cache is still much faster than RAM.

[–] [email protected] 1 points 11 months ago (1 children)

It's not that simple.

Having L4 at all increases latency to actual RAM.

[–] [email protected] 1 points 11 months ago (2 children)

Time to play some old Playstation 1 RPGs with horrendous loading times all entirely stored on the L3 cache.

[–] [email protected] 1 points 11 months ago (2 children)

Imagine Factorio stored entirely in CPU

[–] [email protected] 1 points 11 months ago (1 children)

The Xeon Max CPUs contain 64GB of HBM2e, which can be configured to act as a cache. You could run a lot of games entirely on the HBM!

[–] [email protected] 1 points 11 months ago

Xeon isn't AMD 3D VCache

[–] [email protected] 1 points 11 months ago

The 7995WX already has 384 MB L3 cache.

I wouldn't be surprised if the next-gen Threadripper has 1GB L3.

[–] [email protected] 1 points 11 months ago (2 children)

Until you notice that the insane loading and save times are built into the engine and no SSD can ever change that.

I'm looking at you, Digimon World 2003.

[–] [email protected] 1 points 11 months ago

That is a crime worthy of a chair with a power current flowing.

[–] [email protected] 1 points 11 months ago

When I'm playing old games I sometimes wonder how we ever had the patience for it. Couldn't play them today if it wasn't for save states.

[–] [email protected] 1 points 11 months ago (3 children)

https://en.wikipedia.org/wiki/Tiny_Core_Linux

There are <100MB Linux distributions. Is it theoretically possible to run an entire operating system without RAM, purely in CPU cache?

[–] [email protected] 1 points 11 months ago (2 children)

That's exactly what is done during bring-up of new SoCs. Memory controllers are either non-functional in early prototypes, or a miniature design is put into a bunch of FPGAs with only a single core and caches. The cache lines and TLB entries are primed and pinned with all relevant code and data pages before booting up a kernel.

[–] [email protected] 1 points 11 months ago (1 children)

On coreboot this boot method is called CAR, Cache as RAM, pretty interesting usage of cache to be honest, no need to add separate SRAM if you already have some

[–] [email protected] 1 points 11 months ago

The 7995WX has 384 MB L3 cache.

Imagine what you could do with that!

[–] [email protected] 1 points 11 months ago (1 children)

I think this is also what happens at boot on most systems before RAM is initialized, so maybe boot times could be faster if they took advantage of caches getting larger?

[–] [email protected] 1 points 11 months ago

Not sure if you meant to point out something else, but initramfs or ramdisks are loaded into RAM itself, which is already up and running at that point. RAM initialization is usually initiated by early boot firmware, and information about the physical address map is eventually passed on to the OS kernel, which later sets up paging (virtual memory).

[–] [email protected] 1 points 11 months ago

Xeon Max will boot without any RAM installed at all. Though I'm not sure it counts, considering it has 64GB built into the CPU.

[–] [email protected] 1 points 11 months ago (3 children)

182GB/s, for up to 32MB of data. It's an interesting study in misusing the tech, but it's ultimately a bit meaningless.

What we really need is for someone to modify the ramdisk driver to appear as usb storage and make it so it runs under Vista, so we can use it for ReadyBoost.

[–] [email protected] 1 points 11 months ago (1 children)

Why would it only be 32MB? This is the V-cache, not the L3.

[–] [email protected] 1 points 11 months ago

32MB is what they tested in the article.

To clarify a little on what's happening here: they're not using the V-Cache as a memory space and creating the volume there, as you might create a partition on a conventional disk drive. Rather, they're accessing the ramdisk in such a way as to trick the system into keeping it in cache. It's almost completely impractical in real terms, but it's a fun way to exploit the cache algorithm to get some silly numbers out of it.
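
If you want to see the same kind of effect on your own machine, here's a rough sketch (not the article's benchmark, just an illustration of the working-set idea). It repeatedly copies a buffer and reports the copy rate; small buffers that stay in cache will usually show higher numbers than ones that spill to RAM. The function name and the buffer sizes are my own picks, and since the source and destination buffers both take up cache, the effective working set is about twice the size you pass in. Python overhead also blurs the results, so treat the numbers as relative only.

```python
import time


def copy_bandwidth_gbps(size_bytes: int, total_bytes: int = 128 * 1024 * 1024) -> float:
    """Estimate the copy rate in GB/s when repeatedly copying a buffer of size_bytes.

    Hypothetical helper for illustration; it counts bytes copied, not read + written.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    # Copy roughly total_bytes overall so every buffer size does similar work.
    repeats = max(1, total_bytes // size_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # same-length slice assignment: one bulk copy of the buffer
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return (size_bytes * repeats) / elapsed / 1e9


if __name__ == "__main__":
    # Sizes are arbitrary examples; pick ones around your CPU's L3 size.
    for size_mb in (1, 8, 32, 128):
        rate = copy_bandwidth_gbps(size_mb * 1024 * 1024)
        print(f"{size_mb:>4} MB buffer: {rate:.1f} GB/s")
```

On a chip with a big L3 you'd expect the rate to drop somewhere past the cache size, which is basically why the test in the article stayed small.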

[–] [email protected] 1 points 11 months ago

What we really need is for someone to modify the ramdisk driver to appear as usb storage and make it so it runs under Vista, so we can use it for ReadyBoost.

So: use RAM as a ramdisk that mimics a USB drive for ReadyBoost, which uses a USB drive as... quasi-RAM?

This sounds like a circular way to do what RAM caching is already supposed to do, haha. All modern operating systems do this already. It used to be called Superfetch, but now it's just commonplace and assumed, as is not dumping things you close out of RAM immediately in case some parts of them get reused.

[–] [email protected] 1 points 11 months ago

Buy a copy of PrimoCache. It's a great piece of software that adds multi-level cache to Windows.

[–] [email protected] 1 points 11 months ago

It would be awesome if someone could invent SRAM-speed disks that weren't volatile. It would be a huge step forward for PCs for many things, and we would stare at CPU/GPU makers as the bottlenecks.

[–] [email protected] 1 points 11 months ago (3 children)

I would love to see the results of a 3D chip with a powerful iGPU. Not sure if it would work, but if it is possible, why is AMD not doing it? Would it cannibalise 100-200 EUR GPUs (they are already nonexistent anyway)?

[–] [email protected] 1 points 11 months ago (2 children)

There is very little demand for a powerful iGPU desktop chip, so the ones that exist are derivatives of laptop chips and thus monolithic. So far there has not been a stacked cache monolithic die chip.

[–] [email protected] 1 points 11 months ago (2 children)

It's easy to say there's no demand for something that doesn't exist. Sales are zero.

[–] [email protected] 1 points 11 months ago

Igpu is not apu

[–] [email protected] 1 points 11 months ago (2 children)

The laptop-based desktop chips exist; they are literally a thing and have been for a while. Neither AMD nor Intel has seen high demand for those. Also, even if that wasn't the case, your argument is not really an argument at all, since it can be used to justify literally anything that hasn't been tried.

[–] [email protected] 1 points 11 months ago

lol they aren't powerful, in this universe or any other

[–] [email protected] 1 points 11 months ago (1 children)

I do think this will change quickly if Qualcomm's ARM chips are as fast as the M2 Max like they claim. And there's reason to believe it, as they've bought/hired Apple's head of processor development.

Considering the M2 Max GPU is roughly equivalent to a 3080 mobile or a desktop 3060ti at significantly better efficiency, I think the demand for monolithic could explode practically overnight.

Assuming some x86 to ARM translation gets most things running.

[–] [email protected] 1 points 11 months ago (1 children)

Maybe Qualcomm would do so in the future, but as things stand now, it's not the case.

The iGPU in the Snapdragon X Elite is in the same ballpark as the regular M2. Not the Pro or Max variant.

In 3DMark Wild Life Extreme, the X Elite GPU is 50% faster than the Radeon 780M.

https://youtu.be/03eY7BSMc_c?si=HbhQPDt-AN_PP_TS

Still, that's nowhere near 3080 tier.

Qualcomm still needs to work on their Windows GPU drivers. Currently the only API the X Elite supports is DirectX12.

Some speculate that Qualcomm will eventually create a Windows Vulkan driver for Adreno, then use DXVK to support older DirectX versions and Zink to support OpenGL.

[–] [email protected] 1 points 11 months ago (2 children)

They already have a Vulkan driver. The 3DMark runs were on Vulkan.

[–] [email protected] 1 points 11 months ago

We're still very far from that. Even mobile phones don't stack the GPU; they only stack RAM and NAND. RAM and cache are far easier to stack since they are simple structures by nature, while a GPU is unbelievably complicated compared to those two. Maybe Intel's tile / AMD's chiplet system is closer to what we want, but it's still not as good as stacking.

[–] [email protected] 1 points 11 months ago

Technically AMD is planning this with the MI300, but that's for enterprise and will cost tens of thousands.

[–] [email protected] 1 points 11 months ago

Different amounts of stacked cache will be the next SKU differentiator and price gouger? Seems perfect for it.

[–] [email protected] 1 points 11 months ago

I remember when I tried ramdisking my modded Skyrim. It was the only way to remove cell transition stutter, even though I had a 5950X, 64GB RAM, a 980 Pro and a 3090.

200gb+ v cache when AMD?