DuranteA

joined 11 months ago
[–] [email protected] 1 points 9 months ago (2 children)

Specialized hardware can make sense for inference on known networks, or a bit more broadly, on known network structures. But for training and research, model architectures still seem to be in too much flux for specialization much beyond the level of a modern GPU to make sense. For now, at least.

The (research) ecosystem would have to settle down a bit before you can get a really substantial improvement out of specialization, and be confident that your architecture can still run the latest and greatest ~2 years after you designed it.

[–] [email protected] 1 points 10 months ago (1 children)

This is no longer true.
If you use NV's TensorRT plugin with the A1111 development branch, TensorRT works very well with SDXL (it's actually much less painful to use than SD1.5 TensorRT was initially).

The big constraint is VRAM capacity. I can use it for 1024x1024 (and similar-total-pixel-count) SDXL generations on my 4090, but can't go much beyond that without tiling (though that is generally what you do anyway for larger resolutions).
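
For anyone unfamiliar, tiling here just means covering a large canvas with overlapping patches that each fit the resolution the engine was built for, processing each patch separately, and blending the overlaps. A minimal sketch of the splitting step (the tile size and overlap are illustrative values, not taken from any particular extension):

```python
# Minimal sketch: cover a large canvas with overlapping 1024x1024 tiles,
# so each tile fits a fixed-resolution engine (e.g. a TensorRT SDXL engine).
# Tile size and overlap are illustrative, not from any specific tool.

def tile_boxes(width, height, tile=1024, overlap=128):
    """Yield (left, top, right, bottom) boxes covering the canvas."""
    stride = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    # Make sure the right and bottom edges are always covered.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    for top in ys:
        for left in xs:
            yield (left, top, left + tile, top + tile)

# Example: a 2048x1536 canvas is covered by 6 overlapping 1024px tiles.
print(list(tile_boxes(2048, 1536)))
```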

Just like for SD1.5, TensorRT speeds up generation by almost a factor of 2 for SDXL (compared to an "optimized" baseline using SDP).
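
For context, SDP here is PyTorch's scaled dot-product attention, the default "optimized" cross-attention path in recent builds. A rough sketch of how you'd time that baseline with diffusers (model ID, prompt, and step count are example values; the TensorRT side runs through the A1111 extension, so it isn't shown):

```python
# Rough timing sketch for the SDP ("optimized") baseline via diffusers.
# Recent diffusers versions use torch's scaled_dot_product_attention by
# default when available. Model ID, prompt, and steps are example values.
import time
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a photo of a mountain lake at sunrise"
pipe(prompt, num_inference_steps=5)  # warmup run (caching, autotuning)

torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt, num_inference_steps=30, height=1024, width=1024)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start
print(f"30 steps in {elapsed:.2f}s -> {30 / elapsed:.2f} it/s")
```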

[–] [email protected] 1 points 10 months ago

The benchmarks test workloads people are actually interested in running.

Whether good performance is achieved by good hardware, a good software stack, or both doesn't matter to the vast majority of buyers.