this post was submitted on 25 Aug 2023

LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.


Is it just memory bandwidth? Or is it that AMD is not supported well enough by PyTorch for most products? Or some combination of those?

[–] [email protected] 2 points 1 year ago (1 children)

I run exllama on a 24GB GPU right now and am just seeing what's feasible for larger models. So an Intel CPU with lots of RAM would in theory outperform an AMD iGPU with the same amount of RAM allocated as VRAM? (I'm looking at APUs/iGPUs solely because you can configure the amount of VRAM allocated to them.)

[–] [email protected] 3 points 1 year ago (1 children)

I'm pretty sure it is not super relevant. The amount of VRAM in a GPU is different from the amount of memory available to a CPU. System memory on x86 is mostly virtual address space. I haven't played in this space in a while, so my memory is rusty. The system memory is not directly accessible by an address bus, which creates a major bottleneck when you need to access a lot of information at once. It is more of a large storage system that is made to move chunks of data that are limited in size. If you want more info, read about address buses and physical/virtual addressing: https://en.m.wikipedia.org/wiki/Physical_Address_Extension

In a GPU, the goal is to move data in parallel, with most of the memory available at the same time. This doesn't have the extra overhead of complicated memory management systems. Each small processor directly addresses the memory it needs. With a GPU, more memory usually means more physical compute hardware.

If you ever feel motivated to build vintage computing hardware like Ben Eater's 8-bit breadboard computer project on YouTube, or his 6502 stuff, you'll see a lot of this firsthand. With early 8-bit computers, the memory bus and address space were a major design aspect, and they are much easier to understand because they are manually configured in hardware external to the processor.

[–] [email protected] 1 points 1 year ago

As per the link (YouTube) in the other thread, it seems like iGPU + increased allocation of VRAM is better than using the CPU, though it also seems APUs max out at 16GB. Maybe something AMD can improve in the future then...
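On the original "is it just memory bandwidth?" question, a common rule of thumb is that token generation is bandwidth-bound: each new token requires reading roughly all of the model weights once, so bandwidth divided by model size gives a rough upper bound on tokens per second. Below is a minimal sketch of that arithmetic. The function name and all the numbers are made-up placeholders for illustration, not measurements of any real CPU, iGPU, or GPU, and real throughput will be lower.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode speed, assuming every weight is read
    once per generated token and nothing else limits throughput."""
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT benchmarks of real hardware.
    model_size_gb = 20.0          # assumed size of the quantized weights
    slow_memory_gb_s = 50.0       # assumed system-RAM-class bandwidth
    fast_memory_gb_s = 1000.0     # assumed discrete-GPU-class bandwidth

    for label, bw in [("slow memory", slow_memory_gb_s),
                      ("fast memory", fast_memory_gb_s)]:
        bound = max_tokens_per_second(bw, model_size_gb)
        print(f"{label}: at most ~{bound:.1f} tokens/s")
```

Under this assumption, an iGPU that shares the same system RAM as the CPU has the same ceiling as the CPU, whatever amount is allocated as VRAM; any gain would have to come from compute or software support rather than memory size.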