LocalLLaMA

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 2 years ago

Is there a good reason why AMD APUs just aren't used with massive amounts of (V)RAM just like the Mac M2 is? (lemmy.ca)

submitted 2 years ago by [email protected] to c/localllama

14 comments

Is it just memory bandwidth? Or is it that AMD isn't supported well enough by PyTorch for most products? Or some combination of those?
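
One common back-of-the-envelope way to frame the bandwidth question (a rule of thumb, not something stated in this thread): single-stream token generation tends to be memory-bandwidth-bound, because each new token requires reading roughly all of the model's weights once. That gives an idealized ceiling of bandwidth divided by model size. The sketch below assumes that model; the function name and all figures in it are hypothetical and illustrative, not measurements of any specific APU or Mac.

```python
# Idealized ceiling on single-stream decoding speed, ASSUMING generation is
# memory-bandwidth-bound: each token streams (roughly) all weights once.
# Real throughput is lower (compute, cache effects, software overhead).

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Return bandwidth / model size as an upper bound on tokens per second."""
    if model_size_gb <= 0:
        raise ValueError("model size must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical, illustrative inputs (not specs of any particular product):
MODEL_SIZE_GB = 4.0  # roughly a ~7B-parameter model at ~4-bit quantization
SYSTEMS = {
    "narrow memory bus (assumed ~80 GB/s)": 80.0,
    "wide unified memory (assumed ~400 GB/s)": 400.0,
}

for name, bandwidth in SYSTEMS.items():
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

Under these assumptions the two cases come out to ceilings of about 20 and 100 tokens/s, which is why memory bus width matters so much even when the total amount of (V)RAM is the same.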

[–] Naz 4 points 2 years ago* (last edited 2 years ago) (2 children)

I've gotten LLaMA running locally using CLBlast on an AMD GPU while using the CPU at the same time (basically an APU execution pathway).

AMD is seriously slacking when it comes to machine learning: the hardware is uber-powerful, but, just as everyone complains, the software isn't there.

ROCm doesn't even work on Windows, FFS.

You can run models on almost anything, but token generation is extremely slow. You might be waiting upwards of 5 minutes for a response, at something like 0.2 to 0.6 tokens per second, which is abysmal when you need a minimum of 100 tokens for a coherent reply.
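
To put those rates in perspective: at 0.2 to 0.6 tokens per second, a 100-token reply takes roughly 170 to 500 seconds, i.e. about 3 to 8 minutes. A minimal sketch of that arithmetic, using the rates and the 100-token figure from the comment above (the helper name is hypothetical):

```python
# Time to generate a reply of a given length at a constant decode rate.
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Return the wall-clock seconds needed to produce num_tokens."""
    if tokens_per_second <= 0:
        raise ValueError("rate must be positive")
    return num_tokens / tokens_per_second


# Rates quoted in the comment, for a 100-token reply.
for rate in (0.2, 0.6):
    secs = seconds_to_generate(100, rate)
    print(f"{rate} tokens/s -> {secs:.0f} s ({secs / 60:.1f} min)")
```

This ignores prompt processing time, which adds to the wait before the first token appears.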

[–] [email protected] 3 points 2 years ago

Isn't Windows for gaming and weird proprietary applications like Photoshop?

[–] Kerfuffle 2 points 2 years ago (1 children)

If you're using llama.cpp, some ROCm support recently got merged in. It works pretty well, at least on my 6600. I believe the pull request included instructions for getting it working on Windows.

[–] Naz 2 points 2 years ago

Thank you so much! I'll be sure to check that out and get it updated.