
AMD


For all things AMD; come talk about Ryzen, Radeon, Threadripper, EPYC, rumors, reviews, news and more.

[–] [email protected] 1 points 10 months ago (1 children)

Can you pass AMD GPUs through to containers? Like Docker, maybe?

[–] [email protected] 1 points 10 months ago

Yep. I just did that the other day to run Stable Diffusion, as I couldn't get the required drivers installed any other way.
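For anyone wondering how: the usual approach (per AMD's container docs, not something detailed in this thread) is to expose the kernel devices to the container, e.g. `docker run --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch`. A minimal sketch to verify the GPU is actually visible inside such a container:

```python
# Smoke test: confirm a ROCm build of PyTorch can see the passed-through GPU.
# Assumes a ROCm-enabled image (e.g. rocm/pytorch); on ROCm builds, PyTorch's
# "cuda" APIs are backed by HIP.
import torch

print("GPU visible:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # Quick sanity check: run a small matmul on the GPU.
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).sum().item())
```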

[–] [email protected] 1 points 10 months ago (1 children)

"Ditching CUDA for AMD's shoddy take on CUDA."

[–] [email protected] 1 points 10 months ago

AMD should stop wasting money and put their support behind OpenVINO. It's already open source, and it's optimized for both CPU and GPU in most large AI software.

https://www.phoronix.com/news/Intel-OpenVINO-2023.0
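For reference, OpenVINO's Python API (as of the 2023.x releases) looks roughly like this; a minimal sketch, where `model.xml` is a placeholder for a model already converted to OpenVINO IR:

```python
# Minimal OpenVINO inference sketch; "model.xml" is a placeholder path
# for a model converted to OpenVINO IR with a static input shape.
import numpy as np
from openvino.runtime import Core

core = Core()
print("Available devices:", core.available_devices)  # e.g. ['CPU', 'GPU']

model = core.read_model("model.xml")
compiled = core.compile_model(model, "CPU")  # or "GPU" for an iGPU/dGPU

# Run inference on dummy input matching the model's first input shape.
input_tensor = np.random.rand(*compiled.input(0).shape).astype(np.float32)
result = compiled([input_tensor])[compiled.output(0)]
print("Output shape:", result.shape)
```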

[–] [email protected] 1 points 10 months ago

I’m writing this entry mostly as a reference for those looking to train a LLM locally and have a Ryzen processor in their laptop (with integrated graphics), obviously it’s not the optimal environment, but it can work as a last resort or for at-home experimenting.

Oh.

[–] [email protected] 1 points 10 months ago

as a reference for those looking to train a LLM locally

It took me hours to finetune a small (by today's standards) BERT model on an RTX 4090; I can't imagine doing anything on chips like those referenced in the article, even inference.

I wouldn't do any training on anything less than a 7800/7900 XTX, if you can get them to work.
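For context, "finetuning a small BERT model" here means something like the standard Hugging Face Trainer loop; a minimal sketch, where the dataset and hyperparameters are illustrative rather than anything from the comment:

```python
# Minimal BERT finetuning sketch with Hugging Face Transformers.
# Dataset, subset size, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=2)

# Tokenize a small sentiment dataset.
dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)
dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-finetune",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    fp16=True,  # mixed precision; works on both CUDA and ROCm builds
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(5000)),
).train()
```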

[–] [email protected] 1 points 10 months ago

Maybe when AMD supports ROCm on more than like 3 consumer cards at a time.

CUDA supports cards going back many generations. AMD keeps cutting off cards that already had existing support: Polaris has been cut for a while, and Vega/Radeon VII either just got cut or is getting cut in the next release.

[–] [email protected] 1 points 10 months ago

Yeah, good luck with that, and the 5 very specific AMD cards it supports.

[–] [email protected] 1 points 10 months ago

Wow, I only need to recompile everything 😂 I'm a big AMD fan, but sorry, couldn't resist in this case.

[–] [email protected] 1 points 10 months ago

It's gotten quite a lot better, but it's still cumbersome in many aspects. They need to get some dedicated maintainers into the main ML FOSS projects to make new ROCm versions available easily.

Every time a new ROCm version is released, it takes ~2 months before all the stars line up and builds are available for the most commonly used LLM stacks.

Backported ROCm builds should be a thing too. It doesn't help that there are ROCm 5.7 PyTorch 2.2 nightly builds when most projects still use 2.1 and are stuck with ROCm 5.6 (esp. when AMD devs essentially push you upward to solve any problem/crash you may have).
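A quick way to check which ROCm a given PyTorch build was compiled against, e.g. after installing from the PyTorch ROCm wheel index with something like `pip install torch --index-url https://download.pytorch.org/whl/rocm5.6`:

```python
# Report the ROCm/HIP version a PyTorch build was compiled against,
# to confirm a wheel matches the ROCm version a project expects.
import torch

print("PyTorch version:", torch.__version__)  # e.g. "2.1.0+rocm5.6"
print("HIP version:", torch.version.hip)      # None on CUDA/CPU-only builds
```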

Also, although I completely understand the need to settle somewhere in terms of kernel/distro support on Linux, it's too bad their highest supported kernel is 6.2.x.