this post was submitted on 10 Jan 2025
12 points (92.9% liked)
LocalLLaMA
I'd say you're looking for something like an 80GB VRAM GPU. That'd be industry grade (an Nvidia A100, for example).
And to squeeze it into 80GB the model would need to be quantized to 4 or 5 bits. There are some LLM VRAM calculators available where you can put in your numbers, like this one.
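For a rough idea of where those numbers come from, here's a back-of-the-envelope sketch of what those calculators do. The ~120B parameter count and the 10% overhead for KV cache / runtime are just illustrative assumptions on my part, since the exact model isn't pinned down here:

```python
# Back-of-the-envelope VRAM estimate: quantized weights plus a rough
# fudge factor for KV cache and runtime overhead. The 120B parameter
# count and the 1.1 overhead factor are assumptions, not measurements.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.1) -> float:
    """Approximate VRAM needed for the quantized weights, in GB."""
    # (params * 1e9) * bits / 8 gives bytes; dividing by 1e9 bytes/GB
    # cancels the 1e9, leaving params_billion * bits / 8.
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead

for bits in (4, 5, 8, 16):
    print(f"{bits:>2}-bit: ~{estimate_vram_gb(120, bits):.0f} GB")
# Roughly: ~66 GB at 4-bit, ~82 GB at 5-bit, ~132 GB at 8-bit, ~264 GB at fp16
```

That's why 4 or 5 bits is about the limit if you want to stay under 80GB; anything coarser than the real calculators, but it shows the shape of the math.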
Another option would be to rent these things by the hour in some datacenter (at about $2 to $3 per hour). Or do inference on a CPU with a wide memory interface, like an Apple M3 processor or an AMD Epyc. But those are pricey, too, and you'd need to buy them alongside an equal amount of (fast) RAM.
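The reason the memory interface matters: token generation is mostly memory-bandwidth-bound, since every new token has to stream the whole set of quantized weights through the processor once. A quick sketch of the resulting upper bound on speed; the bandwidth figures are ballpark numbers I'm assuming, not verified specs:

```python
# Decode speed is roughly limited by memory bandwidth: each token reads
# all the weights once. Bandwidth values below are ballpark assumptions.

def tokens_per_second(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed when purely bandwidth-limited."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 66  # e.g. the ~120B model at 4-bit from the estimate above

for name, bw in [("dual-channel DDR5 desktop", 90),
                 ("Apple M3 Max (unified memory)", 400),
                 ("AMD Epyc, 12-channel DDR5", 460),
                 ("Nvidia A100 80GB (HBM2e)", 2000)]:
    print(f"{name}: ~{tokens_per_second(MODEL_GB, bw):.1f} tokens/s")
```

So a wide-memory CPU gets you into "usable" territory for a model that size, while a plain desktop doesn't, and the A100 is still several times faster.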