LocalLLaMA

Community for discussing LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 2 years ago

MODERATORS

SkySyrup

pax

noneabove1182

How do I get LLaMA going on a GPU? (self.localllama)

submitted 2 years ago by planish to c/localllama

Everyone is so thrilled with llama.cpp, but I want to do GPU-accelerated text generation and interactive writing. What's the state of the art here? Will KoboldAI now download LLaMA for me?

[–] SkySyrup 3 points 2 years ago* (last edited 2 years ago)

Hi, I'm happy to see you're willing to give LLaMA a try! What you can do for GPU-accelerated processing depends on your OS and hardware. If you have an NVIDIA card, you can use cuBLAS; instructions are here: https://github.com/ggerganov/llama.cpp#cublas . I don't have experience with other cards, but I'll try to help if issues arise!
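
As a rough sketch of what a GPU-offloaded run looks like once llama.cpp is built with cuBLAS (at the time of this thread, `make LLAMA_CUBLAS=1`): the helper below only assembles the command line. It assumes the binary is called `./main` and uses the `-ngl` (number of GPU layers) flag; the model filename and the layer count of 40 are hypothetical placeholders, so adjust them to your own files and VRAM.

```python
import shlex


def llama_cpp_command(model_path, prompt, gpu_layers, binary="./main"):
    """Build an argument list for a llama.cpp run that offloads layers to the GPU.

    Assumptions: `binary` is a cuBLAS-enabled llama.cpp build, and
    `model_path` points at a model file you already have locally.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be non-negative")
    return [
        binary,
        "-m", model_path,        # model file to load
        "-ngl", str(gpu_layers),  # how many layers to offload to the GPU
        "-p", prompt,            # initial prompt text
    ]


# Hypothetical model file and layer count, for illustration only.
cmd = llama_cpp_command("models/llama-33b.q4_0.bin", "Once upon a time", 40)
print(shlex.join(cmd))  # shell-quoted form you could paste into a terminal
```

Offloading fewer layers lowers VRAM use at the cost of speed, so the layer count is the main knob to experiment with.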

Also, for more ease of use, try text-generation-webui (https://github.com/oobabooga/text-generation-webui). Well, ease of use until you want GPU acceleration, because then you'll need to look at https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration to do that with LLaMA.

33B and 65B models seem to be the best for storytelling and writing.
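
To get a feel for what those sizes mean for hardware, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It is only a sketch: it assumes exactly 4 or 16 bits per weight, while real quantized formats store extra scaling data and the context cache needs additional memory on top, so treat the results as lower bounds.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    params_billion: parameter count in billions (e.g. 33 for a 33B model)
    bits_per_weight: assumed storage per weight (4 for 4-bit quantization)
    """
    return params_billion * bits_per_weight / 8


for size in (33, 65):
    q4 = weight_memory_gb(size, 4)
    fp16 = weight_memory_gb(size, 16)
    print(f"{size}B: about {q4:.1f} GB at 4-bit, about {fp16:.1f} GB at 16-bit")
```

If the estimate exceeds your VRAM, partial offloading (only some layers on the GPU) is the usual compromise.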

[–] [email protected] 2 points 2 years ago

there's a bit more setup involved but I would look into https://github.com/oobabooga/text-generation-webui