this post was submitted on 31 Jan 2025
15 points (89.5% liked)
LocalLLaMA
2531 readers
Community to discuss LLaMA, the large language model created by Meta AI.
This is intended to be a replacement for r/LocalLLaMA on Reddit.
founded 2 years ago
you are viewing a single comment's thread
view the rest of the comments
At some point you'll run out of VRAM on the GPU. You can trade speed for room by offloading some of the model's layers to system RAM, which frees VRAM for more context.
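To put that trade-off in concrete terms, here's a minimal sketch assuming llama-cpp-python as the runtime (the comment doesn't name a specific tool): lowering n_gpu_layers pushes more of the model off the GPU, which slows generation but frees VRAM that can be spent on a larger n_ctx.

```python
# Minimal sketch with llama-cpp-python (my assumption; the comment doesn't name a runtime).
# Model path, layer count, and context size are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q4_K_M.gguf",  # hypothetical quantized GGUF model
    n_gpu_layers=24,   # layers kept on the GPU; lower this to free VRAM...
    n_ctx=16384,       # ...and spend that VRAM on a longer context instead
)

out = llm("Describe the northern kingdom's capital.", max_tokens=128)
print(out["choices"][0]["text"])
```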
Yes, but if he's world building, a larger, slower model might just be an acceptable compromise.
I was getting OOM errors doing speech-to-text on my 4070 Ti. I know (now) that I should have gone for the 3090 Ti. Such is life.