LocalLLaMA

Community for discussing LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

What is a good model that runs on 6GB VRAM? (discuss.online)

submitted 6 days ago by [email protected] to c/localllama

It should be good at conversation and creative writing; it'll be for worldbuilding.

Ideally uncensored, since I'd rather have that than have the censorship kick in when I least want it.

I'm fine with roleplaying models as long as they can actually give me ideas and talk to me logically.

[–] [email protected] 1 points 6 days ago (2 children)

Can't you just increase context length at the cost of paging and slowdown?

[–] [email protected] 2 points 6 days ago (1 children)

At some point you'll run out of VRAM on the GPU. To make room for more context you have to offload some of the model's layers to system RAM, which makes inference slower.
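
To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the model shape (32 layers, 8 KV heads, head size 128), the 4.5 GiB quantized weight size, the 0.5 GiB overhead, and the simplification that the whole KV cache stays on the GPU and weights are split evenly across layers. Real runtimes place things differently, so treat the output as a ballpark only.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size: one key and one value tensor per layer,
    each holding context_len * n_kv_heads * head_dim elements (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def layers_that_fit(vram_bytes, model_bytes, n_layers, kv_bytes, overhead_bytes=0):
    """How many layers fit in VRAM after reserving room for the KV cache.
    Assumes (simplification) equal-sized layers and a KV cache held fully on the GPU."""
    per_layer = model_bytes / n_layers
    free = vram_bytes - kv_bytes - overhead_bytes
    if free <= 0:
        return 0
    return min(n_layers, int(free // per_layer))


if __name__ == "__main__":
    # Hypothetical 6 GiB card and hypothetical quantized model, for illustration only.
    vram = 6 * GIB
    model = int(4.5 * GIB)
    overhead = GIB // 2
    for ctx in (4096, 16384, 32768):
        kv = kv_cache_bytes(32, 8, 128, ctx)
        on_gpu = layers_that_fit(vram, model, 32, kv, overhead)
        print(f"context {ctx:>6}: KV cache {kv / GIB:.2f} GiB, {on_gpu}/32 layers on GPU")
```

Under these assumptions, a longer context eats directly into the space left for layers, so more of the model ends up in system RAM and generation slows down.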

[–] [email protected] 1 points 6 days ago

Yes, but if he's world building, a larger, slower model might just be an acceptable compromise.

I was getting OOM errors doing speech-to-text on my 4070 Ti. I know (now) that I should have gone for the 3090 Ti. Such is life.

[–] [email protected] 1 points 6 days ago

At a certain point, layers will be pushed to RAM leading to incredibly slow inference. You don't want to wait hours for the model to generate a single response.