this post was submitted on 11 Jul 2023
16 points (100.0% liked)
LocalLLaMA
KoboldCpp has documentation on its GitHub page. If the documentation doesn't do it for you, try searching for other guides.
My advice: do one step at a time, so if something breaks you know which step it happened at.

1. Get it running first, without any fancy stuff: start with a small model and no GPU acceleration.
2. Then get the acceleration/CUDA working.
3. Then try a bigger model.
4. Then do the elaborate stuff, like keeping some layers in VRAM and others in RAM, and blowing up the context size past the 2048 default.

Don't do it all at once.
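A rough sketch of that progression as KoboldCpp command lines (the model filenames are placeholders, and flag names can vary between KoboldCpp versions, so check `python koboldcpp.py --help` for what your build actually supports):

```shell
# Step 1: small model, CPU only. Get this working before anything else.
python koboldcpp.py small-model.bin

# Step 2: same model, with CUDA acceleration enabled.
python koboldcpp.py small-model.bin --usecublas

# Step 3: switch to a bigger model once the above works.
python koboldcpp.py big-model.bin --usecublas

# Step 4: the elaborate stuff -- offload only some layers to VRAM
# (the rest stay in RAM) and raise the context size past the default.
python koboldcpp.py big-model.bin --usecublas --gpulayers 20 --contextsize 4096
```

If step 4 crashes or runs out of VRAM, lower `--gpulayers` until it fits; that's the knob that splits the model between GPU and system memory.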
(Edit: And make sure to always use the latest version. You're playing with pretty recent software that may still have bugs.)
I can't say much about the Windows side of things or the state of the integration layers in oobabooga's.