LocalLLaMA

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

Guide on setting up a local GGML model? (lemmy.world)

submitted 2 years ago* (last edited 2 years ago) by [email protected] to c/localllama

I've been messing around with GPTQ models with ExLlama in ooba, and have gotten 33B models running smoothly at 3k context, but I'm looking to try something bigger than my VRAM can hold.

However, I'm clearly doing something wrong, and the koboldcpp.exe documentation isn't clear to me. Does anyone have a good setup guide? My understanding is koboldcpp.exe is preferable for GGML, as ooba's llama.cpp doesn't support GGML at >4k context yet.

[–] [email protected] 4 points 2 years ago* (last edited 2 years ago)

KoboldCpp has documentation on its GitHub page. If that documentation doesn't do it for you, maybe just search for other guides.

My advice: do one step at a time. Get it running first, without the fancy stuff: start with a small model and no GPU acceleration. Then get the acceleration (CUDA) working. Then try a bigger model. Only then do the elaborate stuff, like keeping some layers in VRAM and others in RAM and pushing the context size past the 2048 default. Don't do it all at once. That way you can figure out at which step your problem happens.
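To make the order concrete, here is a rough Python sketch that only builds and prints one command line per step, so each run changes a single thing compared with the previous one. The flag names (`--model`, `--usecublas`, `--gpulayers`, `--contextsize`) are the ones I believe KoboldCpp uses, but treat them as assumptions and check `koboldcpp.exe --help` on your version. The model file names, the layer count of 20 and the context size of 4096 are made-up placeholders.

```python
import shlex

# Hypothetical executable and model paths; replace with your own files.
KOBOLDCPP = "koboldcpp.exe"
SMALL_MODEL = "models/small-model.ggml.bin"
BIG_MODEL = "models/big-model.ggml.bin"


def build_steps(small_model, big_model, gpu_layers, context_size):
    """Return (description, argv) pairs, each adding one change to the last.

    The flags are assumed KoboldCpp options; verify them with --help.
    """
    cpu_small = [KOBOLDCPP, "--model", small_model]
    gpu_small = cpu_small + ["--usecublas"]
    gpu_big = [KOBOLDCPP, "--model", big_model, "--usecublas"]
    split_big = gpu_big + ["--gpulayers", str(gpu_layers)]
    long_ctx = split_big + ["--contextsize", str(context_size)]
    return [
        ("1. small model, CPU only", cpu_small),
        ("2. small model, CUDA acceleration", gpu_small),
        ("3. bigger model", gpu_big),
        ("4. some layers in VRAM, the rest in RAM", split_big),
        ("5. context size past the default", long_ctx),
    ]


if __name__ == "__main__":
    steps = build_steps(SMALL_MODEL, BIG_MODEL, gpu_layers=20, context_size=4096)
    for description, argv in steps:
        print(description)
        # shlex.join quotes POSIX-style; adjust the quoting for cmd.exe if needed.
        print("   " + shlex.join(argv))
```

Run each printed command by hand and only move on once the previous one works. If step 4 is the first one that fails, you know the layer split is the problem and not the model file or the CUDA setup.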

(Edit: And make sure to always use the latest version. You're playing with pretty recent stuff that still might have bugs.)

I can't say much about the Windows side of things or the state of the integration layers in oobabooga's web UI.