this post was submitted on 14 Jun 2023
LocalLLaMA
Community for discussing LLaMA, the large language model created by Meta AI.
This is intended to be a replacement for r/LocalLLaMA on Reddit.
From the Twitter post
New StarCoder coding model from @WizardLM_AI
"WizardCoder-15B-v1.0 model achieves 57.3 pass@1 on the HumanEval Benchmarks .. 22.3 points higher than the SOTA open-source Code LLMs."
My quants: https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GGML https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GPTQ
Original: https://huggingface.co/WizardLM/WizardCoder-15B-V1.0
11:21 AM · Jun 14, 2023
On TheBloke's Hugging Face repo, it says the GGML quants are not compatible with llama.cpp. Anyone know why?
It's a different type of model. llama.cpp only supports LLaMA models, while GGML (the machine learning library llama.cpp is based on) has examples for various models with different architectures: StarCoder-based models like WizardCoder, MPT, BLOOM, and probably Falcon very soon. Some separate projects also use GGML to support other models (including some of the ones I listed). For example, the Rust "llm" project supports LLaMA, MPT, and BLOOM models.
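To make the distinction concrete, here is a rough sketch of the compatibility picture as a lookup table. It only encodes what is claimed in this thread as of June 2023; the names `SUPPORTED`, `MODEL_ARCH`, and `runners_for` are made up for illustration, and the table is not an authoritative or current support list for any of these projects.

```python
# Snapshot of the claims made in this thread (June 2023).
# Assumption: this is illustrative only, not an official support matrix.
SUPPORTED = {
    "llama.cpp": {"llama"},
    "ggml examples": {"starcoder", "mpt", "bloom"},
    "llm (Rust)": {"llama", "mpt", "bloom"},
}

# WizardCoder is described in the tweet as a StarCoder model,
# so it is mapped to the "starcoder" architecture here.
MODEL_ARCH = {
    "WizardCoder-15B-1.0": "starcoder",
}


def runners_for(model_name):
    """Return the runners from SUPPORTED that list the model's architecture."""
    arch = MODEL_ARCH[model_name]
    return sorted(name for name, archs in SUPPORTED.items() if arch in archs)


if __name__ == "__main__":
    print(runners_for("WizardCoder-15B-1.0"))
```

The point is that the GGML file format alone does not tell you whether a given runner can load a model; the runner also has to implement that model's architecture.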
Looks like gpt4all supports it. I thought it was based on LLaMA for some reason. Going to have to give it a try.
It looks like a frontend that just bundles a bunch of backends together. Oobabooga's web UI is similar: you can run models with llama.cpp, GPTQ, etc. Which models and features are supported depends on how the frontend manages those backends. There are also forks of llama.cpp like koboldcpp, which may support different models/features/formats (I know koboldcpp supports some older GGML file formats that llama.cpp broke compatibility with).
Oh wait, does ooba support this? Nvm then, I'm enjoying using that. I'm just a little lost sometimes haha
I don't know if it does or doesn't, I was just saying those two projects seemed similar: presenting a frontend for running inference on models while the user doesn't necessarily have to know/care what backend is used.
Gotcha. koboldcpp seems to be able to run it. All of it is only a tiny bit confusing :D