this post was submitted on 21 Aug 2023

LocalLLaMA


Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

 

They still consider it a beta, but there we go! It's happening :D

top 7 comments
[–] Kerfuffle 4 points 2 years ago (1 children)

I was able to contribute a script (convert-llama-ggmlv3-to-gguf.py) to convert GGML models to GGUF, so you can potentially still use your existing models. Ideally it should be used with the metadata from the original model, since converting the vocabulary from GGML to GGUF without that is imperfect. (By metadata I mean things like the HuggingFace config.json, tokenizer.model, etc.)
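A rough sketch of how you might drive the converter (the paths are hypothetical and the flag names are from memory, so check `python convert-llama-ggmlv3-to-gguf.py --help` for the current interface):

```python
# Sketch only: invoke the llama.cpp converter script from Python.
# Paths are hypothetical and flag names are from memory; verify with --help.
import subprocess

subprocess.run(
    [
        "python", "convert-llama-ggmlv3-to-gguf.py",
        "--input", "models/llama-7b.ggmlv3.q4_0.bin",   # existing GGML model
        "--output", "models/llama-7b.q4_0.gguf",        # converted GGUF model
        "--model-metadata-dir", "models/llama-7b-hf",   # config.json, tokenizer.model
    ],
    check=True,  # raise if the conversion fails
)
```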

[–] [email protected] 3 points 2 years ago (1 children)

Is there any reason why support for loading both formats cannot be included within GGML/llama.cpp directly? As I understand it, the new format is basically the same as the old one but with extra metadata around the outside. I don't see why adding support for the new format necessitates removing support for the old one, since the way the actual model weights are stored is not substantially different (if at all?).

This would allow people to continue loading their existing models without needing to convert them, or redownload them after waiting for someone else to convert them. The old format could be deprecated and eventually removed in a later release once people have had time to convert or once it becomes inconvenient to continue supporting it.
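For illustration, a loader that kept accepting both containers could dispatch on the file's leading magic number. A minimal sketch; the magic constants are from the ggml/llama.cpp sources as I remember them, so double-check before relying on this:

```python
# Sketch only: branch on the first four bytes to pick a loader.
import struct

GGUF_MAGIC = 0x46554747  # the bytes b"GGUF" read as a little-endian uint32
GGJT_MAGIC = 0x67676A74  # "ggjt", the GGML v1-v3 container llama.cpp used

def detect_container(path: str) -> str:
    """Return which loader to dispatch to, based on the file magic."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    if magic == GGUF_MAGIC:
        return "gguf"
    if magic == GGJT_MAGIC:
        return "ggml (ggjt v1-v3)"
    raise ValueError(f"unrecognized model file magic: {magic:#010x}")

print(detect_container("models/llama-7b.q4_0.gguf"))  # hypothetical path
```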

[–] Kerfuffle 3 points 2 years ago

Is there any reason why support for loading both formats cannot be included within GGML/llama.cpp directly?

It could be (and I bet koboldcpp and maybe other projects will take that route). There absolutely is a disadvantage to dragging around a lot of legacy stuff for compatibility. llama.cpp/ggml's approach has pretty much always been to favor rapid development over compatibility.

As I understand it, the new format is basically the same as the old format

I'm not sure that's really accurate. There are significant differences in how the model vocabulary is handled, for instance.

Even if that were true of the very first version of GGUF that gets merged, it'll likely become less true as GGUF evolves and the features it enables see more use. Having to maintain compatibility with the GGML stuff would make iterating on GGUF and adding new features more difficult.
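To make the vocabulary point concrete: in GGML the vocab was a fixed, positional section of the file, while GGUF stores it as named, typed key-value metadata (tokenizer.ggml.tokens, tokenizer.ggml.scores, ...). A rough sketch using the `gguf` Python package from the llama.cpp repo; the method names are from the gguf-py sources as I recall them, so check the package for the current API:

```python
# Sketch only: write a toy vocabulary as GGUF key-value metadata.
import gguf

writer = gguf.GGUFWriter("vocab-demo.gguf", arch="llama")
writer.add_tokenizer_model("llama")                       # tokenizer family
writer.add_token_list(["<unk>", "<s>", "</s>", "hello"])  # toy vocabulary
writer.add_token_scores([0.0, 0.0, 0.0, -1.0])            # sentencepiece scores

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()  # no tensors in this toy file
writer.close()
```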

[–] [email protected] 3 points 2 years ago

And now TheBloke is taking time off to re-convert all of his nearly 800 converted models?

[–] aSingularFemboyHooter 2 points 2 years ago (2 children)

Sorry, I'm trying to get in the loop on this stuff. What's the significance of this, and who will it affect?

[–] [email protected] 1 points 2 years ago

AFAIK, GGUF is a more extensible format that contains (or can contain) more types of metadata, which makes it usable for different model architectures. The main advantage is that this should be the last breaking format change, as future changes can be added in a more modular way.
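For example, GGUF is essentially a flat list of typed key-value pairs plus tensors, so a new architecture can add its own keys without breaking existing readers, which just skip keys they don't recognize. A sketch using GGUFReader from newer versions of the `gguf` package (an assumption; early releases only shipped the writer):

```python
# Sketch only: list every metadata key in a GGUF file.
from gguf import GGUFReader

reader = GGUFReader("models/llama-7b.q4_0.gguf")  # hypothetical path
for name, field in reader.fields.items():
    # A new architecture's metadata would just be more entries here;
    # the file layout itself doesn't have to change.
    print(name, field.types)
```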

[–] noneabove1182 1 points 2 years ago

The significance is that we have a new file format standard. The bad news is that it breaks compatibility with the old format, so you'll have to update to use newer quants, and you can't use your old ones.

The good news is that this is the last time that'll happen (it's happened a few times so far), as this format is meant to be a lot more extensible and flexible, storing a ton of extra metadata for future compatibility.

The great news is that this paves the way for better model support, as we've already seen with Falcon support being merged: https://github.com/ggerganov/llama.cpp/commit/cf658adc832badaaa2ca119fe86070e5a830f8f6