this post was submitted on 12 Jun 2024
LocalLLaMA
Community to discuss about LLaMA, the large language model created by Meta AI.
This is intended to be a replacement for r/LocalLLaMA on Reddit.
I think most people use something like exllamav2 or vLLM, or GGUF, for inference, and it seems none of those projects have properly implemented multimodality or this specific model architecture yet.
You might just be at the forefront of things and there isn't yet any beaten path you could follow.
The easiest thing you could do is just use something that already exists, even if that means 4-bit models, then wait a few weeks and upgrade. You can also always quantize models yourself and set the parameters however you like, provided you have an inference framework that supports your model, including the adapters for vision, and offers the quantization levels you're interested in...
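If it helps to decide which quantization level is worth chasing, here's a rough back-of-the-envelope sketch of how much memory the weights alone take at different bit widths. The 8-billion-parameter size is just a made-up example, not your model, and the estimate ignores the per-block scales that real quantization formats store, the KV cache, activations, and the vision adapter weights, so treat it as a lower bound.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB.

    Assumption: every weight is stored at exactly `bits_per_weight` bits,
    with no extra overhead for quantization scales or zero points.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 8-billion-parameter model, purely as an example size.
N_PARAMS = 8e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

For that example size it comes out to roughly 15 GiB at 16-bit versus under 4 GiB at 4-bit, which is why the existing 4-bit builds are usually the practical starting point.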