this post was submitted on 22 Nov 2024
Only the inference code of LLaMA (which runs the model) is open-source. The model itself is not, as you're given neither the training data, nor the model weights.
I don't know much about AI models, but that's still more than other vendors are giving away, right? Especially "Open"AI. A lot of people just care if they can use the model for free.
How useful would the training data be? Training of the largest Llama model was done on a cluster of over 100,000 Nvidia H100s so I'm not sure how many people would want to repeat that.
Open datasets are getting much better (Tulu is a great example of an instruct dataset/recipe), but it's clear the giants still have "secret sauce" that gives them at least a small edge over open datasets.
There actually seems to be some vindication of using massively multilingual datasets as well, as the hybrid Chinese/English models are turning out very well.
It turns out these clusters are being used very inefficiently, seeing how Qwen 2.5 was trained with a fraction of the GPUs and is clobbering models from much larger clusters.
One could say Facebook, OpenAI, X and such are "hoarding" H100s but are under no pressure to use them efficiently, since they are so GPU-unconstrained.
Google is an interesting case, as Gemini is getting better quickly, but they presumably use much more efficient/cheap TPUs to train.
Scientific institutions and governments could rent enough GPUs to train their own models, potentially with public funding and public accountability. It'd also be nice to know whether the data Llama was trained on was literally just Facebook user data. I'm not really in the camp of "if user content is on my site then the content belongs to me".
Without the same training data you wouldn't be able to recreate the results, even with the computing power. Thus it's not fully open source. The training data is part of the source used to create the result, the "LLM". It's like having to add your own lines of code to an open-source program to make it work because the company doesn't provide them.
How on earth would you distribute the model for inference without the weights? The gradients are obviously gone so you can't continue training on the model. Maybe you can still do some kind of LoRA?
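On the LoRA question, here is a rough sketch of the idea in plain Python, not anyone's actual training code. LoRA keeps the pretrained weight matrix frozen and trains only two small low-rank matrices added on top of it, so it needs the released weights but not the original training data. The names `matvec` and `lora_forward` and all the sizes are made up for illustration.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=1.0):
    """Return W x + (alpha / r) * B (A x); only A and B would be trained."""
    r = len(A)                          # rank of the low-rank update
    base = matvec(W, x)                 # frozen pretrained weights
    update = matvec(B, matvec(A, x))    # low-rank correction
    return [b + (alpha / r) * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r = 8, 6, 2                # toy sizes, assumed for illustration

# Frozen "pretrained" weights (random stand-ins here).
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A starts small and random, B starts at zero,
# so the adapted layer initially behaves exactly like the base layer.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

x = [1.0] * d_in

full_params = d_out * d_in              # parameters in the frozen matrix
lora_params = r * (d_in + d_out)        # parameters LoRA would actually train

print(full_params, lora_params)
print(lora_forward(W, A, B, x) == matvec(W, x))
```

The gradients for `A` and `B` would be recomputed by backpropagating through the frozen `W` during fine-tuning, which is why having the weights is enough to get started.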