this post was submitted on 29 Jan 2024
63 points (90.9% liked)

LocalLLaMA

Community to discuss about LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 1 year ago
[–] [email protected] 17 points 10 months ago (1 children)

And they don't provide the source... So it's neither open nor source. I get why and how Meta tries to make themselves look better. And I'm grateful for having access to such models. But I think words have meanings, and journalists should do better than repeat that phrasing and help water down the meaning of 'open source'. (Which technically doesn't mean free or without restrictions, but is often used synonymously.)

[–] planish 7 points 10 months ago (1 children)

Don't they provide the source for the code to actually run the model? Otherwise how are people loading it up and running it? Are they shipping executables along with model weights?

[–] [email protected] 5 points 10 months ago* (last edited 10 months ago)

What they mean by that is probably the fact that you can download the model, run it on your own hardware and adapt it. That's in contrast to OpenAI, who just offer a service and don't give access to the model itself: you can only use ChatGPT through their servers.

Most of the models come with a GitHub repo with code to run them and benchmarks. But it's more or less just boilerplate code to get the model running in one of the well-established machine learning frameworks. Maybe a few customizations and the exact setup to get a new model architecture running. It would usually be something like Hugging Face's Transformers library. There are a few other big projects that people use. If researchers come up with new maths, concepts and new architectures, it eventually gets implemented there.
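
For illustration, that boilerplate usually boils down to something like the sketch below. It assumes the `transformers` and `torch` packages are installed. The function name `generate_text` and whatever model ID you pass in are placeholders of mine, not taken from any particular model's repo.

```python
def generate_text(model_id: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Load a causal language model by ID and continue the given prompt."""
    # Imported lazily so that defining this function does not require the
    # (large) libraries to be installed. Assumes transformers + torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Both calls fetch files for `model_id` (tokenizer config and weights).
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Turn the prompt into tensors, generate a continuation, decode to text.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

You would call it as `generate_text("<some model ID>", "Open source means")`. Note that nothing in there tells you how the weights were produced. It only loads and applies them.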

But the code that gets released alongside new models is usually meant for scientific repeatability and not necessarily for actual use. It might contain customizations that make it difficult to incorporate into other things, it usually isn't maintained after the release, and most of the time it is based on old versions of libraries that were state of the art when they started their research. So that's usually not what people end up using.

Interestingly enough, companies all use different phrasing. Mistral AI claims to be committed to being "open & transparent", yet they like to drop torrent files for new models that come with zero explanation or code. And OpenAI still carries the word "open" in their company name, but at this point openness is more a hint of an idea from their very early days.

Anyways, inference code and the model aren't the same thing. It would be like if we were talking about cake recipes and you provided me with the schematics of a KitchenAid.
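
A toy sketch of that distinction, in plain Python with made-up numbers: `forward` stands in for the inference code, and the two weight dicts stand in for two different trained models. None of this comes from a real model, it's only meant to show where the behaviour lives.

```python
def forward(weights: dict, x: float) -> float:
    """'Inference code': a fixed procedure that applies whatever weights it gets."""
    return weights["w"] * x + weights["b"]


# Two hypothetical "models": same code, different learned numbers.
model_a = {"w": 2.0, "b": 1.0}
model_b = {"w": -0.5, "b": 3.0}

print(forward(model_a, 4.0))  # 9.0
print(forward(model_b, 4.0))  # 1.0
```

Same code, different outputs. Having `forward` tells you nothing about how `model_a` or `model_b` came to be, and that's the part that isn't being released.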