LocalLLaMA

Meta releases ‘Code Llama 70B’, an open-source behemoth to rival private AI development (venturebeat.com)

submitted 1 year ago by [email protected] to c/localllama

15 comments


[–] [email protected] 21 points 1 year ago (2 children)

I don't like to sound like a broken record, but all the Llama models have restrictions on their use that mean they aren't open source.

[–] [email protected] 18 points 1 year ago (1 children)

And they don't provide the source... So it's neither open nor source. I get why and how Meta tries to make itself look better, and I'm grateful for having access to such models. But I think words have meanings, and journalists should do better than repeat that phrasing and help water down the meaning of 'open source'. (Which technically doesn't mean free or without restrictions, but is often used synonymously.)

[–] planish 7 points 1 year ago (1 children)

Don't they provide the source for the code to actually run the model? Otherwise how are people loading it up and running it? Are they shipping executables along with model weights?

[–] [email protected] 5 points 1 year ago* (last edited 1 year ago)

What they mean by that is probably the fact that you can download the model, run it on your own hardware and adapt it. That's in contrast to OpenAI, who just offer a service and don't give access to the model itself; you can only use ChatGPT through their servers.

Most of the models come with a GitHub repo with code to run them and benchmarks. But it's more or less just boilerplate code to get the model running in one of the well-established machine learning frameworks, maybe with a few customizations and the exact setup needed for a new model architecture. What people actually use is usually something like Hugging Face's Transformers library, or one of a few other big projects. If researchers come up with new maths, concepts and architectures, these eventually get implemented there.

But the code that gets released alongside new models is usually meant for scientific repeatability and not necessarily for actual use. It might contain customizations that make it difficult to incorporate into other things, it usually isn't maintained after the release, and most of the time it is based on old versions of libraries that were state of the art when the researchers started their work. So that's usually not what gets used by people in the end.

Interestingly enough, companies all use different phrasing. Mistral AI claims to be committed to being "open & transparent", yet they like to drop torrent files for new models that come with zero explanation or code. And OpenAI still carries the word "open" in their company name, but at this point openness is more a hint of an idea from their very early days.

Anyways, inference code and the model aren't the same thing. It would be as if we were talking about cake recipes and you handed me the schematics of a KitchenAid.
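To make that distinction a bit more concrete, here's a toy sketch in plain Python. It's not how any real LLM is implemented; `run_inference`, `model_a` and `model_b` are made-up names for illustration, and the "model" is just three numbers. The point is only that the same generic inference code runs both, and what differs (and what actually matters) is the weights.

```python
# Toy illustration only: real LLMs have billions of weights, not three.
from typing import Dict, List


def run_inference(weights: Dict[str, List[float]], inputs: List[float]) -> float:
    """Generic 'inference code': one linear layer followed by a ReLU."""
    total = sum(w * x for w, x in zip(weights["w"], inputs)) + weights["b"][0]
    return max(0.0, total)


# Two hypothetical "models". The code above runs both unchanged;
# only the numbers (the weights) differ.
model_a = {"w": [0.5, -1.0], "b": [0.25]}
model_b = {"w": [2.0, 1.0], "b": [-1.0]}

inputs = [1.0, 0.5]
out_a = run_inference(model_a, inputs)
out_b = run_inference(model_b, inputs)
print(out_a, out_b)
```

Publishing `run_inference` tells you nothing about how `model_a` or `model_b` came to be. For that you'd need the training data and the training code, and that's the part that's missing.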

[–] [email protected] 1 points 1 year ago (1 children)

How well do the OpenLlama models perform against Llama2? AIUI the training data used for OpenLlama is the same?

[–] [email protected] 7 points 1 year ago* (last edited 1 year ago)

The training data for OpenLlama is called RedPajama, if I'm not mistaken. It's a reproduction of what Meta used to train the first LLaMA. Back then they listed the datasets in the scientific paper. Nowadays they and their competitors don't do that anymore.

OpenLlama performs about as well as (slightly worse than) the first official LLaMA. And both perform worse than Llama2. It's not night and day, but I think it's a noticeable improvement. And Llama2 has twice the context length (4096 tokens instead of 2048), which is a huge improvement for some use-cases.

If you're looking for models with a different license, there are some more. Mistral is Apache 2.0 and there are several more with permissive licenses.

If you're looking for info on what datasets the big players use, forget it (my opinion). The companies are all involved in legal battles over copyright and have stopped publishing what they use. Many (except for Meta) have kept it a (trade) secret from the beginning and never shared such information. It's unscientific because it doesn't allow for repeatability. But AI is expensive and everyone is currently trying to get obscenely rich with it or striving for world domination.

But datasets are available, like the RedPajama one and several other collections for various purposes... There are lots of datasets for fine-tuning and a whole community around that. It's just that for base/foundation models, we don't have access to a current state-of-the-art dataset.