LocalLLaMA

3321 readers

1 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Lets explore cutting edge open source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

Rules:

Rule 1 - No harassment or personal character attacks of community members. I.E no namecalling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency. I.E no comparing the usefulness of models to that of NFTs, no comparing the resource usage required to train a model is anything close to maintaining a blockchain/ mining for crypto, no implying its just a fad/bubble that will leave people with nothing of value when it burst.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms. I.E statements such as "llms are basically just simple text predictions like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>.

Rule 4 - No implying that models are devoid of purpose or potential for enriching peoples lives.

founded 2 years ago

MODERATORS

176

Uncensored Mixtral 8x7B with 4 GB of VRAM (lemmy.ml)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama

6 comments fedilink

I have a 1650 Super with 4GB of VRAM and 16GB of system RAM. I'm able to get roughly 2.7 tokens per second using the Q2 quantized dolphin 2.5 model from here (n_gpu_layers = 5). A dolphin 2.7 model exists, but I haven't tried it yet.

The Q2 model produces lower-quality output. Here are a few responses using the prompts from here.

Prompt 1

Linda has 3 sisters
Paul has 4 sisters
Can Linda and Paul be siblings?

Yes, they can be siblings. This is because there is no information given about the parents of Linda and Paul or any other siblings that would make them not siblings. Hence it is still possible for them to be siblings.

Prompt 2

Björn is 3 years older than Freya
Freya is Björn’s sister.
Björn is twice Freya’s age.
How old is Freya? How old is Björn?

Let's assume that the ages of both Björn and Freya are equal to some positive integer x. Then we can write two equations based on the given information:

Björn = x + 3

Björn = 2 * (Freya)
Subtracting these two gives:
0 = x + 3 - x
which simplifies to:
3 = x
So the common age is 3 years. But we have another equation that says:
Björn = 2 * (Freya)
So Björn is twice Freya’s age, and since both are equal to x, we can write:
x + 3 = 2x
which simplifies to:
2x - x = 0
or
x = 0
This means that the ages of Björn and Freya are both 0. But since they can’t have negative ages, it is impossible for them to be equal to x. So there is no solution for this problem.

Prompt 3

What is bigger than an scorpion, and has exactly half as many letters?

The word "SPECTRUM" has exactly half as many letters as the word "SCORPION".

Prompt 4

I’m at Point Reyes standing at the edge of an oceanside cliff. What direction should i face to see the sun hit the ocean? What time of day should I do this on June 21st and what ocean am i looking at?

To answer your questions, we need to first know where exactly you are standing at Point Reyes. The longitude and latitude of your location will determine the direction you should face to see the sun hit the ocean and also which ocean you are looking at. Can you provide these coordinates?

Prompt 5

What is 5 to the power of 8 minus 6 to the power of 8? Double it and add two to the power of 3?

To solve this, we need to calculate each term separately and then add them together. First let's find out what 5 to the power of 8 minus 6 to the power of 8 is:

(5^8) * (5^(-1)) = 390625 * (-1) = -390625
So now we have a negative number. Next, let's calculate the double value:

2 * (-390625) = -781250
Finally, let's add two to the power of 3 to this:

(2^3) * (2^(-1)) = 8 * (-1) = -8
So now we have another negative number. Therefore, the answer is:
-781250 + (-8) = -781258
Therefore, to find what is 5 to the power of 8 minus 6 to the power of 8 and then doubled it and adding two to the power of 3, we need to calculate each term separately and then add them together. First let's find out what 5 to the power of 8 minus 6 to the power of 8 is:

(5^8) * (5^(-1)) = 390625 * (-1) = -390625
So now we have a negative number. Next, let's calculate the double value:

2 * (-390625) = -781250
Finally, let's add two to the power of 3 to this:

(2^3) * (2^(-1)) = 8 * (-1) = -8
So now we have another negative number. Therefore, the answer is:
-781250 + (-8) = -781258

Prompt 6

Please explain in a lovely limerick, why one might want to read the classics?

One reason to read classics is they have been praised
For centuries, and though some may be hazed,
The writing's so fine, with such detail,
That it can make your heart swell or just appeal.

Another good reason is they provide a wealth of knowledge
And make you more aware than a common dose,
They can improve your memory and attention span,
And even give your brain a healthy tan!

177

Mistral CEO confirms ‘leak’ of new open source AI model nearing GPT-4 performance (venturebeat.com)

submitted 1 year ago by [email protected] to c/localllama

8 comments fedilink

178

Meta releases ‘Code Llama 70B’, an open-source behemoth to rival private AI development (venturebeat.com)

submitted 1 year ago by [email protected] to c/localllama

15 comments fedilink

179

A question about running LLMs with an AMD card (lemmings.world)

submitted 1 year ago by [email protected] to c/localllama

16 comments fedilink

On my machine I'm running opensuse tumbleweed and has the amdgpu driver installed. I use it for gaming and recently I've become interested in running LLMs. So I would like to keep a balance of both without compromising too much on performance.

I know that there are proprietary drivers for AMD cards but I'm hesitant to install it as I've heard that it performs less efficiently in games when compared to the open source driver.

I'm mainly confused about this ROCM thing. Is it not included with the opensource amdgpu drivers ? Or is it available as a separate package?

So what driver to use ?

Or perhaps, is it possible to run oogabooga or stable diffusion within a distrobox container (with the proprietary drivers) and still keep using the open source gpu drivers for the Host operating system.

180

Noob here, what's the best overall model for getting started with ? (lemmings.world)

submitted 1 year ago by [email protected] to c/localllama

3 comments fedilink

I have an rx 6600 and 16gb of ram and an i5 10400f

I am using oobabooga web-ui and I happened to have a gguf file of LLama2-13B-Tiefighter.Q4_K_S .

But it always says that the connection errored out when I load the model.

Anyway, please suggest any good model that I can get started with.

181

Zuckerberg wants to build artificial general intelligence with 350K Nvidia H100 GPUs (www.theregister.com)

submitted 1 year ago by [email protected] to c/localllama

16 comments fedilink

182

InternLM2 models llama-fied (self.localllama)

submitted 1 year ago by noneabove1182 to c/localllama

0 comments fedilink

Thanks to Charles for the conversion scripts, I've converted several of the new internLM2 models into Llama format. I've also made them into ExLlamaV2 while I was at it.

You can find them here:

https://huggingface.co/bartowski?search_models=internlm2

Note, the chat models seem to do something odd without outputting [UNUSED_TOKEN_145] in a way that seems equivalent to <|im_end|>, not sure why, but it works fine despite outputting that at the end.

183

Building a fully local LLM voice assistant to control my smart home (johnthenerd.com)

submitted 1 year ago by [email protected] to c/localllama

5 comments fedilink

184

argilla released distilabeled-Hermes-2.5-Mistral-7B (huggingface.co)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama

0 comments fedilink

And here is the announcement:
https://nitter.net/argilla_io/status/1745057571696693689?t=jq7Loa-Oil1XljEficeVYg&s=19

185

Mixtral of Experts (arxiv.org)

submitted 1 year ago by [email protected] to c/localllama

0 comments fedilink

186

WizardLM/WizardCoder-33B-V1.1 released! (huggingface.co)

submitted 2 years ago by noneabove1182 to c/localllama

11 comments fedilink

Based off of deepseek coder, the current SOTA 33B model, allegedly has gpt 3.5 levels of performance, will be excited to test once I've made exllamav2 quants and will try to update with my findings as a copilot model

187

Microsoft announces WaveCoder (twitter.com)

submitted 2 years ago by noneabove1182 to c/localllama

0 comments fedilink

Paper abstract:

Recent work demonstrates that, after being fine-tuned on a high-quality instruction dataset, the resulting model can obtain impressive capabilities to address a wide range of tasks. However, existing methods for instruction data generation often produce duplicate data and are not controllable enough on data quality. In this paper, we extend the generalization of instruction tuning by classifying the instruction data to 4 code-related tasks and propose a LLM-based Generator-Discriminator data process framework to generate diverse, high-quality instruction data from open source code. Hence, we introduce CodeOcean, a dataset comprising 20,000 instruction instances across 4 universal code-related tasks,which is aimed at augmenting the effectiveness of instruction tuning and improving the generalization ability of fine-tuned model. Subsequently, we present WaveCoder, a fine-tuned Code LLM with Widespread And Versatile Enhanced instruction tuning. This model is specifically designed for enhancing instruction tuning of Code Language Models (LLMs). Our experiments demonstrate that Wavecoder models outperform other open-source models in terms of generalization ability across different code-related tasks at the same level of fine-tuning scale. Moreover, Wavecoder exhibits high efficiency in previous code generation tasks. This paper thus offers a significant contribution to the field of instruction data generation and fine-tuning models, providing new insights and tools for enhancing performance in code-related tasks.

188

How to make LLMs go fast (vgel.me)

submitted 2 years ago by [email protected] to c/localllama

0 comments fedilink

I found this post interesting for my layman's understanding of LLMs and some of the underlying architecture choices that are made.

189

GitHub - SJTU-IPADS/PowerInfer: High-speed Large Language Model Serving on PCs with Consumer-grade GPUs (github.com)

submitted 2 years ago by [email protected] to c/localllama

1 comments fedilink

Discussion with one of the paper authors in llama.cpp: https://github.com/ggerganov/llama.cpp/discussions/4534

Thread by (apparently) a paper author on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18luk10/wait_llama_and_falcon_are_also_moe/

190

2023, year of open LLMs (huggingface.co)

submitted 2 years ago by [email protected] to c/localllama

0 comments fedilink

191

A Systems Programmer's Perspectives on Generative AI (www.bennee.com)

submitted 2 years ago by [email protected] to c/localllama

0 comments fedilink

I wrote this as a layman's primer to the basics of LLMs and other generative AI. I'm still early on in my journey but hopefully it helps explain things to other newcomers even if it glosses over the details.

192

Training a model without a GPU (self.localllama)

submitted 2 years ago by Matburnx to c/localllama

9 comments fedilink

Hi, I'm currently starting to learn how LLM works in depth, so I started using nanoGPT to understand how to train a model and I'd like to play around with the code a little more. So I set myself a goal to train a model that can write basic French, it doesn't to be coherent or deep in its writing, just French with correct grammar. I only have a laptop that doesn't have a proper GPU, so I can't really train a model with billions of parameters. Do you think it's possible without too much dataset or intensive training? Is it a better idea if I use something different from nanoGPT?

TLDR: I'd like to train my own LLM on my laptop which doesn't have a GPU. It's only for learning purpose, so my goal is that it can write basic French. Is it doable? If it is, do you have any tips to make this easier?

193

Looking for a low-end setup (self.localllama)

submitted 2 years ago by Rez to c/localllama

4 comments fedilink

I have a laptop with a Ryzen 7 5700U, 16 GB ram, running Fedora 38 linux.
I'm looking to run a local uncensored LLM, I'd like to know what would be the best model and software to run it.
I'm currently running KoboldAI and Erebus 2.7b. It's okay in terms of speed, but I'm wondering if there's anything better out there. I guess, I would prefer something that is not web-ui based to lower the overhead, if possible.
I'm not very well versed in all the lingo yet, so please keep it simple.
Thanks!

194

Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance (venturebeat.com)

submitted 2 years ago by [email protected] to c/localllama

1 comments fedilink

Original Mistral AI blog: https://mistral.ai/news/mixtral-of-experts/

195

Mixture of Experts Explained (Huggingface blog) (huggingface.co)

submitted 2 years ago by noneabove1182 to c/localllama

0 comments fedilink

196

Mistral releases version 0.2 of their 7B model (mistral.ai)

submitted 2 years ago by noneabove1182 to c/localllama

5 comments fedilink

Available in instruct only currently:

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

197

QuIP#: SOTA 2 bit LLMs (github.com)

submitted 2 years ago by [email protected] to c/localllama

1 comments fedilink

Large language models (LLMs) exhibit amazing performance on a wide variety of tasks such as text modeling and code generation. However, they are also very large. For example Llama 2 70B has 70 billion parameters that require 140GB of memory to store in half precision. This presents many challenges, such as needing multiple GPUs just to serve a single LLM. To address these issues, researchers have developed compression methods that reduce the size of models without destroying performance.

One class of methods, post-training quantization, compresses trained model weights into lower precision formats to reduce memory requirements. For example, quantizing a model from 16 bit to 2 bit precision would reduce the size of the model by 8x, meaning that even Llama 2 70B would fit on a single 24GB GPU. In this work, we introduce QuIP#, which combines lattice codebooks with incoherence processing to create state-of-the-art 2 bit quantized models. These two methods allow QuIP# to significantly close the gap between 2 bit quantized LLMs and unquantized 16 bit models.

Project Page: https://cornell-relaxml.github.io/quip-sharp/

Code: https://github.com/Cornell-RelaxML/quip-sharp

198

ByteDance AI Promises Stronger than Gemini Open Weight GPT Dropping Soon (www.reddit.com)

submitted 2 years ago* (last edited 2 years ago) by [email protected] to c/localllama

1 comments fedilink

Nitter “original” thread: https://nitter.net/QuanquanGu/status/1732484036160012798

199

Mistral drops a new magnet download (twitter.com)

submitted 2 years ago by noneabove1182 to c/localllama

2 comments fedilink

Early speculation is that it's an MoE (mixture of experts) of 8 7b models, so maybe not earth shattering like their last release but highly intriguing, will update with more info as it comes out

200

Inside the A.I. Arms Race That Changed Silicon Valley Forever (archive.ph)

submitted 2 years ago* (last edited 2 years ago) by [email protected] to c/localllama

0 comments fedilink

Original (pay-walled): https://www.nytimes.com/2023/12/05/technology/ai-chatgpt-google-meta.html

Section on LLaMA:

Zuckerberg Gets Warned

Actually, months earlier Meta had released its own chatbot — to very little notice.

BlenderBot was a flop. The A.I.-powered bot, released in August 2022, was built to carry on conversations — and that it did. It said that Donald J. Trump was still president and that President Biden had lost in 2020. Mark Zuckerberg, it told a user, was “creepy.” Then two weeks before ChatGPT was released, Meta introduced Galactica. Designed for scientific research, it could instantly write academic articles and solve math problems. Someone asked it to write a research paper about the history of bears in space. It did. After three days, Galactica was shut down.

Mr. Zuckerberg’s head was elsewhere. He had spent the entire year reorienting the company around the metaverse and was focused on virtual and augmented reality.

But ChatGPT would demand his attention. His top A.I. scientist, Yann LeCun, arrived in the Bay Area from New York about six weeks later for a routine management meeting at Meta, according to a person familiar with the meeting. Dr. LeCun led a double life — as Meta’s chief A.I. scientist and a professor at New York University. The Frenchman had won the Turing Award, computer science’s most prestigious honor, alongside Dr. Hinton, for work on neural networks.

As they waited in line for lunch at a cafe in Meta’s Frank Gehry-designed headquarters, Dr. LeCun delivered a warning to Mr. Zuckerberg. He said Meta should match OpenAI’s technology and also push forward with work on an A.I. assistant that could do stuff on the internet on your behalf. Websites like Facebook and Instagram could become extinct, he warned. A.I. was the future.

Mr. Zuckerberg didn’t say much, but he was listening. There was plenty of A.I. at work across Meta’s apps — Facebook, Instagram, WhatsApp — but it was under the hood. Mr. Zuckerberg was frustrated. He wanted the world to recognize the power of Meta’s A.I. Dr. LeCun had always argued that going open-source, making the code public, would attract countless researchers and developers to Meta’s technology, and help improve it at a far faster pace. That would allow Meta to catch up — and put Mr. Zuckerberg back in league with his fellow moguls. But it would also allow anyone to manipulate the technology to do bad things.

At dinner that evening, Mr. Zuckerberg approached Dr. LeCun. “I have been thinking about what you said,” Mr. Zuckerberg told his chief A.I. scientist, according to a person familiar with the conversation. “And I think you’re right.”

In Paris, Dr. LeCun’s scientists had developed an A.I.-powered bot that they wanted to release as open-source technology. Open source meant that anyone could tinker with its code. They called it Genesis, and it was pretty much ready to go. But when they sought permission to release it, Meta’s legal and policy teams pushed back, according to five people familiar with the discussion.

Caution versus speed was furiously debated among the executive team in early 2023 as Mr. Zuckerberg considered Meta’s course in the wake of ChatGPT.

Had everyone forgotten about the last seven years of Facebook’s history? That was the question asked by the legal and policy teams. They reminded Mr. Zuckerberg about the uproar over hate speech and misinformation on Meta’s platforms and the scrutiny the company had endured by the news media and Congress after the 2016 election.

Open sourcing the code might put powerful tech into the hands of those with bad intentions and Meta would take the blame. Jennifer Newstead, Meta’s chief legal officer, told Mr. Zuckerberg that an open-source approach to A.I. could attract the attention of regulators who already had the company in their cross hairs, according to two people familiar with her concerns.

At a meeting in late January in his office, called the aquarium because it looked like one, Mr. Zuckerberg told executives that he had made his decision. Parts of Meta would be reorganized and its priorities changed. There would be weekly meetings to update executives on A.I. progress. Hundreds of employees would be moved around. Mr. Zuckerberg declared in a Facebook post that Meta would “turbocharge” its work on A.I.

Mr. Zuckerberg wanted to push out a project fast. The researchers in Paris were ready with Genesis. The name was changed to LLaMA, short for “Large Language Model Meta AI,” and released to 4,000 researchers outside the company. Soon Meta received over 100,000 requests for access to the code.

But within days of LLaMA’s release, someone put the code on 4chan, the fringe online message board. Meta had lost control of its chatbot, raising the possibility that the worst fears of its legal and policy teams would come true. Researchers at Stanford University showed that the Meta system could easily do things like generate racist material.

On June 6, Mr. Zuckerberg received a letter about LLaMA from Senators Josh Hawley of Missouri and Richard Blumental of Connecticut. “Hawley and Blumental demand answers from Meta,” said a news release.

The letter called Meta’s approach risky and vulnerable to abuse and compared it unfavorably with ChatGPT. Why, the senators seemed to want to know, couldn’t Meta be more like OpenAI?