LocalLLaMA

A community for discussing LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 1 year ago

My machine runs openSUSE Tumbleweed with the amdgpu driver installed. I use it for gaming, and I've recently become interested in running LLMs, so I'd like to balance both without compromising too much on performance.

I know there are proprietary drivers for AMD cards, but I'm hesitant to install them because I've heard they perform less efficiently in games than the open source driver.

I'm mainly confused about this ROCm thing. Is it not included with the open source amdgpu drivers? Or is it available as a separate package?

So which driver should I use?

Or is it possible to run oobabooga or Stable Diffusion within a distrobox container (with the proprietary drivers) while still using the open source GPU drivers on the host operating system?

54
 
 

I have an RX 6600, 16 GB of RAM, and an i5-10400F.

I am using the oobabooga web UI, and I happen to have a GGUF file of LLama2-13B-Tiefighter.Q4_K_S.

But it always says that the connection errored out when I load the model.

Anyway, please suggest any good model that I can get started with.

submitted 10 months ago by noneabove1182 to c/localllama
 
 

Thanks to Charles for the conversion scripts, I've converted several of the new internLM2 models into Llama format. I've also made them into ExLlamaV2 while I was at it.

You can find them here:

https://huggingface.co/bartowski?search_models=internlm2

Note: the chat models do something odd, outputting [UNUSED_TOKEN_145] where <|im_end|> would be expected. I'm not sure why, but they work fine despite emitting that token at the end.

submitted 10 months ago* (last edited 10 months ago) by [email protected] to c/localllama

Based on DeepSeek Coder, the current SOTA 33B model, and allegedly reaching GPT-3.5 levels of performance. I'm excited to test it once I've made ExLlamaV2 quants, and I'll try to update with my findings on using it as a copilot model.

61
 
 

Paper abstract:

Recent work demonstrates that, after being fine-tuned on a high-quality instruction dataset, the resulting model can obtain impressive capabilities to address a wide range of tasks. However, existing methods for instruction data generation often produce duplicate data and are not controllable enough on data quality. In this paper, we extend the generalization of instruction tuning by classifying the instruction data to 4 code-related tasks and propose a LLM-based Generator-Discriminator data process framework to generate diverse, high-quality instruction data from open source code. Hence, we introduce CodeOcean, a dataset comprising 20,000 instruction instances across 4 universal code-related tasks, which is aimed at augmenting the effectiveness of instruction tuning and improving the generalization ability of fine-tuned model. Subsequently, we present WaveCoder, a fine-tuned Code LLM with Widespread And Versatile Enhanced instruction tuning. This model is specifically designed for enhancing instruction tuning of Code Language Models (LLMs). Our experiments demonstrate that Wavecoder models outperform other open-source models in terms of generalization ability across different code-related tasks at the same level of fine-tuning scale. Moreover, Wavecoder exhibits high efficiency in previous code generation tasks. This paper thus offers a significant contribution to the field of instruction data generation and fine-tuning models, providing new insights and tools for enhancing performance in code-related tasks.

62
 
 

I found this post interesting for my layman's understanding of LLMs and some of the underlying architecture choices that are made.

63
 
 

Discussion with one of the paper authors in llama.cpp: https://github.com/ggerganov/llama.cpp/discussions/4534

Thread by (apparently) a paper author on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18luk10/wait_llama_and_falcon_are_also_moe/

2023, year of open LLMs (huggingface.co)
submitted 11 months ago by [email protected] to c/localllama
65
 
 

I wrote this as a layman's primer to the basics of LLMs and other generative AI. I'm still early on in my journey but hopefully it helps explain things to other newcomers even if it glosses over the details.

submitted 11 months ago by Matburnx to c/localllama
 
 

Hi, I'm starting to learn how LLMs work in depth, so I started using nanoGPT to understand how to train a model, and I'd like to play around with the code a little more. I set myself the goal of training a model that can write basic French. It doesn't need to be coherent or deep in its writing, just French with correct grammar. I only have a laptop without a proper GPU, so I can't really train a model with billions of parameters. Do you think this is possible without a large dataset or intensive training? Would it be a better idea to use something other than nanoGPT?

TLDR: I'd like to train my own LLM on my laptop, which doesn't have a GPU. It's only for learning purposes, so my goal is for it to write basic French. Is it doable? If it is, do you have any tips to make this easier?

submitted 11 months ago by Rez to c/localllama
 
 

I have a laptop with a Ryzen 7 5700U and 16 GB of RAM, running Fedora 38 Linux.
I'm looking to run a local uncensored LLM, and I'd like to know which model and software would be best for that.
I'm currently running KoboldAI and Erebus 2.7B. It's okay in terms of speed, but I'm wondering if there's anything better out there. I'd prefer something that isn't web-UI based, to lower the overhead, if possible.
I'm not very well versed in all the lingo yet, so please keep it simple.
Thanks!

68
 
 
69
70
 
 

Available in instruct only currently:

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

71
 
 

Large language models (LLMs) exhibit amazing performance on a wide variety of tasks such as text modeling and code generation. However, they are also very large. For example Llama 2 70B has 70 billion parameters that require 140GB of memory to store in half precision. This presents many challenges, such as needing multiple GPUs just to serve a single LLM. To address these issues, researchers have developed compression methods that reduce the size of models without destroying performance.

One class of methods, post-training quantization, compresses trained model weights into lower precision formats to reduce memory requirements. For example, quantizing a model from 16 bit to 2 bit precision would reduce the size of the model by 8x, meaning that even Llama 2 70B would fit on a single 24GB GPU. In this work, we introduce QuIP#, which combines lattice codebooks with incoherence processing to create state-of-the-art 2 bit quantized models. These two methods allow QuIP# to significantly close the gap between 2 bit quantized LLMs and unquantized 16 bit models.
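
The memory figures quoted above can be checked with some quick arithmetic. This is only a rough sketch: the function name `weight_size_gb` is made up for illustration, 1 GB is taken as 10^9 bytes, and it counts weight storage only, ignoring activations, the KV cache, and any codebook overhead a real 2-bit scheme carries.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 70e9        # Llama 2 70B, as quoted in the text
GPU_MEMORY_GB = 24     # the single-GPU size mentioned in the text

fp16_gb = weight_size_gb(N_PARAMS, 16)     # half precision
two_bit_gb = weight_size_gb(N_PARAMS, 2)   # 2-bit quantization

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"2-bit weights:  {two_bit_gb:.1f} GB")
print(f"compression:    {fp16_gb / two_bit_gb:.0f}x")
print(f"fits in 24 GB:  {two_bit_gb < GPU_MEMORY_GB}")
```

Under these assumptions the weights drop from 140 GB to 17.5 GB, which matches the 8x figure and leaves a few GB of headroom on a 24 GB card for everything the estimate leaves out.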

Project Page: https://cornell-relaxml.github.io/quip-sharp/

Code: https://github.com/Cornell-RelaxML/quip-sharp

72
 
 
73
 
 

Early speculation is that it's an MoE (mixture of experts) of eight 7B models, so maybe not as earth-shattering as their last release, but highly intriguing. I'll update with more info as it comes out.

submitted 11 months ago* (last edited 11 months ago) by [email protected] to c/localllama
 
 

Original (pay-walled): https://www.nytimes.com/2023/12/05/technology/ai-chatgpt-google-meta.html

Section on LLaMA:

Zuckerberg Gets Warned

Actually, months earlier Meta had released its own chatbot — to very little notice.

BlenderBot was a flop. The A.I.-powered bot, released in August 2022, was built to carry on conversations — and that it did. It said that Donald J. Trump was still president and that President Biden had lost in 2020. Mark Zuckerberg, it told a user, was “creepy.” Then two weeks before ChatGPT was released, Meta introduced Galactica. Designed for scientific research, it could instantly write academic articles and solve math problems. Someone asked it to write a research paper about the history of bears in space. It did. After three days, Galactica was shut down.

Mr. Zuckerberg’s head was elsewhere. He had spent the entire year reorienting the company around the metaverse and was focused on virtual and augmented reality.

But ChatGPT would demand his attention. His top A.I. scientist, Yann LeCun, arrived in the Bay Area from New York about six weeks later for a routine management meeting at Meta, according to a person familiar with the meeting. Dr. LeCun led a double life — as Meta’s chief A.I. scientist and a professor at New York University. The Frenchman had won the Turing Award, computer science’s most prestigious honor, alongside Dr. Hinton, for work on neural networks.

As they waited in line for lunch at a cafe in Meta’s Frank Gehry-designed headquarters, Dr. LeCun delivered a warning to Mr. Zuckerberg. He said Meta should match OpenAI’s technology and also push forward with work on an A.I. assistant that could do stuff on the internet on your behalf. Websites like Facebook and Instagram could become extinct, he warned. A.I. was the future.

Mr. Zuckerberg didn’t say much, but he was listening. There was plenty of A.I. at work across Meta’s apps — Facebook, Instagram, WhatsApp — but it was under the hood. Mr. Zuckerberg was frustrated. He wanted the world to recognize the power of Meta’s A.I. Dr. LeCun had always argued that going open-source, making the code public, would attract countless researchers and developers to Meta’s technology, and help improve it at a far faster pace. That would allow Meta to catch up — and put Mr. Zuckerberg back in league with his fellow moguls. But it would also allow anyone to manipulate the technology to do bad things.

At dinner that evening, Mr. Zuckerberg approached Dr. LeCun. “I have been thinking about what you said,” Mr. Zuckerberg told his chief A.I. scientist, according to a person familiar with the conversation. “And I think you’re right.”

In Paris, Dr. LeCun’s scientists had developed an A.I.-powered bot that they wanted to release as open-source technology. Open source meant that anyone could tinker with its code. They called it Genesis, and it was pretty much ready to go. But when they sought permission to release it, Meta’s legal and policy teams pushed back, according to five people familiar with the discussion.

Caution versus speed was furiously debated among the executive team in early 2023 as Mr. Zuckerberg considered Meta’s course in the wake of ChatGPT.

Had everyone forgotten about the last seven years of Facebook’s history? That was the question asked by the legal and policy teams. They reminded Mr. Zuckerberg about the uproar over hate speech and misinformation on Meta’s platforms and the scrutiny the company had endured by the news media and Congress after the 2016 election.

Open sourcing the code might put powerful tech into the hands of those with bad intentions and Meta would take the blame. Jennifer Newstead, Meta’s chief legal officer, told Mr. Zuckerberg that an open-source approach to A.I. could attract the attention of regulators who already had the company in their cross hairs, according to two people familiar with her concerns.

At a meeting in late January in his office, called the aquarium because it looked like one, Mr. Zuckerberg told executives that he had made his decision. Parts of Meta would be reorganized and its priorities changed. There would be weekly meetings to update executives on A.I. progress. Hundreds of employees would be moved around. Mr. Zuckerberg declared in a Facebook post that Meta would “turbocharge” its work on A.I.

Mr. Zuckerberg wanted to push out a project fast. The researchers in Paris were ready with Genesis. The name was changed to LLaMA, short for “Large Language Model Meta AI,” and released to 4,000 researchers outside the company. Soon Meta received over 100,000 requests for access to the code.

But within days of LLaMA’s release, someone put the code on 4chan, the fringe online message board. Meta had lost control of its chatbot, raising the possibility that the worst fears of its legal and policy teams would come true. Researchers at Stanford University showed that the Meta system could easily do things like generate racist material.

On June 6, Mr. Zuckerberg received a letter about LLaMA from Senators Josh Hawley of Missouri and Richard Blumenthal of Connecticut. “Hawley and Blumenthal demand answers from Meta,” said a news release.

The letter called Meta’s approach risky and vulnerable to abuse and compared it unfavorably with ChatGPT. Why, the senators seemed to want to know, couldn’t Meta be more like OpenAI?

submitted 11 months ago* (last edited 11 months ago) by [email protected] to c/localllama
 
 

Seems like a really cool project that lowers the barrier to entry for locally run models. Since llama.cpp supports a ton of models, I imagine it would be easy to adapt this for models other than the prebuilt ones.
