LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

201
202
 
 

I realized that while Microsoft would probably release their LLaMA-13b-based model (as of this writing they still haven't), they might not release the dataset. Therefore, I resolved to replicate their efforts, download the data myself, and train the model myself, so that OpenOrca can be released for other sizes of LLaMA as well as other foundation models such as Falcon, OpenLLaMA, RedPajama, MPT, and RWKV.

203
 
 

Koboldcpp 1.33 was released, and with it come new docker images :) anything with -gpu now works with CuBLAS!

Released my updates for koboldcpp docker images for v1.33 (CUDA support!):

https://hub.docker.com/u/noneabove1182

There's also a new koboldcpp-gpu-test image where I'm trying to reduce the image size. I've got it down to less than half of the original -gpu (1.58GB vs 3.87GB), and everything seems to be working, but if anyone is willing to help validate it, that would be much appreciated.

If you're upgrading, make sure you clear out your docker volume first; it does weird things during upgrades...

204
7
submitted 2 years ago by Matburnx to c/localllama
 
 

This might seem like a dumb question, but my disk space is currently pretty low and I'd like to clean up some of my files.

A lot of space has been taken up by the models I downloaded with different projects, like oobabooga's or LocalGPT, but I can't find the folder where they were saved. Does anyone know where it is?

I'm on Windows, if that changes anything. Thanks in advance for your answers!
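For what it's worth, projects that download through huggingface_hub put models in the HF cache, and something like this sketch (assuming a default setup) will list what's in it and where; oobabooga's webui also keeps models in its own text-generation-webui/models folder:

```python
from huggingface_hub import scan_cache_dir

# Default HF cache on Windows: C:\Users\<you>\.cache\huggingface\hub
info = scan_cache_dir()
print(f"Total cache size: {info.size_on_disk / 1e9:.1f} GB")
for repo in sorted(info.repos, key=lambda r: r.size_on_disk, reverse=True):
    print(f"{repo.size_on_disk / 1e9:6.1f} GB  {repo.repo_id}  ({repo.repo_path})")
```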

205
 
 

Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length -- https://blog.salesforceairesearch.com/xgen/

206
207
 
 

The pruned models can be used as-is; other methods require computationally expensive retraining or a weight-update process.

Paper: https://arxiv.org/abs/2306.11695

Code: https://github.com/locuslab/wanda

Excerpts: The argument concerning the need for retraining and weight update does not fully capture the challenges of pruning LLMs. In this work, we address this challenge by introducing a straightforward and effective approach, termed Wanda (Pruning by Weights and activations). This technique successfully prunes LLMs to high degrees of sparsity without any need for modifying the remaining weights.

Given a pretrained LLM, we compute our pruning metric from the initial to the final layers of the network. After pruning a preceding layer, the subsequent layer receives updated input activations, based on which its pruning metric will be computed. The sparse LLM after pruning is ready to use without further training or weight adjustment.

We evaluate Wanda on the LLaMA model family, a series of Transformer language models at various parameter levels, often referred to as LLaMA-7B/13B/30B/65B. Without any weight update, Wanda outperforms the established pruning approach of magnitude pruning by a large margin. Our method also performs on par with, or in most cases better than, the prior reconstruction-based method SparseGPT. Note that as the model gets larger in size, the accuracy drop compared to the original dense model keeps getting smaller. For task-wise performance, we observe that there are certain tasks where our approach Wanda gives consistently better results across all LLaMA models, i.e. HellaSwag, ARC-c and OpenbookQA.

We explore using parameter-efficient fine-tuning (PEFT) techniques to recover performance of pruned LLM models. We use a popular PEFT method, LoRA, which has been widely adopted for task-specific fine-tuning of LLMs. However, here we are interested in recovering the performance loss of LLMs during pruning, thus we perform a more general "fine-tuning" where the pruned networks are trained with an autoregressive objective on the C4 dataset. We enforce a limited computational budget (1 GPU and 5 hours). We find that we are able to restore performance of pruned LLaMA-7B (unstructured 50% sparsity) by a non-trivial amount, reducing zero-shot WikiText perplexity from 7.26 to 6.87. The additional parameters introduced by LoRA are only 0.06%, leaving the total sparsity level still at around 50%.

NOTE: This text was largely copied from u/llamaShill
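For intuition, here's a minimal single-layer sketch of the metric described above (my own illustration, not the authors' code; see the linked repo for the real implementation): score each weight by its magnitude times the L2 norm of the matching input feature's activations, then zero the lowest-scoring weights within each output row.

```python
import torch

def wanda_prune_layer(weight: torch.Tensor,
                      activations: torch.Tensor,
                      sparsity: float = 0.5) -> torch.Tensor:
    """Prune one linear layer's weight matrix (out_features, in_features)
    to `sparsity` using the Wanda metric |W_ij| * ||X_j||_2, where X holds
    the layer's input activations with shape (tokens, in_features)."""
    act_norm = activations.norm(p=2, dim=0)   # (in_features,)
    metric = weight.abs() * act_norm          # elementwise, broadcast over rows

    # Compare weights within each output row: drop the lowest-metric fraction.
    k = int(weight.shape[1] * sparsity)
    _, drop_idx = torch.topk(metric, k, dim=1, largest=False)
    pruned = weight.clone()
    pruned.scatter_(1, drop_idx, 0.0)
    return pruned
```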

208
209
210
 
 

Took me some time to figure this one out, and unfortunately it requires a significantly larger image (it needs so much more of nvidia's toolkit D: I couldn't figure out a way around it..)

If people prefer a smaller image, I can start maintaining one with exllama and one without, but for now 1.0 is identical minus exllama support (and, I guess, is also from an older commit), so you can use that one until there's actual new functionality :)

211
9
submitted 2 years ago* (last edited 2 years ago) by noneabove1182 to c/localllama
 
 

New models posted by TheBloke, 7B to 65B, something for everyone!

Info from creators:

A stunning arrival! The fully upgraded Robin Series V2 language model is ready and eagerly awaiting your exploration.

This is not just a model upgrade, but the crystallization of wisdom from our research and development team. In the new version, Robin Series V2 has performed excellently among various open-source models, defeating well-known models such as Falcon, LLaMA, StableLM, RedPajama, and MPT.

Specifically, we have carried out in-depth fine-tuning based on the entire LLaMA series, including 7B, 13B, 33B, and 65B, all of which have achieved pleasing results. Robin-7B scored 51.7 in the OpenLLM standard test, and Robin-13B even reached as high as 59.1, ranking sixth, surpassing many 33B models. The achievements of Robin-33B and Robin-65B are even more surprising, with scores of 64.1 and 65.2 respectively, firmly securing the top positions.

212
12
Any way to prune LLMs? (self.localllama)
submitted 2 years ago by django to c/localllama
 
 

Hey, I'm working on some local LLM applications, and my goal is to run the smallest model possible without crippling performance. I'm already using 4-bit GPTQ, but I want something smaller. These models have been trained on a massive amount of data, but my specific use case only touches a very small fraction of that, so I imagine it's possible to cut away large chunks of the model that I don't care about. I'm wondering if there has been any work on runtime pruning of LLMs (not just static pruning based on model weights) using "real world" data. Something like: you run the model a bunch of times on your actual data and monitor the neuron activations to inform some kind of pruning process (rough sketch of the idea below). Does anyone here know about something like that?
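Hypothetical and untested, just to make the question concrete: use PyTorch forward hooks to accumulate per-feature input-activation norms for every Linear layer while running your real data, then feed those norms into a Wanda-style magnitude-times-activation pruning metric.

```python
import torch
import torch.nn as nn

def collect_activation_norms(model: nn.Module, batches) -> dict:
    """Run real-world data through the model and record, for every Linear
    layer, the L2 norm of its input activations per input feature."""
    sums, hooks = {}, []

    def make_hook(name: str):
        def hook(module, inputs, output):
            # inputs[0]: (..., in_features) -> flatten to (tokens, in_features)
            x = inputs[0].detach().float().reshape(-1, inputs[0].shape[-1])
            sq = x.pow(2).sum(dim=0)
            sums[name] = sums.get(name, torch.zeros_like(sq)) + sq
        return hook

    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear):
            hooks.append(mod.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        for batch in batches:
            model(batch)

    for h in hooks:
        h.remove()
    # sqrt of summed squares = per-feature L2 norm over all observed tokens
    return {name: s.sqrt() for name, s in sums.items()}
```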

213
 
 

Gorilla is an LLM that can learn to use APIs, and I'd like to try getting it to use some that I work with.

There's a GGML here and the original repo is here. They have instructions for adding an API, but I don't really understand them, at least not well enough to add a generic one.

It looks really good though, which is why I'm excited about it! I think it should be possible to use generic APIs like this, if I understand it correctly.

214
 
 

The main link is to the GPU image; the CPU image can be found here:

https://hub.docker.com/r/noneabove1182/text-gen-ui-cpu

The CPU one is built exclusively for running on a CPU. The GPU one is compiled with CUDA support and gets blazing-fast ingestion and generation.

Included in each readme is a disclaimer that I am once again not affiliated, plus an example working docker-compose.yml; make sure you change the args to fit your own setup! :)

Feel free to ask any questions or let me know if anything doesn't work! I hacked it together by the skin of my teeth and put a LOT of effort into reducing the image size for the GPU one (16GB down to 9GB, still massive..), so please do post if you have any issues!

215
216
 
 

I've been maintaining an image for myself to containerize koboldcpp, so I figured I might as well share it with others :) Updated to 1.30.3.

217
218
 
 

Promising stuff from their repo, claiming "exceptional performance, achieving a [HumanEval] pass@1 score of 57.3, surpassing the open-source SOTA by approximately 20 points."

https://github.com/nlpxucan/WizardLM
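For context, HumanEval's pass@k is computed with the unbiased estimator from the original HumanEval paper; here's a quick reference sketch of it (mine, not something from the WizardLM repo):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper:
    n samples generated per problem, c of which pass the unit tests."""
    if n - c < k:
        return 1.0  # impossible to pick k samples that all fail
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 115 passing -> estimated pass@1 of 0.575
print(pass_at_k(200, 115, 1))
```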

219
 
 

User NeverEndingToast over on Reddit posted that they're making a wiki to compile local LLM knowledge - feel free to contribute! I'll make another post when I see the wiki URL posted.

220
 
 

I found and bookmarked this resource a while back. Lots of tools to try out!

221
14
submitted 2 years ago by swandi to c/localllama
 
 

I've been really into monitoring new projects on GitHub that interact with LLMs, and I'm curious which repos you all are using or watching that you'd like to share. I haven't tried a few of these yet, but I like to keep an eye on them for updates.

ChatDocs

"Chat with your documents offline using AI. No data leaves your system. Internet connection is only required to install the tool and download the AI models. It is based on PrivateGPT but has more features."

ChatArena

"ChatArena is a library that provides multi-agent language game environments and facilitates research about autonomous LLM agents and their social interactions."

WAFL

"WAFL is a framework for home assistants. It is designed to combine Large Language Models and rules to create a predictable behavior. Specifically, instead of organising the work of an LLM into a chain of thoughts, WAFL intends to organise its behavior into inference trees."

Gorilla

"Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically- and syntactically- correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! Join us, as we try to expand the largest API store and teach LLMs how to write them!"

EdgeGPT

"Extension for Text Generation Webui based on EdgeGPT by acheong08, a reverse engineered API of Microsoft's Bing Chat AI. Now you can give a sort of Internet access to your characters, easily, quickly and free."

222
16
Roleplay LLMs? (lemmy.fmhy.ml)
submitted 2 years ago by [email protected] to c/localllama
 
 

Hey all, which LLMs are good for roleplay? Is base LLaMA good? I've read that Pygmalion is supposed to be tuned for it, but I haven't tried it yet.

Ideally, I'm hoping for a model that can stay in character.

223
 
 

Let's talk about our experiences working with different models, either known or lesser-known.

Which locally run language models have you tried out? Share your insights, challenges, or anything you found interesting during your encounters with those models.

224
 
 

Hey all, I'm trying to get a Discord bot working that connects to my LLM - as in, it relays messages from Discord to the LLM and posts the LLM's responses back. Does anybody know of one that works, or have a guide?
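If nothing ready-made turns up, the bare-bones version I'm imagining looks something like this (untested sketch; it assumes a text-generation-webui-style HTTP endpoint on localhost:5000, so the URL and payload are placeholders for whatever your backend actually exposes):

```python
import discord
import requests

# Hypothetical local endpoint - change to match your LLM server's API.
LLM_URL = "http://localhost:5000/api/v1/generate"

intents = discord.Intents.default()
intents.message_content = True  # needed to read message text
client = discord.Client(intents=intents)

@client.event
async def on_message(message: discord.Message):
    if message.author == client.user:
        return  # don't reply to ourselves
    # Blocking call - fine for a quick test; use aiohttp for anything serious.
    resp = requests.post(LLM_URL, json={"prompt": message.content,
                                        "max_new_tokens": 200})
    reply = resp.json()["results"][0]["text"]
    await message.channel.send(reply[:2000])  # Discord's message length cap

client.run("YOUR_DISCORD_BOT_TOKEN")
```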

225
 
 

I figured I'd post this. It's a great way to get an LLM set up on your computer, and it's extremely easy for folks who don't have much technical knowledge!
