LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.


I've been using TheBloke's Q8 of https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B, but now I think this one (https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B) is killing it. Has anyone else tested it?

submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama

LMSYS examines how improper data decontamination can lead to artificially inflated scores


H200 is up to 1.9x faster than H100. This performance is enabled by H200's larger, faster HBM3e memory.

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform

... (programming.dev)
submitted 1 year ago* (last edited 9 months ago) by [email protected] to c/localllama

CogVLM: Visual Expert for Pretrained Language Models

Presents CogVLM, a powerful open-source visual language foundation model that achieves state-of-the-art performance on 10 classic cross-modal benchmarks.

repo: https://github.com/THUDM/CogVLM
abs: https://arxiv.org/abs/2311.03079

submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama

The creator of ExLlamaV2 (turboderp) has released a lightweight web UI for running exllamav2, and it's quite nice! It's missing some features from text-generation-webui, but makes up for it by being very streamlined and clean.

I've made a docker image for it for anyone who may want to try it out, GitHub repo here:

https://github.com/noneabove1182/exui-docker

And if you're looking for models to run with exllamav2, I've been uploading several here:

https://huggingface.co/bartowski

Enjoy!

... (programming.dev)
submitted 1 year ago* (last edited 9 months ago) by [email protected] to c/localllama

article: https://x.ai

[xAI] trained a prototype LLM (Grok-0) with 33 billion parameters. This early model approaches LLaMA 2 (70B) capabilities on standard LM benchmarks but uses only half of its training resources. In the last two months, we have made significant improvements in reasoning and coding capabilities leading up to Grok-1, a state-of-the-art language model that is significantly more powerful, achieving 63.2% on the HumanEval coding task and 73% on MMLU.

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama

"This feature utilizes KV cache shifting to automatically remove old tokens from context and add new ones without requiring any reprocessing."

This means a major speed increase for people like me who rely on (slow) CPU inference (or big models). Consider a chatbot scenario: a long chat where old lines of dialogue need to be evicted from the context to stay within the (4096 token) context size. Previously the context had to be re-computed starting from the first changed/now-missing token. This feature detects that, deletes the affected tokens from the KV cache, and shifts the subsequent tokens in the KV cache so it can be re-used, avoiding a computationally expensive re-calculation.
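
As a rough illustration (and only that, this is not the actual llama.cpp/KoboldCPP code), the shift amounts to something like the following, assuming a toy cache with one entry per token position:

```python
# Toy sketch of KV cache shifting, not the real implementation. Each
# entry holds the key/value tensors already computed for one position.

def shift_kv_cache(cache: list, n_evict: int) -> list:
    """Evict the n_evict oldest entries and re-index the rest.

    Without shifting, removing tokens at the start of the context would
    invalidate every later position and force a full re-computation;
    here the expensive per-token tensors are kept and only their
    positions change. (Real implementations also have to correct the
    rotary position information baked into the cached keys.)
    """
    kept = cache[n_evict:]               # drop the oldest entries
    for new_pos, entry in enumerate(kept):
        entry["position"] = new_pos      # shift positions down by n_evict
    return kept
```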

This is probably also more or less related to recent advancements like StreamingLLM.

This won't help once text gets inserted "in the middle" or the prompt gets changed in some other way. But I managed to connect KoboldCPP as a backend for SillyTavern/Oobabooga, and now I'm able to have unlimited-length conversations without waiting excessively once the chat history hits max tokens and the frontend starts dropping text.

It's just a clever way to re-use the KV cache in one specific case. But I've wished for this for quite some time.


A Python Gradio web UI built using GPT4AllEmbeddings
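
The post doesn't include code, but a minimal version of that kind of setup might look like this. This is a sketch under assumptions (LangChain's GPT4AllEmbeddings wrapper, a single-textbox UI), not the linked project's actual code:

```python
import gradio as gr
from langchain.embeddings import GPT4AllEmbeddings

embedder = GPT4AllEmbeddings()  # downloads a small local embedding model

def embed(text: str) -> str:
    # Embed the query locally and report the vector's size and head.
    vec = embedder.embed_query(text)
    return f"{len(vec)}-dim embedding; first values: {vec[:5]}"

gr.Interface(fn=embed, inputs="text", outputs="text").launch()
```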

why reddit? (lemmy.world)
submitted 1 year ago by [email protected] to c/localllama

I don't get why a group of users who are willing to run their own LLMs locally, and who don't want to rely on centralized corporations like OpenAI or Google, prefer to discuss it on a centralized site like Reddit.


Phind is now using V7 of their model on their own platform, as they have found that people overall prefer its output to GPT-4's. This is extremely impressive because it's not just a random benchmark that can be gamed, but crowd-sourced opinion on real tasks.

The one place it still lags behind GPT-4 is question comprehension, but this is a huge accomplishment.

Blog post: https://www.phind.com/blog/phind-model-beats-gpt4-fast

Note: they've only openly released V2 of their model; hopefully they release newer versions soon. I'd love to play with them outside their sandbox.


Very interesting new sampler that does a better job of filtering out extremely unlikely tokens when the most likely tokens are less confident. From the results, it seems to pretty reliably improve quality with no noticeable downside.
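
The post doesn't spell out the math, but the described behavior, a cutoff for "extremely unlikely" tokens that scales with the top token's confidence, can be sketched like this (`base` is a hypothetical hyperparameter name, not from the post):

```python
import numpy as np

def scaled_cutoff_sample(logits: np.ndarray, base: float = 0.05) -> int:
    """Sketch of a sampler whose cutoff scales with top-token confidence:
    a flat (less confident) distribution keeps more candidates, a peaked
    one keeps fewer."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    cutoff = base * probs.max()           # threshold tied to the top token
    filtered = np.where(probs >= cutoff, probs, 0.0)
    filtered /= filtered.sum()            # renormalize the survivors
    return int(np.random.choice(len(probs), p=filtered))
```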


30T tokens, 20.5T of them in English, and allegedly high quality. Can't wait to see people start putting it to use!

Related GitHub repo: https://github.com/togethercomputer/RedPajama-Data


Finally got a nice script going that automates most of the process. Uploads will all be in the same format, with each bits-per-weight level going into its own branch (see the snippet below for fetching one).

The first two I did don't have great READMEs, but the rest will look like this one: https://huggingface.co/bartowski/Mistral-7B-claude-chat-exl2

Also taking recommendations on anything you want to see included in the README or in the quant levels.
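
Grabbing one quant level should just be a matter of passing the branch name as the revision, e.g. with huggingface_hub (the "4.0" branch name below is illustrative; check the model page for the actual list):

```python
from huggingface_hub import snapshot_download

# Fetch a single bits-per-weight level; each level lives in its own branch.
snapshot_download(
    repo_id="bartowski/Mistral-7B-claude-chat-exl2",
    revision="4.0",  # illustrative branch name
    local_dir="Mistral-7B-claude-chat-exl2-4.0",
)
```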


They are referencing this paper: LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset, from September 30.

The paper itself provides some insight on how people use LLMs and the distribution of the different use-cases.

The researchers looked at conversations with 25 LLMs. The data was collected from 210K unique IP addresses in the wild, on their Vicuna demo and the Chatbot Arena website.


For anyone who happens to be using my Docker images or Dockerfiles for text-generation-webui: it all started breaking this week when Oobabooga's work was updated to support CUDA 12.1.

As such, I have updated my Docker images and fixed a bunch of issues in the build process. It's also been a while since I posted it here.

You can find all the details here:

https://github.com/noneabove1182/text-generation-webui-docker

It requires driver version 535.113.01

Happy LLMing!


From the tweet (minus pictures):

Language models are bad at basic math.

GPT-4's accuracy on 5-digit multiplication is right around 0%.

Most open models can't even add. Why is that?

There are a few reasons why numbers are hard. The main one is tokenization. When training a tokenizer from scratch, you take a large corpus of text and find the minimal byte-pair encoding for a chosen vocabulary size.

This means, however, that numbers will almost certainly not have unique token representations. "21" could be a single token, or ["2", "1"]. 143 could be ["143"] or ["14", "3"] or any other combination.
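
You can see the inconsistency directly with any BPE tokenizer (GPT-2's is used here only because it's small and easy to load):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
for n in ["21", "143", "1430", "14300"]:
    print(n, "->", tok.tokenize(n))
# Nearby numbers can split into different shapes (one token vs. several),
# so digit positions carry no stable meaning for the model.
```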

A potential fix here would be to force single digit tokenization. The state of the art for the last few years is to inject a space between every digit when creating the tokenizer and when running the model. This means 143 would always be tokenized as ["1", "4", "3"].

This helps boost performance, but wastes tokens while not fully fixing the problem.
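
As a preprocessing step, the digit-spacing trick is a one-liner; a regex sketch:

```python
import re

def space_digits(text: str) -> str:
    """Insert a space between adjacent digits so each digit gets its own
    token, e.g. '143' -> '1 4 3'."""
    return re.sub(r"(\d)(?=\d)", r"\1 ", text)

print(space_digits("x = 143 + 21"))  # -> "x = 1 4 3 + 2 1"
```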

A cool fix might be xVal! This work by The Polymathic AI Collaboration suggests a generic [NUM] token which is then scaled by the actual value of the number!

If you look at the red lines in the image above, you can get an intuition for how that might work.

It doesn't capture a huge range or high fidelity (e.g., 7.4449 vs 7.4448) but they showcase some pretty convincing results on sequence prediction problems that are primarily numeric.

For example, they want to train a sequence model on GPS-conditioned temperature forecasting.

They found a ~70x improvement over standard vanilla baselines and a 2x improvement over really strong baselines.

One cool side effect is that deep neural networks might be really good at regression problems using this encoding scheme!
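
A toy sketch of the scaled-[NUM] embedding idea as described above (class and argument names are illustrative, not from the xVal paper's code):

```python
import torch

class NumAwareEmbedding(torch.nn.Module):
    """Embeds tokens normally, but scales the embedding at [NUM] positions
    by the actual numeric value, so one shared token can stand in for any
    number."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor, values: torch.Tensor):
        # `values` is 1.0 everywhere except at [NUM] positions, where it
        # holds the (normalized) numeric value parsed from the input text.
        return self.embed(token_ids) * values.unsqueeze(-1)
```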

Musical notation (lemmy.dbzer0.com)
submitted 1 year ago by [email protected] to c/localllama

Would adding musical notation to an LLM's training data allow it to create music, since notation is a lot like a normal language? Or does it do so already?

submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama

The model is trained on his own Orca-style dataset, as well as some Airoboros data, apparently to increase creativity.

Quants:

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-GPTQ

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-GGUF

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-AWQ

Mistral 7B model (mistral.ai)
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama

Yesterday Mistral AI released a new language model called Mistral 7B. @[email protected] already posted about the sliding window attention part here in LocalLLaMA yesterday. But I think the model and the company behind it are even more noteworthy, and the release of the model is worth its own post.

Mistral 7B is not based on Llama. And they claim it outperforms Llama 2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it's released under the Apache 2.0 license. ~~So truly an 'open' model, usable without restrictions.~~ [Edit: Unfortunately I couldn't find the dataset or a paper. They call it 'open-weight'. So my conclusion regarding the openness might be a bit premature. We'll see.]

(It uses Grouped-query attention and Sliding Window Attention.)
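
The sliding window part is easy to picture as an attention mask; a small illustration (not Mistral's implementation, and their window is 4096 rather than the toy size here):

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask where each position attends only to the previous
    `window` tokens instead of the whole prefix."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).int())
```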

Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups, and they collected $113 million in funding in June.

I've tried it, and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition; our world is currently completely dominated by Meta. If it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and of detail regarding the training could be a downside, though. These were not included in this initial release of the model.)


EDIT 2023-10-12: Paper released at https://arxiv.org/abs/2310.06825 (but I'd say there's no new information in it; they mostly copied their announcement).

As of now, it is clear they don't want to publish any details about the training.


AutoGen is a framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

Git repo here: https://github.com/microsoft/autogen
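
A minimal two-agent example in the style of the repo's quickstart (a sketch; exact parameters may differ across versions, and the model/config values are placeholders):

```python
from autogen import AssistantAgent, UserProxyAgent

# Placeholder config: point this at whatever OpenAI-compatible endpoint
# (or local model server) you actually use.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",                      # fully automated run
    code_execution_config={"work_dir": "coding"},  # where generated code runs
)

# The proxy relays the task and executes any code the assistant writes back.
user_proxy.initiate_chat(assistant, message="Plot a sine wave and save it to sine.png.")
```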
