LocalLLaMA

2292 readers
1 users here now

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 2 years ago
MODERATORS
76
 
 
77
 
 

Early speculation is that it's an MoE (mixture of experts) of eight 7B models, so maybe not as earth-shattering as their last release, but highly intriguing. Will update with more info as it comes out.
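For anyone unfamiliar with the term, a top-k routed mixture-of-experts layer replaces the usual feed-forward block with several expert FFNs plus a small router that sends each token to only a couple of them. Below is a minimal sketch of that general idea; nothing about the actual release is confirmed, and all names and sizes here are placeholders.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Illustrative top-2 routed mixture-of-experts feed-forward layer.
    Sizes are placeholders, not details of any specific model."""
    def __init__(self, d_model=4096, d_ff=11008, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                               # x: (n_tokens, d_model)
        gate_logits = self.router(x)                    # (n_tokens, n_experts)
        weights, expert_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)               # mixing weights per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

The appeal is that only `top_k` of the `n_experts` FFNs run per token, so total parameter count can be several times that of a dense 7B model while per-token compute stays much lower.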

78
 
 

Original (pay-walled): https://www.nytimes.com/2023/12/05/technology/ai-chatgpt-google-meta.html

Section on LLaMA:

Zuckerberg Gets Warned

Actually, months earlier Meta had released its own chatbot — to very little notice.

BlenderBot was a flop. The A.I.-powered bot, released in August 2022, was built to carry on conversations — and that it did. It said that Donald J. Trump was still president and that President Biden had lost in 2020. Mark Zuckerberg, it told a user, was “creepy.” Then two weeks before ChatGPT was released, Meta introduced Galactica. Designed for scientific research, it could instantly write academic articles and solve math problems. Someone asked it to write a research paper about the history of bears in space. It did. After three days, Galactica was shut down.

Mr. Zuckerberg’s head was elsewhere. He had spent the entire year reorienting the company around the metaverse and was focused on virtual and augmented reality.

But ChatGPT would demand his attention. His top A.I. scientist, Yann LeCun, arrived in the Bay Area from New York about six weeks later for a routine management meeting at Meta, according to a person familiar with the meeting. Dr. LeCun led a double life — as Meta’s chief A.I. scientist and a professor at New York University. The Frenchman had won the Turing Award, computer science’s most prestigious honor, alongside Dr. Hinton, for work on neural networks.

As they waited in line for lunch at a cafe in Meta’s Frank Gehry-designed headquarters, Dr. LeCun delivered a warning to Mr. Zuckerberg. He said Meta should match OpenAI’s technology and also push forward with work on an A.I. assistant that could do stuff on the internet on your behalf. Websites like Facebook and Instagram could become extinct, he warned. A.I. was the future.

Mr. Zuckerberg didn’t say much, but he was listening. There was plenty of A.I. at work across Meta’s apps — Facebook, Instagram, WhatsApp — but it was under the hood. Mr. Zuckerberg was frustrated. He wanted the world to recognize the power of Meta’s A.I. Dr. LeCun had always argued that going open-source, making the code public, would attract countless researchers and developers to Meta’s technology, and help improve it at a far faster pace. That would allow Meta to catch up — and put Mr. Zuckerberg back in league with his fellow moguls. But it would also allow anyone to manipulate the technology to do bad things.

At dinner that evening, Mr. Zuckerberg approached Dr. LeCun. “I have been thinking about what you said,” Mr. Zuckerberg told his chief A.I. scientist, according to a person familiar with the conversation. “And I think you’re right.”

In Paris, Dr. LeCun’s scientists had developed an A.I.-powered bot that they wanted to release as open-source technology. Open source meant that anyone could tinker with its code. They called it Genesis, and it was pretty much ready to go. But when they sought permission to release it, Meta’s legal and policy teams pushed back, according to five people familiar with the discussion.

Caution versus speed was furiously debated among the executive team in early 2023 as Mr. Zuckerberg considered Meta’s course in the wake of ChatGPT.

Had everyone forgotten about the last seven years of Facebook’s history? That was the question asked by the legal and policy teams. They reminded Mr. Zuckerberg about the uproar over hate speech and misinformation on Meta’s platforms and the scrutiny the company had endured by the news media and Congress after the 2016 election.

Open sourcing the code might put powerful tech into the hands of those with bad intentions and Meta would take the blame. Jennifer Newstead, Meta’s chief legal officer, told Mr. Zuckerberg that an open-source approach to A.I. could attract the attention of regulators who already had the company in their cross hairs, according to two people familiar with her concerns.

At a meeting in late January in his office, called the aquarium because it looked like one, Mr. Zuckerberg told executives that he had made his decision. Parts of Meta would be reorganized and its priorities changed. There would be weekly meetings to update executives on A.I. progress. Hundreds of employees would be moved around. Mr. Zuckerberg declared in a Facebook post that Meta would “turbocharge” its work on A.I.

Mr. Zuckerberg wanted to push out a project fast. The researchers in Paris were ready with Genesis. The name was changed to LLaMA, short for “Large Language Model Meta AI,” and released to 4,000 researchers outside the company. Soon Meta received over 100,000 requests for access to the code.

But within days of LLaMA’s release, someone put the code on 4chan, the fringe online message board. Meta had lost control of its chatbot, raising the possibility that the worst fears of its legal and policy teams would come true. Researchers at Stanford University showed that the Meta system could easily do things like generate racist material.

On June 6, Mr. Zuckerberg received a letter about LLaMA from Senators Josh Hawley of Missouri and Richard Blumenthal of Connecticut. “Hawley and Blumenthal demand answers from Meta,” said a news release.

The letter called Meta’s approach risky and vulnerable to abuse and compared it unfavorably with ChatGPT. Why, the senators seemed to want to know, couldn’t Meta be more like OpenAI?

79
58
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

Seems like a really cool project, lowering the barrier to entry for locally run models. As llama.cpp supports a ton of models, I imagine it would be easy to adapt this for models other than the prebuilt ones.

80
81
 
 

I've been using TheBloke's Q8 of https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B, but now I think this one (https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B) is killing it. Has anyone else tested it?

82
20
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
83
84
 
 

LMSYS examines how improper data decontamination can lead to artificially inflated scores

85
 
 

H200 is up to 1.9x faster than H100. This performance is enabled by H200's larger, faster HBM3e memory.

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform

86
12
... (programming.dev)
submitted 1 year ago* (last edited 9 months ago) by [email protected] to c/localllama
 
 

CogVLM: Visual Expert for Pretrained Language Models

Presents CogVLM, a powerful open-source visual language foundation model that achieves state-of-the-art performance on 10 classic cross-modal benchmarks

repo: https://github.com/THUDM/CogVLM

abs: https://arxiv.org/abs/2311.03079

87
9
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 

The creator of ExLlamaV2 (turboderp) has released a lightweight web UI for running exllamav2, and it's quite nice! It's missing some features from text-generation-webui, but makes up for it by being very streamlined and clean.

I've made a docker image for it for anyone who may want to try it out, GitHub repo here:

https://github.com/noneabove1182/exui-docker

And for finding models to run with exllamav2 I've been uploading several here:

https://huggingface.co/bartowski

Enjoy!

88
14
... (programming.dev)
submitted 1 year ago* (last edited 9 months ago) by [email protected] to c/localllama
 
 

article: https://x.ai

trained a prototype LLM (Grok-0) with 33 billion parameters. This early model approaches LLaMA 2 (70B) capabilities on standard LM benchmarks but uses only half of its training resources. In the last two months, we have made significant improvements in reasoning and coding capabilities leading up to Grok-1, a state-of-the-art language model that is significantly more powerful, achieving 63.2% on the HumanEval coding task and 73% on MMLU.

89
21
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

"This feature utilizes KV cache shifting to automatically remove old tokens from context and add new ones without requiring any reprocessing."

This means a major speed increase for people like me who rely on (slow) CPU inference (or big models). Consider a chatbot scenario and a long chat where old lines of dialogue need to be evicted from the context to stay within the (4096-token) context size. Previously, the context had to be re-computed starting with the first changed or now-missing token. This feature detects that, deletes the affected tokens from the KV cache, and shifts the subsequent tokens in the KV cache so it can be re-used, avoiding a computationally expensive re-calculation.

This is probably also more or less related to recent advancements like Streaming-LLM.

This won't help once text gets inserted "in the middle" or the prompt gets changed in another way. But I managed to connect KoboldCPP as a backend for SillyTavern/Oobabooga, and now I'm able to have unlimited-length conversations without waiting excessively once the chat history hits max tokens and the frontend starts dropping text.

It's just a clever way to re-use the KV cache in one specific case. But I've wished for this for quite some time.
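A minimal sketch of the idea in Python, just to make the mechanism concrete. This is not KoboldCPP's or llama.cpp's actual code; it only shows the bookkeeping of dropping evicted dialogue from the cache and keeping the rest (the positional shift that llama.cpp applies to the cached keys via RoPE is glossed over).

```python
def context_shift(kv_cache, cached_tokens, new_tokens, n_keep=0):
    """Illustrative context shift.

    kv_cache      -- per-token cache entries, parallel to cached_tokens
    cached_tokens -- token ids currently represented in the KV cache
    new_tokens    -- token ids of the updated prompt
    n_keep        -- leading tokens (e.g. the system prompt) that never get evicted
    """
    # Find how many tokens after the protected prefix were dropped by the frontend:
    # the new prompt should start with "prefix + some suffix of the old context".
    for n_discard in range(len(cached_tokens) - n_keep + 1):
        survivors = cached_tokens[:n_keep] + cached_tokens[n_keep + n_discard:]
        if new_tokens[:len(survivors)] == survivors:
            break
    else:
        return [], new_tokens  # no overlap found, full reprocess needed

    # Keep the surviving cache entries instead of recomputing them.
    kept_cache = kv_cache[:n_keep] + kv_cache[n_keep + n_discard:]
    # Only the genuinely new tokens at the end still need a forward pass.
    return kept_cache, new_tokens[len(survivors):]
```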

90
 
 

A Python Gradio web UI built using GPT4AllEmbeddings

91
92
46
why reddit? (lemmy.world)
submitted 1 year ago by [email protected] to c/localllama
 
 

I don't get why a group of users who are willing to run their own LLMs locally and don't want to rely on centralized corporations like OpenAI or Google prefer to discuss it on a centralized site like Reddit.

93
 
 

Phind is now using V7 of their model on their own platform, as they have found that people overall prefer its output to GPT-4's. This is extremely impressive because it's not just a random benchmark that can be gamed, but crowdsourced opinion on real tasks.

The one place where everything still lags behind GPT-4 is question comprehension, but this is a huge accomplishment.

Blog post: https://www.phind.com/blog/phind-model-beats-gpt4-fast

Note: they've only openly released V2 of their model. Hopefully they release newer versions soon; I would love to play with them outside their sandbox.

94
 
 

Very interesting new sampler; it does a better job of filtering out extremely unlikely tokens when the most likely tokens are less confident. From the results, it seems to pretty reliably improve quality with no noticeable downside.
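The post doesn't name the sampler, but the description matches a min-p style cutoff, where the filtering threshold scales with the top token's probability. A hedged sketch of that mechanism, not any particular implementation:

```python
import numpy as np

def min_p_filter(logits, min_p=0.05):
    """Illustrative min-p-style filter: a token survives only if its probability
    is at least min_p times the probability of the single most likely token."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    threshold = min_p * probs.max()      # cutoff scales with top-token confidence
    keep = probs >= threshold            # flatter distribution -> more survivors
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

# Example: sample from the filtered distribution
logits = np.array([3.2, 3.0, 1.0, -2.0, -9.0])
probs = min_p_filter(logits, min_p=0.1)
token = np.random.choice(len(probs), p=probs)
```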

95
 
 

30T tokens, 20.5T in English, allegedly high quality, can't wait to see people start putting it to use!

Related github: https://github.com/togethercomputer/RedPajama-Data

96
 
 

Finally got a nice script going that automates most of the process. Uploads will all be in the same format, with each bits-per-weight level going into its own branch.

The first two I did don't have great READMEs, but the rest will look like this one: https://huggingface.co/bartowski/Mistral-7B-claude-chat-exl2

Also taking recommendations on anything you want to see included in the README or in the quant levels.
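Since each quant lives on its own branch, something like huggingface_hub's snapshot_download with a revision argument should pull just one of them. The branch name below is only a guess at the naming scheme; check the repo's branch list for the real labels.

```python
from huggingface_hub import snapshot_download

# Grab a single quant level by pointing `revision` at its branch.
# "6_0" is a hypothetical branch name; check the model page for the actual ones.
snapshot_download(
    repo_id="bartowski/Mistral-7B-claude-chat-exl2",
    revision="6_0",
    local_dir="Mistral-7B-claude-chat-exl2-6_0",
)
```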

97
 
 

They are referencing this paper: LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset from September 30.

The paper itself provides some insight on how people use LLMs and the distribution of the different use-cases.

The researchers had a look at conversations with 25 LLMs. Data is collected from 210K unique IP addresses in the wild on their Vicuna demo and Chatbot Arena website.

98
 
 

For anyone who happens to be using my Docker images or Dockerfiles for text-gen-webui: it all started breaking this week when Oobabooga's work was updated to support CUDA 12.1.

As such, I have updated my Docker images and fixed a bunch of issues in the build process. It's also been a while since I posted about it here.

You can find all the details here:

https://github.com/noneabove1182/text-generation-webui-docker

It requires driver version 535.113.01

Happy LLMing!

99
 
 

From the tweet (minus pictures):

Language models are bad at basic math.

GPT-4 has an accuracy rate of right around 0% on 5-digit multiplication.

Most open models can't even add. Why is that?

There are a few reasons why numbers are hard. The main one is Tokenization. When training a tokenizer from scratch, you take a large corpus of text and find the minimal byte-pair encoding for a chosen vocabulary size.

This means, however, that numbers will almost certainly not have unique token representations. "21" could be a single token, or ["2", "1"]. 143 could be ["143"] or ["14", "3"] or any other combination.

A potential fix here would be to force single digit tokenization. The state of the art for the last few years is to inject a space between every digit when creating the tokenizer and when running the model. This means 143 would always be tokenized as ["1", "4", "3"].

This helps boost performance, but wastes tokens while not fully fixing the problem.
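A small illustration of the preprocessing trick with a plain regex (this is just the idea; actual tokenizers bake the equivalent rule in when they are built):

```python
import re

def split_digits(text: str) -> str:
    """Insert a space between adjacent digits so a BPE tokenizer can only ever
    produce single-digit number pieces. Illustrative preprocessing only."""
    return re.sub(r"(?<=\d)(?=\d)", " ", text)

print(split_digits("21 * 143 = 3003"))
# -> "2 1 * 1 4 3 = 3 0 0 3"
# Every digit now gets its own token, at the cost of longer sequences.
```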

A cool fix might be xVal! This work by The Polymathic AI Collaboration suggests a generic [NUM] token which is then scaled by the actual value of the number!

If you look at the red lines in the image above, you can get an intuition for how that might work.

It doesn't capture a huge range or high fidelity (e.g., 7.4449 vs 7.4448) but they showcase some pretty convincing results on sequence prediction problems that are primarily numeric.

For example, they want to train a sequence model on GPS-conditioned temperature forecasting.

They found a ~70x improvement over standard vanilla baselines and a 2x improvement over really strong baselines.

One cool side effect is that deep neural networks might be really good at regression problems using this encoding scheme!
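As far as I understand the paper, the input-side encoding boils down to something like the sketch below: every number is replaced by a shared [NUM] token whose embedding is scaled by the (normalized) value. The class and argument names here are made up, and the output-side number head that the paper pairs this with is omitted.

```python
import torch
import torch.nn as nn

class XValStyleEmbedding(nn.Module):
    """Illustrative xVal-style numeric encoding: all numbers share one [NUM]
    token, and that token's embedding is multiplied by the numeric value."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.LongTensor, values: torch.Tensor):
        # `values` holds the (normalized) number at each [NUM] position
        # and 1.0 everywhere else, so ordinary tokens are unaffected.
        return self.embed(token_ids) * values.unsqueeze(-1)
```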

100
7
Musical notation (lemmy.dbzer0.com)
submitted 1 year ago by [email protected] to c/localllama
 
 

Would adding musical notation to an LLM's training data allow it to create music, since notation is a lot like a normal language? Or does it do so already?
