LocalLLaMA

2292 readers
1 users here now

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 2 years ago
MODERATORS
76
 
 
77
 
 

Early speculation is that it's an MoE (mixture of experts) of eight 7B models, so maybe not as earth-shattering as their last release, but highly intriguing. Will update with more info as it comes out.
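For anyone unfamiliar with the term, a top-k routed mixture-of-experts layer replaces the usual feed-forward block with several expert FFNs plus a small router that sends each token to only a couple of them. Below is a minimal sketch of that general idea; nothing about the actual release is confirmed, and all names and sizes here are placeholders.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Illustrative top-2 routed mixture-of-experts feed-forward layer.
    Sizes are placeholders, not details of any specific model."""
    def __init__(self, d_model=4096, d_ff=11008, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                               # x: (n_tokens, d_model)
        gate_logits = self.router(x)                    # (n_tokens, n_experts)
        weights, expert_idx = gate_logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)               # mixing weights per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

The appeal is that only `top_k` of the `n_experts` FFNs run per token, so total parameter count can be several times that of a dense 7B model while per-token compute stays much lower.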

78
 
 

Original (pay-walled): https://www.nytimes.com/2023/12/05/technology/ai-chatgpt-google-meta.html

Section on LLaMA:

Zuckerberg Gets Warned

Actually, months earlier Meta had released its own chatbot — to very little notice.

BlenderBot was a flop. The A.I.-powered bot, released in August 2022, was built to carry on conversations — and that it did. It said that Donald J. Trump was still president and that President Biden had lost in 2020. Mark Zuckerberg, it told a user, was “creepy.” Then two weeks before ChatGPT was released, Meta introduced Galactica. Designed for scientific research, it could instantly write academic articles and solve math problems. Someone asked it to write a research paper about the history of bears in space. It did. After three days, Galactica was shut down.

Mr. Zuckerberg’s head was elsewhere. He had spent the entire year reorienting the company around the metaverse and was focused on virtual and augmented reality.

But ChatGPT would demand his attention. His top A.I. scientist, Yann LeCun, arrived in the Bay Area from New York about six weeks later for a routine management meeting at Meta, according to a person familiar with the meeting. Dr. LeCun led a double life — as Meta’s chief A.I. scientist and a professor at New York University. The Frenchman had won the Turing Award, computer science’s most prestigious honor, alongside Dr. Hinton, for work on neural networks.

As they waited in line for lunch at a cafe in Meta’s Frank Gehry-designed headquarters, Dr. LeCun delivered a warning to Mr. Zuckerberg. He said Meta should match OpenAI’s technology and also push forward with work on an A.I. assistant that could do stuff on the internet on your behalf. Websites like Facebook and Instagram could become extinct, he warned. A.I. was the future.

Mr. Zuckerberg didn’t say much, but he was listening. There was plenty of A.I. at work across Meta’s apps — Facebook, Instagram, WhatsApp — but it was under the hood. Mr. Zuckerberg was frustrated. He wanted the world to recognize the power of Meta’s A.I. Dr. LeCun had always argued that going open-source, making the code public, would attract countless researchers and developers to Meta’s technology, and help improve it at a far faster pace. That would allow Meta to catch up — and put Mr. Zuckerberg back in league with his fellow moguls. But it would also allow anyone to manipulate the technology to do bad things.

At dinner that evening, Mr. Zuckerberg approached Dr. LeCun. “I have been thinking about what you said,” Mr. Zuckerberg told his chief A.I. scientist, according to a person familiar with the conversation. “And I think you’re right.”

In Paris, Dr. LeCun’s scientists had developed an A.I.-powered bot that they wanted to release as open-source technology. Open source meant that anyone could tinker with its code. They called it Genesis, and it was pretty much ready to go. But when they sought permission to release it, Meta’s legal and policy teams pushed back, according to five people familiar with the discussion.

Caution versus speed was furiously debated among the executive team in early 2023 as Mr. Zuckerberg considered Meta’s course in the wake of ChatGPT.

Had everyone forgotten about the last seven years of Facebook’s history? That was the question asked by the legal and policy teams. They reminded Mr. Zuckerberg about the uproar over hate speech and misinformation on Meta’s platforms and the scrutiny the company had endured by the news media and Congress after the 2016 election.

Open sourcing the code might put powerful tech into the hands of those with bad intentions and Meta would take the blame. Jennifer Newstead, Meta’s chief legal officer, told Mr. Zuckerberg that an open-source approach to A.I. could attract the attention of regulators who already had the company in their cross hairs, according to two people familiar with her concerns.

At a meeting in late January in his office, called the aquarium because it looked like one, Mr. Zuckerberg told executives that he had made his decision. Parts of Meta would be reorganized and its priorities changed. There would be weekly meetings to update executives on A.I. progress. Hundreds of employees would be moved around. Mr. Zuckerberg declared in a Facebook post that Meta would “turbocharge” its work on A.I.

Mr. Zuckerberg wanted to push out a project fast. The researchers in Paris were ready with Genesis. The name was changed to LLaMA, short for “Large Language Model Meta AI,” and released to 4,000 researchers outside the company. Soon Meta received over 100,000 requests for access to the code.

But within days of LLaMA’s release, someone put the code on 4chan, the fringe online message board. Meta had lost control of its chatbot, raising the possibility that the worst fears of its legal and policy teams would come true. Researchers at Stanford University showed that the Meta system could easily do things like generate racist material.

On June 6, Mr. Zuckerberg received a letter about LLaMA from Senators Josh Hawley of Missouri and Richard Blumenthal of Connecticut. “Hawley and Blumenthal demand answers from Meta,” said a news release.

The letter called Meta’s approach risky and vulnerable to abuse and compared it unfavorably with ChatGPT. Why, the senators seemed to want to know, couldn’t Meta be more like OpenAI?

79
58
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

Seems like a really cool project, lowering the barrier to entry for locally run models. As llama.cpp supports a ton of models, I imagine it would be easy to adapt this for models other than the prebuilt ones.

80
81
 
 

I've been using TheBloke's Q8 of https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B, but now I think this one (https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B) is killing it. Has anyone else tested it?

82
20
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
83
84
 
 

LMSYS examines how improper data decontamination can lead to artificially inflated scores

85
 
 

H200 is up to 1.9x faster than H100. This performance is enabled by H200's larger, faster HBM3e memory.

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform

86
12
... (programming.dev)
submitted 1 year ago* (last edited 9 months ago) by [email protected] to c/localllama
 
 

CogVLM: Visual Expert for Pretrained Language Models

Presents CogVLM, a powerful open-source visual language foundation model that achieves state-of-the-art performance on 10 classic cross-modal benchmarks

repo: https://github.com/THUDM/CogVLM

abs: https://arxiv.org/abs/2311.03079

87
9
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 

The creator of ExLlamaV2 (turboderp) has released a lightweight web UI for running exllamav2, and it's quite nice! It's missing some features from text-generation-webui, but makes up for it by being very streamlined and clean.

I've made a docker image for it for anyone who may want to try it out, GitHub repo here:

https://github.com/noneabove1182/exui-docker

And for finding models to run with exllamav2 I've been uploading several here:

https://huggingface.co/bartowski

Enjoy!

88
14
... (programming.dev)
submitted 1 year ago* (last edited 9 months ago) by [email protected] to c/localllama
 
 

article: https://x.ai

trained a prototype LLM (Grok-0) with 33 billion parameters. This early model approaches LLaMA 2 (70B) capabilities on standard LM benchmarks but uses only half of its training resources. In the last two months, we have made significant improvements in reasoning and coding capabilities leading up to Grok-1, a state-of-the-art language model that is significantly more powerful, achieving 63.2% on the HumanEval coding task and 73% on MMLU.

89
21
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/localllama
 
 

"This feature utilizes KV cache shifting to automatically remove old tokens from context and add new ones without requiring any reprocessing."

This means a major speed increase for people like me who rely on (slow) CPU inference (or big models). Consider a chatbot scenario and a long chat where old lines of dialogue need to be evicted from the context to stay within the (4096-token) context size. Previously, the context had to be re-computed starting with the first changed or now-missing token. This feature detects that, deletes the affected tokens from the KV cache, and shifts the subsequent tokens in the KV cache so it can be re-used, avoiding a computationally expensive re-calculation.

This is probably also more or less related to recent advancements like Streaming-LLM.

This won't help once text gets inserted "in the middle" or the prompt gets changed in another way. But I managed to connect KoboldCPP as a backend for SillyTavern/Oobabooga, and now I'm able to have unlimited-length conversations without waiting excessively once the chat history hits max tokens and the frontend starts dropping text.

It's just a clever way to re-use the KV cache in one specific case. But I've wished for this for quite some time.
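A minimal sketch of the idea in Python, just to make the mechanism concrete. This is not KoboldCPP's or llama.cpp's actual code; it only shows the bookkeeping of dropping evicted dialogue from the cache and keeping the rest (the positional shift that llama.cpp applies to the cached keys via RoPE is glossed over).

```python
def context_shift(kv_cache, cached_tokens, new_tokens, n_keep=0):
    """Illustrative context shift.

    kv_cache      -- per-token cache entries, parallel to cached_tokens
    cached_tokens -- token ids currently represented in the KV cache
    new_tokens    -- token ids of the updated prompt
    n_keep        -- leading tokens (e.g. the system prompt) that never get evicted
    """
    # Find how many tokens after the protected prefix were dropped by the frontend:
    # the new prompt should start with "prefix + some suffix of the old context".
    for n_discard in range(len(cached_tokens) - n_keep + 1):
        survivors = cached_tokens[:n_keep] + cached_tokens[n_keep + n_discard:]
        if new_tokens[:len(survivors)] == survivors:
            break
    else:
        return [], new_tokens  # no overlap found, full reprocess needed

    # Keep the surviving cache entries instead of recomputing them.
    kept_cache = kv_cache[:n_keep] + kv_cache[n_keep + n_discard:]
    # Only the genuinely new tokens at the end still need a forward pass.
    return kept_cache, new_tokens[len(survivors):]
```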

90
 
 

A Python Gradio web UI built using GPT4AllEmbeddings

91
92
46
why reddit? (lemmy.world)
submitted 1 year ago by [email protected] to c/localllama
 
 

I don't get why a group of users who are willing to run their own LLMs locally and don't want to rely on centralized corporations like OpenAI or Google prefer to discuss it on a centralized site like Reddit.

93
 
 

Phind is now using V7 of their model on their own platform, as they have found that people overall prefer its output to GPT-4's. This is extremely impressive because it's not just a random benchmark that can be gamed, but crowdsourced opinion on real tasks.

The one place where everything still lags behind GPT-4 is question comprehension, but this is a huge accomplishment.

Blog post: https://www.phind.com/blog/phind-model-beats-gpt4-fast

Note: they've only openly released V2 of their model. Hopefully they release newer versions soon; I would love to play with them outside their sandbox.

94
 
 

Very interesting new sampler; it does a better job of filtering out extremely unlikely tokens when the most likely tokens are less confident. From the results, it seems to pretty reliably improve quality with no noticeable downside.
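The post doesn't name the sampler, but the description matches a min-p style cutoff, where the filtering threshold scales with the top token's probability. A hedged sketch of that mechanism, not any particular implementation:

```python
import numpy as np

def min_p_filter(logits, min_p=0.05):
    """Illustrative min-p-style filter: a token survives only if its probability
    is at least min_p times the probability of the single most likely token."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    threshold = min_p * probs.max()      # cutoff scales with top-token confidence
    keep = probs >= threshold            # flatter distribution -> more survivors
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

# Example: sample from the filtered distribution
logits = np.array([3.2, 3.0, 1.0, -2.0, -9.0])
probs = min_p_filter(logits, min_p=0.1)
token = np.random.choice(len(probs), p=probs)
```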

95
 
 

30T tokens, 20.5T in English, allegedly high quality, can't wait to see people start putting it to use!

Related github: https://github.com/togethercomputer/RedPajama-Data

96
 
 

Finally got a nice script going that automates most of the process. Uploads will all be in the same format, with each bits-per-weight level going into its own branch.

The first two I did don't have great READMEs, but the rest will look like this one: https://huggingface.co/bartowski/Mistral-7B-claude-chat-exl2

Also taking recommendations on anything you want to see included in the README or in the quant levels.
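Since each quant lives on its own branch, something like huggingface_hub's snapshot_download with a revision argument should pull just one of them. The branch name below is only a guess at the naming scheme; check the repo's branch list for the real labels.

```python
from huggingface_hub import snapshot_download

# Grab a single quant level by pointing `revision` at its branch.
# "6_0" is a hypothetical branch name; check the model page for the actual ones.
snapshot_download(
    repo_id="bartowski/Mistral-7B-claude-chat-exl2",
    revision="6_0",
    local_dir="Mistral-7B-claude-chat-exl2-6_0",
)
```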

97
 
 

They are referencing this paper: LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset from September 30.

The paper itself provides some insight on how people use LLMs and the distribution of the different use-cases.

The researchers had a look at conversations with 25 LLMs. Data is collected from 210K unique IP addresses in the wild on their Vicuna demo and Chatbot Arena website.

98
 
 

For anyone who happens to be using my Docker images or Dockerfiles for text-gen-webui: it all started breaking this week when Oobabooga's work was updated to support CUDA 12.1.

As such, I have updated my Docker images and fixed a bunch of issues in the build process. It's also been a while since I posted about it here.

You can find all the details here:

https://github.com/noneabove1182/text-generation-webui-docker

It requires driver version 535.113.01

Happy LLMing!

99
 
 

From the tweet (minus pictures):

Language models are bad at basic math.

GPT-4 has an accuracy rate of right around 0% on 5-digit multiplication.

Most open models can't even add. Why is that?

There are a few reasons why numbers are hard. The main one is Tokenization. When training a tokenizer from scratch, you take a large corpus of text and find the minimal byte-pair encoding for a chosen vocabulary size.

This means, however, that numbers will almost certainly not have unique token representations. "21" could be a single token, or ["2", "1"]. 143 could be ["143"] or ["14", "3"] or any other combination.

A potential fix here would be to force single digit tokenization. The state of the art for the last few years is to inject a space between every digit when creating the tokenizer and when running the model. This means 143 would always be tokenized as ["1", "4", "3"].

This helps boost performance, but wastes tokens while not fully fixing the problem.
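A small illustration of the preprocessing trick with a plain regex (this is just the idea; actual tokenizers bake the equivalent rule in when they are built):

```python
import re

def split_digits(text: str) -> str:
    """Insert a space between adjacent digits so a BPE tokenizer can only ever
    produce single-digit number pieces. Illustrative preprocessing only."""
    return re.sub(r"(?<=\d)(?=\d)", " ", text)

print(split_digits("21 * 143 = 3003"))
# -> "2 1 * 1 4 3 = 3 0 0 3"
# Every digit now gets its own token, at the cost of longer sequences.
```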

A cool fix might be xVal! This work by The Polymathic AI Collaboration suggests a generic [NUM] token which is then scaled by the actual value of the number!

If you look at the red lines in the image above, you can get an intuition for how that might work.

It doesn't capture a huge range or high fidelity (e.g., 7.4449 vs 7.4448) but they showcase some pretty convincing results on sequence prediction problems that are primarily numeric.

For example, they want to train a sequence model on GPS-conditioned temperature forecasting.

They found a ~70x improvement over standard vanilla baselines and a 2x improvement over really strong baselines.

One cool side effect is that deep neural networks might be really good at regression problems using this encoding scheme!
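As far as I understand the paper, the input-side encoding boils down to something like the sketch below: every number is replaced by a shared [NUM] token whose embedding is scaled by the (normalized) value. The class and argument names here are made up, and the output-side number head that the paper pairs this with is omitted.

```python
import torch
import torch.nn as nn

class XValStyleEmbedding(nn.Module):
    """Illustrative xVal-style numeric encoding: all numbers share one [NUM]
    token, and that token's embedding is multiplied by the numeric value."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.LongTensor, values: torch.Tensor):
        # `values` holds the (normalized) number at each [NUM] position
        # and 1.0 everywhere else, so ordinary tokens are unaffected.
        return self.embed(token_ids) * values.unsqueeze(-1)
```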

100
7
Musical notation (lemmy.dbzer0.com)
submitted 1 year ago by [email protected] to c/localllama
 
 

Would adding musical notation to an LLM's training data allow it to create music, since notation is a lot like a normal language? Or does it do so already?
