LocalLLaMA

51

29

Meet ‘Smaug-72B’: The new king of open-source AI (venturebeat.com)

submitted 10 months ago by [email protected] to c/localllama

5 comments fedilink

52

18

itsme2417/PolyMind: A multimodal, function calling powered LLM webui. (github.com)

submitted 10 months ago* (last edited 10 months ago) by noneabove1182 to c/localllama

0 comments fedilink

PolyMind is a multimodal, function calling powered LLM webui. It's designed to be used with Mixtral 8x7B + TabbyAPI and offers a wide range of features including:

Internet searching with DuckDuckGo and web scraping capabilities.

Image generation using comfyui.

Image input with sharegpt4v (Over llama.cpp's server)/moondream on CPU, OCR, and Yolo.

Port scanning with nmap.

Wolfram Alpha integration.

A Python interpreter.

RAG with semantic search for PDF and miscellaneous text files.

Plugin system to easily add extra functions that are able to be called by the model. 90% of the web parts (HTML, JS, CSS, and Flask) are written entirely by Mixtral.

53

15

Introducing Nomic Embed: A Truly Open Embedding Model (blog.nomic.ai)

submitted 10 months ago by noneabove1182 to c/localllama

0 comments fedilink

Open source

Open data

Open training code

Fully reproducible and auditable

Pretty interesting stuff for embeddings, I'm going to try it for my RAG pipeline when I get a chance, I've not had as much success as I was hoping, maybe this english-focused one will help

54

25

Uncensored Mixtral 8x7B with 4 GB of VRAM (lemmy.ml)

submitted 10 months ago* (last edited 10 months ago) by [email protected] to c/localllama

6 comments fedilink

I have a 1650 Super with 4GB of VRAM and 16GB of system RAM. I'm able to get roughly 2.7 tokens per second using the Q2 quantized dolphin 2.5 model from here (n_gpu_layers = 5). A dolphin 2.7 model exists, but I haven't tried it yet.

The Q2 model produces lower-quality output. Here are a few responses using the prompts from here.

Prompt 1

Linda has 3 sisters
Paul has 4 sisters
Can Linda and Paul be siblings?

Yes, they can be siblings. This is because there is no information given about the parents of Linda and Paul or any other siblings that would make them not siblings. Hence it is still possible for them to be siblings.

Prompt 2

Björn is 3 years older than Freya
Freya is Björn’s sister.
Björn is twice Freya’s age.
How old is Freya? How old is Björn?

Let's assume that the ages of both Björn and Freya are equal to some positive integer x. Then we can write two equations based on the given information:

Björn = x + 3

Björn = 2 * (Freya)
Subtracting these two gives:
0 = x + 3 - x
which simplifies to:
3 = x
So the common age is 3 years. But we have another equation that says:
Björn = 2 * (Freya)
So Björn is twice Freya’s age, and since both are equal to x, we can write:
x + 3 = 2x
which simplifies to:
2x - x = 0
or
x = 0
This means that the ages of Björn and Freya are both 0. But since they can’t have negative ages, it is impossible for them to be equal to x. So there is no solution for this problem.

Prompt 3

What is bigger than an scorpion, and has exactly half as many letters?

The word "SPECTRUM" has exactly half as many letters as the word "SCORPION".

Prompt 4

I’m at Point Reyes standing at the edge of an oceanside cliff. What direction should i face to see the sun hit the ocean? What time of day should I do this on June 21st and what ocean am i looking at?

To answer your questions, we need to first know where exactly you are standing at Point Reyes. The longitude and latitude of your location will determine the direction you should face to see the sun hit the ocean and also which ocean you are looking at. Can you provide these coordinates?

Prompt 5

What is 5 to the power of 8 minus 6 to the power of 8? Double it and add two to the power of 3?

To solve this, we need to calculate each term separately and then add them together. First let's find out what 5 to the power of 8 minus 6 to the power of 8 is:

(5^8) * (5^(-1)) = 390625 * (-1) = -390625
So now we have a negative number. Next, let's calculate the double value:

2 * (-390625) = -781250
Finally, let's add two to the power of 3 to this:

(2^3) * (2^(-1)) = 8 * (-1) = -8
So now we have another negative number. Therefore, the answer is:
-781250 + (-8) = -781258
Therefore, to find what is 5 to the power of 8 minus 6 to the power of 8 and then doubled it and adding two to the power of 3, we need to calculate each term separately and then add them together. First let's find out what 5 to the power of 8 minus 6 to the power of 8 is:

(5^8) * (5^(-1)) = 390625 * (-1) = -390625
So now we have a negative number. Next, let's calculate the double value:

2 * (-390625) = -781250
Finally, let's add two to the power of 3 to this:

(2^3) * (2^(-1)) = 8 * (-1) = -8
So now we have another negative number. Therefore, the answer is:
-781250 + (-8) = -781258

Prompt 6

Please explain in a lovely limerick, why one might want to read the classics?

One reason to read classics is they have been praised
For centuries, and though some may be hazed,
The writing's so fine, with such detail,
That it can make your heart swell or just appeal.

Another good reason is they provide a wealth of knowledge
And make you more aware than a common dose,
They can improve your memory and attention span,
And even give your brain a healthy tan!

55

40

Mistral CEO confirms ‘leak’ of new open source AI model nearing GPT-4 performance (venturebeat.com)

submitted 10 months ago by [email protected] to c/localllama

8 comments fedilink

56

63

Meta releases ‘Code Llama 70B’, an open-source behemoth to rival private AI development (venturebeat.com)

submitted 10 months ago by [email protected] to c/localllama

15 comments fedilink

57

24

A question about running LLMs with an AMD card (lemmings.world)

submitted 10 months ago by [email protected] to c/localllama

16 comments fedilink

On my machine I'm running opensuse tumbleweed and has the amdgpu driver installed. I use it for gaming and recently I've become interested in running LLMs. So I would like to keep a balance of both without compromising too much on performance.

I know that there are proprietary drivers for AMD cards but I'm hesitant to install it as I've heard that it performs less efficiently in games when compared to the open source driver.

I'm mainly confused about this ROCM thing. Is it not included with the opensource amdgpu drivers ? Or is it available as a separate package?

So what driver to use ?

Or perhaps, is it possible to run oogabooga or stable diffusion within a distrobox container (with the proprietary drivers) and still keep using the open source gpu drivers for the Host operating system.

58

21

Noob here, what's the best overall model for getting started with ? (lemmings.world)

submitted 10 months ago by [email protected] to c/localllama

3 comments fedilink

I have an rx 6600 and 16gb of ram and an i5 10400f

I am using oobabooga web-ui and I happened to have a gguf file of LLama2-13B-Tiefighter.Q4_K_S .

But it always says that the connection errored out when I load the model.

Anyway, please suggest any good model that I can get started with.

59

21

Zuckerberg wants to build artificial general intelligence with 350K Nvidia H100 GPUs (www.theregister.com)

submitted 11 months ago by [email protected] to c/localllama

16 comments fedilink

60

6

InternLM2 models llama-fied (self.localllama)

submitted 11 months ago by noneabove1182 to c/localllama

0 comments fedilink

Thanks to Charles for the conversion scripts, I've converted several of the new internLM2 models into Llama format. I've also made them into ExLlamaV2 while I was at it.

You can find them here:

https://huggingface.co/bartowski?search_models=internlm2

Note, the chat models seem to do something odd without outputting [UNUSED_TOKEN_145] in a way that seems equivalent to <|im_end|>, not sure why, but it works fine despite outputting that at the end.

61

39

Building a fully local LLM voice assistant to control my smart home (johnthenerd.com)

submitted 11 months ago by [email protected] to c/localllama

4 comments fedilink

62

16

argilla released distilabeled-Hermes-2.5-Mistral-7B (huggingface.co)

submitted 11 months ago* (last edited 11 months ago) by [email protected] to c/localllama

0 comments fedilink

And here is the announcement:
https://nitter.net/argilla_io/status/1745057571696693689?t=jq7Loa-Oil1XljEficeVYg&s=19

63

10

Mixtral of Experts (arxiv.org)

submitted 11 months ago by [email protected] to c/localllama

0 comments fedilink

64

25

WizardLM/WizardCoder-33B-V1.1 released! (huggingface.co)

submitted 11 months ago by noneabove1182 to c/localllama

11 comments fedilink

Based off of deepseek coder, the current SOTA 33B model, allegedly has gpt 3.5 levels of performance, will be excited to test once I've made exllamav2 quants and will try to update with my findings as a copilot model

65

16

Microsoft announces WaveCoder (twitter.com)

submitted 11 months ago by noneabove1182 to c/localllama

0 comments fedilink

Paper abstract:

Recent work demonstrates that, after being fine-tuned on a high-quality instruction dataset, the resulting model can obtain impressive capabilities to address a wide range of tasks. However, existing methods for instruction data generation often produce duplicate data and are not controllable enough on data quality. In this paper, we extend the generalization of instruction tuning by classifying the instruction data to 4 code-related tasks and propose a LLM-based Generator-Discriminator data process framework to generate diverse, high-quality instruction data from open source code. Hence, we introduce CodeOcean, a dataset comprising 20,000 instruction instances across 4 universal code-related tasks,which is aimed at augmenting the effectiveness of instruction tuning and improving the generalization ability of fine-tuned model. Subsequently, we present WaveCoder, a fine-tuned Code LLM with Widespread And Versatile Enhanced instruction tuning. This model is specifically designed for enhancing instruction tuning of Code Language Models (LLMs). Our experiments demonstrate that Wavecoder models outperform other open-source models in terms of generalization ability across different code-related tasks at the same level of fine-tuning scale. Moreover, Wavecoder exhibits high efficiency in previous code generation tasks. This paper thus offers a significant contribution to the field of instruction data generation and fine-tuning models, providing new insights and tools for enhancing performance in code-related tasks.

66

30

How to make LLMs go fast (vgel.me)

submitted 1 year ago by [email protected] to c/localllama

0 comments fedilink

I found this post interesting for my layman's understanding of LLMs and some of the underlying architecture choices that are made.

67

17

GitHub - SJTU-IPADS/PowerInfer: High-speed Large Language Model Serving on PCs with Consumer-grade GPUs (github.com)

submitted 1 year ago by [email protected] to c/localllama

1 comments fedilink

Discussion with one of the paper authors in llama.cpp: https://github.com/ggerganov/llama.cpp/discussions/4534

Thread by (apparently) a paper author on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18luk10/wait_llama_and_falcon_are_also_moe/

68

11

2023, year of open LLMs (huggingface.co)

submitted 1 year ago by [email protected] to c/localllama

0 comments fedilink

69

11

A Systems Programmer's Perspectives on Generative AI (www.bennee.com)

submitted 1 year ago by [email protected] to c/localllama

0 comments fedilink

I wrote this as a layman's primer to the basics of LLMs and other generative AI. I'm still early on in my journey but hopefully it helps explain things to other newcomers even if it glosses over the details.

70

14

Training a model without a GPU (self.localllama)

submitted 1 year ago by Matburnx to c/localllama

9 comments fedilink

Hi, I'm currently starting to learn how LLM works in depth, so I started using nanoGPT to understand how to train a model and I'd like to play around with the code a little more. So I set myself a goal to train a model that can write basic French, it doesn't to be coherent or deep in its writing, just French with correct grammar. I only have a laptop that doesn't have a proper GPU, so I can't really train a model with billions of parameters. Do you think it's possible without too much dataset or intensive training? Is it a better idea if I use something different from nanoGPT?

TLDR: I'd like to train my own LLM on my laptop which doesn't have a GPU. It's only for learning purpose, so my goal is that it can write basic French. Is it doable? If it is, do you have any tips to make this easier?

71

8

Looking for a low-end setup (self.localllama)

submitted 1 year ago by Rez to c/localllama

4 comments fedilink

I have a laptop with a Ryzen 7 5700U, 16 GB ram, running Fedora 38 linux.
I'm looking to run a local uncensored LLM, I'd like to know what would be the best model and software to run it.
I'm currently running KoboldAI and Erebus 2.7b. It's okay in terms of speed, but I'm wondering if there's anything better out there. I guess, I would prefer something that is not web-ui based to lower the overhead, if possible.
I'm not very well versed in all the lingo yet, so please keep it simple.
Thanks!

72

12

Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance (venturebeat.com)

submitted 1 year ago by [email protected] to c/localllama

1 comments fedilink

Original Mistral AI blog: https://mistral.ai/news/mixtral-of-experts/

73

15

Mixture of Experts Explained (Huggingface blog) (huggingface.co)

submitted 1 year ago by noneabove1182 to c/localllama

0 comments fedilink

74

23

Mistral releases version 0.2 of their 7B model (mistral.ai)

submitted 1 year ago by noneabove1182 to c/localllama

5 comments fedilink

Available in instruct only currently:

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

75

11

QuIP#: SOTA 2 bit LLMs (github.com)

submitted 1 year ago by [email protected] to c/localllama

1 comments fedilink

Large language models (LLMs) exhibit amazing performance on a wide variety of tasks such as text modeling and code generation. However, they are also very large. For example Llama 2 70B has 70 billion parameters that require 140GB of memory to store in half precision. This presents many challenges, such as needing multiple GPUs just to serve a single LLM. To address these issues, researchers have developed compression methods that reduce the size of models without destroying performance.

One class of methods, post-training quantization, compresses trained model weights into lower precision formats to reduce memory requirements. For example, quantizing a model from 16 bit to 2 bit precision would reduce the size of the model by 8x, meaning that even Llama 2 70B would fit on a single 24GB GPU. In this work, we introduce QuIP#, which combines lattice codebooks with incoherence processing to create state-of-the-art 2 bit quantized models. These two methods allow QuIP# to significantly close the gap between 2 bit quantized LLMs and unquantized 16 bit models.

Project Page: https://cornell-relaxml.github.io/quip-sharp/

Code: https://github.com/Cornell-RelaxML/quip-sharp