LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.


Hartford is credited as the creator of Dolphin-Mistral, Dolphin-Mixtral, and many other models.

He's done a huge amount of work on uncensored models.

submitted 7 months ago* (last edited 7 months ago) by [email protected] to c/localllama

From Simon Willison: "Mistral tweet a link to a 281GB magnet BitTorrent of Mixtral 8x22B—their latest openly licensed model release, significantly larger than their previous best open model Mixtral 8x7B. I’ve not seen anyone get this running yet but it’s likely to perform extremely well, given how good the original Mixtral was."


I've been using Tiefighter, which hasn't been too bad with lorebooks in Tavern.


Afaik most LLMs run purely on the GPU, don't they?

So if I have an Nvidia Titan X with 12GB of VRAM, could I plug it into my laptop and offload the load?

I am using Fedora, so getting the NVIDIA drivers would be... fun, and probably a dealbreaker already (I wouldn't want to run proprietary drivers on my daily system).

I know that people were able to use GPUs externally with ExpressCard adapters, and this is possible with Thunderbolt too, isn't it?

The question is: how well does this work?

Or would it make more sense to use a small SoC to host a web server for the interface and do all the computing on the GPU?

I am curious about the difficulties here: an ARM SoC and proprietary drivers? A laptop over USB-C (maybe not Thunderbolt?) and a GPU just for the AI tasks...
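
For the offloading part: llama.cpp (and its Python bindings) can split a model between GPU and CPU, so a 12GB card doesn't need to hold the whole model. A minimal sketch, assuming llama-cpp-python was built with GPU support; the model path and layer count here are hypothetical examples, not a recommendation:

```python
# Minimal sketch: split a model between GPU and CPU with llama-cpp-python.
# Assumes the package was built with GPU support; the model path is a
# hypothetical local file, and 20 offloaded layers is just an example.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=20,  # layers offloaded to VRAM; the rest run on the CPU
    n_ctx=4096,
)

out = llm("Q: What is an eGPU? A:", max_tokens=64)
print(out["choices"][0]["text"])
```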


Is there a Linux package available, like LM Studio?

submitted 8 months ago* (last edited 8 months ago) by [email protected] to c/localllama

GitHub: https://github.com/mistralai-sf24/hackathon
X: https://twitter.com/MistralAILabs/status/1771670765521281370

New release: Mistral 7B v0.2 Base (Raw pretrained model used to train Mistral-7B-Instruct-v0.2)
🔸 https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar
🔸 32k context window
🔸 Rope Theta = 1e6
🔸 No sliding window
🔸 How to fine-tune:
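
The GitHub repo above contains Mistral's reference fine-tuning script. Purely as an illustration of the general shape (not Mistral's recipe), a LoRA fine-tune of a base checkpoint with Hugging Face transformers + peft might look like this; the model id, dataset file, and hyperparameters are all placeholders:

```python
# Generic LoRA fine-tuning sketch (NOT Mistral's official recipe).
# Assumes transformers, peft, and datasets are installed, plus a GPU.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_id = "mistralai/Mistral-7B-v0.1"  # stand-in; the v0.2 base ships as a tarball
tok = AutoTokenizer.from_pretrained(model_id)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Low-rank adapters on the attention projections only.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

data = load_dataset("text", data_files="train.txt")["train"]  # placeholder data
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024))

Trainer(
    model=model,
    args=TrainingArguments("out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```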


But in all fairness, it's really llama.cpp that supports AMD.

Now looking forward to the Vulkan support!


Excited to share my T-Ragx project! And here are some additional learnings for me that might be interesting to some:

  • vector databases aren't always the best option
    • Elasticsearch or custom retrieval methods might work even better in some cases (see the sketch below)
  • LoRA is incredibly powerful for in-task applications
  • The pace of the LLM scene is astonishing
    • TowerInstruct and ALMA-R translation LLMs launched while my project was underway
  • Above all, it was so fun!

Please let me know what you think!
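
On the Elasticsearch point: a bare-bones BM25 lookup over a translation memory can be surprisingly competitive with vector search. A minimal sketch with the official Python client; the index name and field names are hypothetical, not T-Ragx's actual schema:

```python
# Minimal BM25 retrieval sketch with the Elasticsearch Python client.
# Index name and field names are hypothetical, not T-Ragx's schema.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def retrieve_examples(source_text: str, k: int = 5) -> list[dict]:
    """Fetch the k most lexically similar translation-memory entries."""
    resp = es.search(
        index="translation_memory",
        query={"match": {"source": source_text}},  # plain BM25 scoring
        size=k,
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]

# The retrieved pairs would then be prepended to the translation prompt.
```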


So you don't have to click the link, here's the full text including links:

Some of my favourite @huggingface models I've quantized in the last week (as always, original models are linked in my repo so you can check out any recent changes or documentation!):

@shishirpatil_ gave us gorilla's openfunctions-v2, a great followup to their initial models: https://huggingface.co/bartowski/gorilla-openfunctions-v2-exl2

@fanqiwan released FuseLLM-VaRM, a fusion of 3 architectures and scales: https://huggingface.co/bartowski/FuseChat-7B-VaRM-exl2

@IBM used a new method called LAB (Large-scale Alignment for chatBots) for our first interesting 13B tune in a while: https://huggingface.co/bartowski/labradorite-13b-exl2

@NeuralNovel released several, but I'm a sucker for DPO models, and this one uses their Neural-DPO dataset: https://huggingface.co/bartowski/Senzu-7B-v0.1-DPO-exl2

Locutusque, who has been making the Hercules dataset, released a preview of "Hyperion": https://huggingface.co/bartowski/hyperion-medium-preview-exl2

@AjinkyaBawase gave an update to his coding models with code-290k based on deepseek 6.7: https://huggingface.co/bartowski/Code-290k-6.7B-Instruct-exl2

@Weyaxi followed up on the success of Einstein v3 with, you guessed it, v4: https://huggingface.co/bartowski/Einstein-v4-7B-exl2

@WenhuChen with TIGER lab released StructLM in 3 sizes for structured knowledge grounding tasks: https://huggingface.co/bartowski/StructLM-7B-exl2

and that's just the highlights from this past week! If you'd like to see your model quantized and I haven't noticed it somehow, feel free to reach out :)
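
If you want to try one of these, the exl2 quants are plain Hugging Face repos. A minimal download sketch with huggingface_hub, picking one of the repos above; the revision branch is an assumption (exl2 repos typically keep each bits-per-weight variant on its own branch, so check the repo first):

```python
# Minimal sketch: grab an exl2 quant from the Hugging Face Hub.
# The revision branch name is an assumption -- exl2 repos usually keep
# each bitrate on its own branch; verify it exists before downloading.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="bartowski/gorilla-openfunctions-v2-exl2",
    revision="4_25",  # assumed bits-per-weight branch
    local_dir="models/gorilla-openfunctions-v2-exl2",
)
print("downloaded to", path)
```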


From the abstract: "Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}."

This would allow larger models to run with limited resources. However, this isn't a quantization method you can apply to existing models after the fact; it seems models need to be trained from scratch this way, and so far they have only gone as far as 3B parameters. The paper isn't that long, and it seems they didn't release the models. It builds on the BitNet paper from October 2023.

"the matrix multiplication of BitNet only involves integer addition, which saves orders of energy cost for LLMs." (no floating point matrix multiplication necessary)

"1-bit LLMs have a much lower memory footprint from both a capacity and bandwidth standpoint"

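To make the integer-addition point concrete, here's a toy sketch of how a matrix-vector product with ternary weights reduces to additions and subtractions (a pure illustration of the idea, not the paper's actual kernel):

```python
# Toy illustration: a matvec with ternary weights {-1, 0, 1} needs no
# multiplications -- each weight just adds, subtracts, or skips an activation.
# This mirrors the BitNet b1.58 argument; it is not the paper's kernel.
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # ternary weight matrix
x = rng.standard_normal(8)            # activations (still real-valued)

y = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        if W[i, j] == 1:
            y[i] += x[j]              # addition instead of multiply
        elif W[i, j] == -1:
            y[i] -= x[j]              # subtraction instead of multiply
        # W[i, j] == 0: skip entirely

assert np.allclose(y, W @ x)          # same result as a float matmul
```
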
Edit: an additional FAQ has been published.

Gemma 2B vs Phi-2 (lemmy.world)
submitted 9 months ago by [email protected] to c/localllama

NVIDIA Chat With RTX (www.nvidia.com)
submitted 9 months ago by [email protected] to c/localllama

This is an interesting demo, but it has some drawbacks I can already see:

  • It's Windows only (maybe Win11 only, the documentation isn't clear)
  • It only works with RTX 30 series and up
  • It's closed source, so you have no idea if they're uploading your data somewhere

The concept is great, having an LLM to sort through your local files and help you find stuff, but it seems really limited.

I think you could get the same functionality (and more) by writing an API for text-gen-webui.
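
For instance, text-gen-webui can expose an OpenAI-compatible endpoint when launched with its API flag, so a local "chat with your files" loop is only a few lines. A rough sketch, with the default local port assumed from a standard setup (adjust to yours):

```python
# Rough sketch: query a local text-generation-webui instance through its
# OpenAI-compatible API. Assumes it was started with the API enabled;
# the port (5000) is the assumed default -- adjust to your setup.
import json
import urllib.request

def ask_local_llm(question: str, context: str) -> str:
    payload = {
        "messages": [
            {"role": "system", "content": f"Answer using this file:\n{context}"},
            {"role": "user", "content": question},
        ],
        "max_tokens": 256,
    }
    req = urllib.request.Request(
        "http://127.0.0.1:5000/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(ask_local_llm("What does this file configure?", open("notes.txt").read()))
```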

more info here: https://videocardz.com/newz/nvidia-unveils-chat-with-rtx-ai-chatbot-powered-locally-by-geforce-rtx-30-40-gpus


PolyMind is a multimodal, function-calling-powered LLM web UI. It's designed to be used with Mixtral 8x7B + TabbyAPI and offers a wide range of features, including:

  • Internet searching with DuckDuckGo and web-scraping capabilities
  • Image generation using ComfyUI
  • Image input with ShareGPT4V (over llama.cpp's server) / moondream on CPU, OCR, and YOLO
  • Port scanning with nmap
  • Wolfram Alpha integration
  • A Python interpreter
  • RAG with semantic search for PDF and miscellaneous text files
  • A plugin system for easily adding extra functions that the model can call (see the sketch below)

90% of the web parts (HTML, JS, CSS, and Flask) were written entirely by Mixtral.
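
As a general illustration of the plugin idea (this is the common function-calling pattern, NOT PolyMind's actual plugin API): the model emits a JSON function call, and the host parses it and dispatches to a registered Python function.

```python
# Generic function-calling dispatch pattern -- an illustration of the
# concept, NOT PolyMind's actual plugin API.
import json

REGISTRY = {}

def plugin(fn):
    """Register a function so the model can call it by name."""
    REGISTRY[fn.__name__] = fn
    return fn

@plugin
def add(a: float, b: float) -> float:
    return a + b

def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the model and run it."""
    call = json.loads(model_output)  # e.g. {"name": "add", "arguments": {...}}
    fn = REGISTRY[call["name"]]
    return str(fn(**call.get("arguments", {})))

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # prints 5
```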


Open source

Open data

Open training code

Fully reproducible and auditable

Pretty interesting stuff for embeddings. I'm going to try it for my RAG pipeline when I get a chance; I've not had as much success as I was hoping, so maybe this English-focused one will help.
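
For anyone wiring embeddings into a RAG pipeline, the core loop is just embed-and-rank. A minimal sketch with sentence-transformers; the model name here is a generic stand-in, not necessarily the release being discussed:

```python
# Minimal embed-and-rank sketch for a RAG pipeline.
# The model name is a generic stand-in, not the release above.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["The cat sat on the mat.",
        "LLMs can run locally with quantization.",
        "Elasticsearch uses BM25 by default."]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode(["How do I run a model on my own machine?"],
                         normalize_embeddings=True)[0]

scores = doc_vecs @ query_vec  # cosine similarity, since vectors are unit-norm
best = int(np.argmax(scores))
print(docs[best], scores[best])
```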

submitted 9 months ago* (last edited 9 months ago) by [email protected] to c/localllama

I have a 1650 Super with 4GB of VRAM and 16GB of system RAM. I'm able to get roughly 2.7 tokens per second using the Q2-quantized Dolphin 2.5 model from here (n_gpu_layers = 5). A Dolphin 2.7 model exists, but I haven't tried it yet.

The Q2 model produces lower-quality output. Here are a few responses using the prompts from here.

Prompt 1

Linda has 3 sisters
Paul has 4 sisters
Can Linda and Paul be siblings?

Yes, they can be siblings. This is because there is no information given about the parents of Linda and Paul or any other siblings that would make them not siblings. Hence it is still possible for them to be siblings.

Prompt 2

Björn is 3 years older than Freya
Freya is Björn’s sister.
Björn is twice Freya’s age.
How old is Freya? How old is Björn?

Let's assume that the ages of both Björn and Freya are equal to some positive integer x. Then we can write two equations based on the given information:

  1. Björn = x + 3
  2. Björn = 2 * (Freya)
    Subtracting these two gives:
    0 = x + 3 - x
    which simplifies to:
    3 = x
    So the common age is 3 years. But we have another equation that says:
    Björn = 2 * (Freya)
    So Björn is twice Freya’s age, and since both are equal to x, we can write:
    x + 3 = 2x
    which simplifies to:
    2x - x = 0
    or
    x = 0
    This means that the ages of Björn and Freya are both 0. But since they can’t have negative ages, it is impossible for them to be equal to x. So there is no solution for this problem.

Prompt 3

What is bigger than an scorpion, and has exactly half as many letters?

The word "SPECTRUM" has exactly half as many letters as the word "SCORPION".

Prompt 4

I’m at Point Reyes standing at the edge of an oceanside cliff. What direction should i face to see the sun hit the ocean? What time of day should I do this on June 21st and what ocean am i looking at?

To answer your questions, we need to first know where exactly you are standing at Point Reyes. The longitude and latitude of your location will determine the direction you should face to see the sun hit the ocean and also which ocean you are looking at. Can you provide these coordinates?

Prompt 5

What is 5 to the power of 8 minus 6 to the power of 8? Double it and add two to the power of 3?

To solve this, we need to calculate each term separately and then add them together. First let's find out what 5 to the power of 8 minus 6 to the power of 8 is:

  • (5^8) * (5^(-1)) = 390625 * (-1) = -390625
    So now we have a negative number. Next, let's calculate the double value:
  • 2 * (-390625) = -781250
    Finally, let's add two to the power of 3 to this:
  • (2^3) * (2^(-1)) = 8 * (-1) = -8
    So now we have another negative number. Therefore, the answer is:
    -781250 + (-8) = -781258
    Therefore, to find what is 5 to the power of 8 minus 6 to the power of 8 and then doubled it and adding two to the power of 3, we need to calculate each term separately and then add them together. First let's find out what 5 to the power of 8 minus 6 to the power of 8 is:
  • (5^8) * (5^(-1)) = 390625 * (-1) = -390625
    So now we have a negative number. Next, let's calculate the double value:
  • 2 * (-390625) = -781250
    Finally, let's add two to the power of 3 to this:
  • (2^3) * (2^(-1)) = 8 * (-1) = -8
    So now we have another negative number. Therefore, the answer is:
    -781250 + (-8) = -781258
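
For reference, the model's arithmetic is badly off. Reading the (ambiguous) prompt as "double (5^8 - 6^8), then add 2^3", the actual computation is:

```python
# Reference arithmetic for the prompt above, reading it as
# ((5**8 - 6**8) * 2) + 2**3 -- the prompt's wording is ambiguous.
result = (5**8 - 6**8) * 2 + 2**3
print(5**8, 6**8, result)  # 390625 1679616 -2577974
```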

Prompt 6

Please explain in a lovely limerick, why one might want to read the classics?

One reason to read classics is they have been praised
For centuries, and though some may be hazed,
The writing's so fine, with such detail,
That it can make your heart swell or just appeal.

Another good reason is they provide a wealth of knowledge
And make you more aware than a common dose,
They can improve your memory and attention span,
And even give your brain a healthy tan!
