LocalLLaMA

2292 readers
16 users here now

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 2 years ago
151
 
 

I want to train, or more likely fine-tune, a model on about 20 years worth of email and text data that I've collected.

The goal would be to train it how to respond like me in simple cases.

Is there a particular base model I should start with?

I'm also interested in anyone's experience in doing this kind of thing themselves.
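
For reference, the rough shape of what I've been imagining is a LoRA fine-tune (with Hugging Face peft) on pairs of "message I received" / "reply I sent". The sketch below is just to show the shape of it, not something I've run: the base model name, the emails.jsonl file and its formatting, and the hyperparameters are all placeholder assumptions.

    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "meta-llama/Llama-2-7b-hf"   # placeholder; any chat-friendly base model
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.eos_token

    model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attention projections, typical for LLaMA-style models
        task_type="CAUSAL_LM",
    ))

    # emails.jsonl is hypothetical: one {"text": "<their message>\n\n### My reply:\n<my reply>"} per line
    data = load_dataset("json", data_files="emails.jsonl")["train"]
    data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024), batched=True)

    Trainer(
        model=model,
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
        args=TrainingArguments(output_dir="email-lora", per_device_train_batch_size=1,
                               gradient_accumulation_steps=8, num_train_epochs=1,
                               learning_rate=2e-4, logging_steps=10),
    ).train()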

152
 
 

You are probably familiar with the long list of various benchmarks that new models are tested on and compared against. These benchmarks are supposedly designed to assess the model's ability to perform in various aspects of language understanding, logical reasoning, information recall, and so on.

However, while I understand the need for an objective and scientific measurement scale, I have long felt that these benchmarks are not particularly representative of the actual experience of using the models. For example, people will claim that a model performs at "some percentage of GPT-3" and yet not one of these models has ever been able to produce correctly-functioning code for any non-trivial task or follow a line of argument/reasoning. Talking to GPT-3 I have felt that the model has an actual in-depth understanding of the text, question, or argument, whereas other models that I have tried always feel as though they have only a superficial/surface-level understanding regardless of what the benchmarks claim.

My most recent frustration, and the one that prompted this post, is regarding the newly-released OpenOrca preview 2 model. The benchmark numbers claim that it performs better than other 13B models at the time of writing, supposedly outperforms Microsoft's own published benchmark results for their yet-unreleased model, and scores an "average" result of 74.0% against GPT-3's 75.7% while the LLaMa model that I was using previously apparently scores merely 63%.

I've used GPT-3 (text-davinci-003), and this model does not "come within comparison" of it. Even giving it as much of a fair chance as I can, giving it plenty of leeway and benefit of the doubt, not only can it still not write correct code (or even valid code in a lot of cases) but it is significantly worse at it than LLaMa 13B (which is also pretty bad). This model does not understand basic reasoning and fails at basic reasoning tasks. It will write a long step-by-step explanation of what it claims that it will do, but the answer itself contradicts the provided steps or the steps themselves are wrong/illogical. The model has only learnt to produce "step by step reasoning" as an output format, and has a worse understanding of what that actually means than any other model does when asked to "explain your reasoning" (at least, for other models that I have tried, asking them to explain their reasoning produces at least a marginal improvement in coherence).

There is something wrong with these benchmarks. They do not relate to real-world performance. They do not appear to be measuring a model's ability to actually understand the prompt/task, but possibly only measuring its ability to provide an output that "looks correct" according to some format. These benchmarks are not a reliable way to compare model performance, and as long as we keep using them we will keep producing models that score higher on benchmarks and claim to perform "almost as good as GPT-3" yet fail spectacularly at any task/prompt that I can think of to throw at them.

(I keep using coding as an example however I have also tried other tasks besides code as I realise that code is possibly a particularly challenging task due to requirements like needing exact syntax. My interpretation of the various models' level of understanding is based on experience across a variety of tasks.)

153
10
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 

As some may know, I maintain a few Docker images of some available tools, and I noticed I was suddenly getting an NVML mismatch error. For the life of me I could not figure out what the issue was and tried so many things. I finally noticed that the Docker image had a special driver, 535.86.10, where my host had 535.86.05. After figuring that out, I added this to my Dockerfile:

RUN apt-get update && apt-get remove --purge -y nvidia-* && \
    apt-get install -y --allow-downgrades nvidia-driver-535/jammy-updates

And voila, problem solved! I'm not sure what driver the Docker CUDA image was using; it might be some special dev driver that was causing a mismatch between the container and the host.

This only started happening as of the latest driver update, released late last month.

154
9
submitted 1 year ago* (last edited 1 year ago) by cll7793@lemmy.world to c/localllama
 
 

I wanted to make this post so we can share all the resources we have with each other on anything machine learning related.

Please feel free to add all of your resources as well even if they are duplicates.

PS: The best way to grow our Lemmy community is to produce high-quality posts.

Some ideas of things you could share:

  • Which people do you follow for AI (e.g., on YouTube, Twitter, etc.)?
  • What other social media forums provide great information?
  • What GUI do you use for local LLMs?
  • What parameters are "best"?
  • Is there a Wiki you use?
  • Where do you go to learn about LLMs/AI/Machine Learning?
  • How do you find quality models?
  • What Awesome GitHub repositories do you know of?
  • What do you think would be useful to share?

General Information - Awesome

LLM Leaderboards:

Places to Find Models

Training & Datasets

There are still many more resources out there I'm sure. Please share what you use to try to keep up with the fast pace of AI development.

I hope some of my resources have helped you! I'm eager to hear what other resources are out there!

155
 
 

Just wondering if anyone has any suggestions to keep things moving and growing. I was thinking of doing a daily quantized-models post just to keep up with TheBloke. Thoughts?

156
 
 

This one is based on Llama 2. The first one worked very well for rule and structure following with guidance, so I'm highly intrigued to see if this one lives up to its predecessor.

157
 
 

Click Here to be Taken to the Megathread!

from !fosai@lemmy.world

Vicuna v1.5 Has Been Released!

Shoutout to GissaMittJobb@lemmy.ml for catching this in an earlier post.

Given Vicuna was a widely appreciated member of the original Llama series, it'll be exciting to see this model evolve and adapt with fresh datasets and new training and fine-tuning approaches.

Feel free to use this megathread to chat about Vicuna and any of your experiences with Vicuna v1.5!

Starting off with Vicuna v1.5

TheBloke is already sharing models!

Vicuna v1.5 GPTQ

7B

13B


Vicuna Model Card

Model Details

Vicuna is a chat assistant fine-tuned from Llama 2 on user-shared conversations collected from ShareGPT.

Developed by: LMSYS

  • Model type: An auto-regressive language model based on the transformer architecture
  • License: Llama 2 Community License Agreement
  • Finetuned from model: Llama 2

Model Sources

Uses

The primary use of Vicuna is for research on large language models and chatbots. The target userbase includes researchers and hobbyists interested in natural language processing, machine learning, and artificial intelligence.

How to Get Started with the Model
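
The original card points to the Model Sources above (LMSYS's FastChat repo) for the full instructions. As a rough sketch, loading it through Hugging Face transformers should look something like the following (the repo id and prompt preamble are my assumptions; check the LMSYS page for the exact names):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "lmsys/vicuna-13b-v1.5"   # assumed repo id; verify on the LMSYS Hugging Face page
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

    # Vicuna expects its USER/ASSISTANT conversation style
    prompt = ("A chat between a curious user and an artificial intelligence assistant. "
              "USER: Hello, who are you? ASSISTANT:")
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(tok.decode(out[0], skip_special_tokens=True))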

Training Details

Vicuna v1.5 is fine-tuned from Llama 2 using supervised instruction fine-tuning. The model was trained on approximately 125K conversations collected from ShareGPT.com.

For additional details, please refer to the "Training Details of Vicuna Models" section in the appendix of the linked paper.

Evaluation Results

Vicuna Evaluation Results

Vicuna is evaluated using standard benchmarks, human preferences, and LLM-as-a-judge. For more detailed results, please refer to the paper and leaderboard.

158
 
 

I've been using airoboros-l2-70b for writing fiction, and while overall I'd describe the results as excellent and better than any llama1 model I've used, it doesn't seem to be living up to the promise of 4k token sequence length.

Around 2,500 tokens, output quality degrades rapidly: it either starts repeating previous text verbatim or becomes incoherent (grammar, punctuation, and capitalization disappear, and the output turns into a salad of vaguely related words).

Any other experiences with llama2 and long context? Does the base model work better? Are other fine-tunes behaving similarly? I'll try it myself eventually, but the 70b models are chunky downloads, and experimentation takes a while at 1 t/s.

(I'm using GGML Q4_K_M on kobold.cpp, with rope scaling off like you're supposed to do with llama2)

159
18
submitted 1 year ago* (last edited 1 month ago) by cll7793@lemmy.world to c/localllama
 
 

(Deleted; no longer relevant)

160
 
 

Hi!

I have an ASUS AMD Advantage Edition laptop (https://rog.asus.com/laptops/rog-strix/2021-rog-strix-g15-advantage-edition-series/) that runs Windows. I still haven't found time to install Linux and set it up the way I like, even after more than a year.

I'm just dropping a small write-up of the setup I'm using with llama.cpp to run on the discrete GPU using CLBlast.

You can use Kobold, but it's meant for more role-playing stuff and I wasn't really interested in that. Funny thing is, Kobold can also be set up to use the discrete GPU if needed.

  1. For starters you'd need llama.cpp itself from here: https://github.com/ggerganov/llama.cpp/tags.

    Pick the CLBlast version, which will help offload some computation to the GPU. Unzip the download to a directory; I unzipped mine to "D:\Apps\llama".

  2. You'll need an LLM now, which can be obtained from HuggingFace or wherever else you'd like. Just note that it should be in GGML format; if in doubt, the models from HuggingFace will have "ggml" written somewhere in the filename. The ones I downloaded were "nous-hermes-llama2-13b.ggmlv3.q4_1.bin" and "Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin"

  3. Move the models to the llama directory you made above. That makes life much easier.

  4. You don't really need to navigate to the directory using Explorer. Just open PowerShell anywhere and run cd D:\Apps\llama\

  5. Here comes the fiddly part. You need to get the device IDs for the GPU. An easy way to check this is to use "GPU Caps Viewer": go to the tab titled OpenCL and check the dropdown next to "No. of CL devices".

    The discrete GPU is normally listed second, after the integrated GPU. In my case the integrated GPU was gfx90c and the discrete one was gfx1031c.

  6. In the PowerShell window, set the environment variables that tell llama.cpp which OpenCL platform and device to use. If you're using the AMD driver package, OpenCL is already installed, so you needn't uninstall or reinstall any drivers.

    $env:GGML_OPENCL_PLATFORM = "AMD"

    $env:GGML_OPENCL_DEVICE = "1"

  7. Check if the variables are exported properly

    Get-ChildItem env:GGML_OPENCL_PLATFORM
    Get-ChildItem env:GGML_OPENCL_DEVICE

    This should return the following:

    Name                   Value
    ----                   -----
    GGML_OPENCL_PLATFORM   AMD
    GGML_OPENCL_DEVICE     1

    If GGML_OPENCL_PLATFORM doesn't show AMD, try exporting this: $env:GGML_OPENCL_PLATFORM = "AMD"

  8. Once these are set properly, run llama.cpp using the following:

    D:\Apps\llama\main.exe -m D:\Apps\llama\Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin -ngl 33 -i --threads 8 --interactive-first -r "### Human:"

    OR

    Replace the Wizard model with nous-hermes-llama2-13b.ggmlv3.q4_1.bin or whatever LLM you'd like. I like to play with 7B and 13B models with 4_0 or 5_0 quantization. You might need to trawl through the fora here to find parameters for temperature, etc. that work for you.

  9. To check whether these work, I've posted the output at pastebin since formatting it here was a paaaain: https://pastebin.com/peSFyF6H

    salient features @ gfx1031c (6800M discrete graphics):
    llama_print_timings: load time = 60188.90 ms
    llama_print_timings: sample time = 3.58 ms / 103 runs ( 0.03 ms per token, 28770.95 tokens per second)
    llama_print_timings: prompt eval time = 7133.18 ms / 43 tokens ( 165.89 ms per token, 6.03 tokens per second)
    llama_print_timings: eval time = 13003.63 ms / 102 runs ( 127.49 ms per token, 7.84 tokens per second)
    llama_print_timings: total time = 622870.10 ms

    salient features @ gfx90c (cezanne architecture integrated graphics):
    llama_print_timings: load time = 26205.90 ms
    llama_print_timings: sample time = 6.34 ms / 103 runs ( 0.06 ms per token, 16235.81 tokens per second)
    llama_print_timings: prompt eval time = 29234.08 ms / 43 tokens ( 679.86 ms per token, 1.47 tokens per second)
    llama_print_timings: eval time = 118847.32 ms / 102 runs ( 1165.17 ms per token, 0.86 tokens per second)
    llama_print_timings: total time = 159929.10 ms

Edit: added pastebin since I actually forgot to link it. https://pastebin.com/peSFyF6H

161
10
submitted 1 year ago* (last edited 1 year ago) by cll7793@lemmy.world to c/localllama
 
 

Leaderboard scores can often be a bit misleading, since there are other factors to consider.

  • Censorship: Is the model censored?
  • Verbosity: How concise is the output?
  • Intelligence: Does the model know what it is talking about?
  • Hallucination: How often does the model make up facts?
  • Domain Knowledge: What specialization does the model have?
  • Size: Best models for 70b, 30b, 7b respectively.

And much more! What models do you use and would recommend to everyone?

The model that has caught my attention the most personally is the original 65b Llama. It seems genuine and truly has a personality. Everyone should chat with the original non-fine-tuned version if they get a chance. It's an experience that is quite unique within the sea of "As an AI language model" OpenAI tunes.

162
8
submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama
 
 
163
 
 
164
165
 
 

This is actually a pretty big deal: exllama is by far the most performant inference engine out there for CUDA. The strangest thing is that the PR claims it works for StarCoder, which is a non-llama model:

https://github.com/huggingface/text-generation-inference/pull/553

So I'm extremely curious to see what this brings...

166
5
A nice write up for LMQL (analyticsindiamag.com)
submitted 1 year ago by noneabove1182 to c/localllama
 
 

For the uninitiated, LMQL is one of a few offerings in the AI space (alongside others like guidance and guardrails) that allows for finer control of the AI's output and lets you guarantee certain patterns, which is hugely helpful for a whole variety of use cases (tool use, safe chatbots, parseable data).

167
 
 

For example, does a 13B parameter model at Q2_K quantization perform worse than a 7B parameter model at 8-bit or 16-bit?

168
 
 

I'm trying to learn more about LLMs, but I haven't found any explanation for what determines which prompt template format a model requires.

For example meta-llama's llama-2 requires this format:

...[INST] and <<SYS>> tags, BOS and EOS tokens...

But if I instead download TheBloke's version of llama-2, the prompt template should instead be:

SYSTEM: ...

USER: {prompt}

ASSISTANT:
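
Written out, the two styles I mean look roughly like this (my own approximation based on the two model cards, just to make the question concrete):

    def llama2_chat_prompt(system: str, user: str) -> str:
        # format described on meta-llama's llama-2 chat model card
        return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

    def thebloke_card_prompt(system: str, user: str) -> str:
        # SYSTEM/USER/ASSISTANT style listed on TheBloke's card
        return f"SYSTEM: {system}\nUSER: {user}\nASSISTANT:"

    print(llama2_chat_prompt("You are a helpful assistant.", "Hello!"))
    print(thebloke_card_prompt("You are a helpful assistant.", "Hello!"))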

I thought this would have been determined by how the original training data was formatted, but afaik TheBloke only converted the llama-2 models from one format to another. Looking at the documentation for the GGML format, I don't see anything related to the prompt being embedded in the model file.

Anyone who understands this stuff who could point me in the right direction?

169
170
11
submitted 1 year ago* (last edited 1 month ago) by cll7793@lemmy.world to c/localllama
 
 

(Deleted; no longer relevant)

171
 
 

Have you ever wanted to inference a baby Llama 2 model in pure C? No? Well, now you can!

With this code you can train the Llama 2 LLM architecture from scratch in PyTorch, then save the weights to a raw binary file, then load that into one ~simple 500-line C file (run.c) that inferences the model, simply in fp32 for now. On my cloud Linux devbox a dim 288 6-layer 6-head model (~15M params) inferences at ~100 tok/s in fp32, and about the same on my M1 MacBook Air. I was somewhat pleasantly surprised that one can run reasonably sized models (few ten million params) at highly interactive rates with an approach this simple.

https://twitter.com/karpathy/status/1683143097604243456
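
Side note: this isn't karpathy's actual export code, but the "save the weights to a raw binary file" step is conceptually just flattening the fp32 tensors in a fixed, agreed-upon order that the C side reads back, something like:

    import numpy as np

    def export_raw_fp32(tensors, path):
        """Write tensors back-to-back as little-endian fp32, in a fixed order the reader must mirror."""
        with open(path, "wb") as f:
            for _name, t in tensors:
                f.write(np.asarray(t, dtype="<f4").ravel().tobytes())

    # toy example: two fake "layers" written in a known order
    export_raw_fp32([("wte", np.random.randn(32, 8)), ("w1", np.random.randn(8, 8))], "model.bin")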

172
173
 
 

This model is based on llama1, so it is for non-commercial use only. Future versions will be trained on llama2 and other open models that are suitable for commercial use.

This model is uncensored. I have filtered the dataset to remove alignment and bias. This makes the model compliant to any requests. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant to any requests, even unethical ones. Please read my blog post about uncensored models. https://erichartford.com/uncensored-models You are responsible for any content you create using this model. Enjoy responsibly.

Quants can of course be found from TheBloke:

https://huggingface.co/TheBloke/Dolphin-Llama-13B-GGML

https://huggingface.co/TheBloke/Dolphin-Llama-13B-GPTQ

174
 
 

This is Llama 2 13b with some additional attention heads from original-flavor Llama 33b frankensteined on.

Fine-tuned on ~10M tokens from RedPajama to settle in the transplants a little.

Not intended for use as-is - this model is meant to serve as a base for further tuning, hopefully with a greater capacity for learning than 13b.

175
 
 

Disclaimer: take this with a grain of salt; it is heavily researched, but only by me, and I am no expert. I will link the relevant studies at the bottom.

For starters, a quick summary of the quantization process for anyone who needs some background. Basically, we're trying to convert these really accurate weights from 16-bit floating point (FP16) to 4-bit values to save on size. Saving size is important because we want to fit really big models into a smaller amount of VRAM. To do this, we take a weight, quantize it, then observe the loss of the model, adjusting all non-quantized weights accordingly in an attempt to minimize this loss. It's important to note that the numbers you see on models - 7B, 13B, 33B etc. - correspond to the total number of weights, so doing this one at a time is… not ideal. Also note that these weights are represented in matrices, with one row for every output of the previous layer and one column for every input in the current layer, so in this context a "weight vector" represents the set of weights connecting that neuron to each neuron in the previous layer. There were some proposed solutions similar to what we have now, but I'll skip them for a semblance of brevity.

Okay, that’s the backstory. Onto the main story…

Enter Binary-Coding Quantization, a "versatile non-uniform quantization scheme." The concept is that, when quantizing a weight vector, we squash the values by dividing them all by a scaling factor (equal to the largest absolute value in the weight vector divided by the maximum value our quantization can represent), and we save that factor for when we later need to de-quantize.

From here, we observe that weight vectors can be grouped in such a way that their scaling factor can be shared across them with minimal loss. This is where groupsize comes in: it indicates how many weight vectors are grouped together to share a single scaling factor. The more of them we group together, the less information we need to save, but conversely more information is lost, as accuracy has to be sacrificed to share the same scaling factor.

And that’s it! You now know what groupsize means and why it changes the model’s size!

Next up is actorder.

Activation Order is a method of examining which weight vectors make most sense to quantize first in order to maintain important information. Originally this was done by greedily selecting whichever weight vector would result in the least loss, but it was observed that this method is actually barely better than random selection. With this in mind, a new solution was proposed. We start by observing which columns have the largest activation magnitude, that is, the weight vectors which most contribute to the final output of the model because they cause the most neuron activations.

After gathering that information, we start our quantization with those values, because that means they will most closely reflect their original values after the full quantization is done. Remember, after we quantize a vector, that's it, it's locked in. That means that if we left some of our important vectors until the end, not only might they have been adjusted several times during the process, but more importantly there would remain very few extra columns that we could adjust to make up for the quantization loss. So starting with these values, i.e. act-order or desc_act (the two terms are used interchangeably), should result in a minor increase in performance.
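
Again purely as a toy illustration (not the real implementation), the ordering step amounts to something like ranking columns by their average activation magnitude on some calibration data and then quantizing in that order:

    import numpy as np

    def activation_order(calibration_acts):
        """Rank input columns by mean absolute activation on calibration data.

        calibration_acts: shape (n_samples, n_inputs), activations feeding this layer.
        Returns column indices, most "active" first; those get quantized first.
        """
        importance = np.abs(calibration_acts).mean(axis=0)
        return np.argsort(-importance)

    rng = np.random.default_rng(0)
    # pretend calibration activations where a few columns clearly dominate
    acts = rng.normal(size=(256, 8)) * np.array([5.0, 0.1, 3.0, 0.2, 1.0, 0.05, 2.0, 0.5])

    print("quantization order:", activation_order(acts))  # prints indices from most to least active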

Side note, I’m not positive at this time why it results in an increase to model size, my best guess is that it involves rearranging the vectors in memory in ways that are no longer optimal and can’t be properly mapped into the VRAM without wasting space, but that’s a pure guess and I would love if someone chimed in with more info. My other guess is that groupsize is either not applied to those sensitive weight vectors, or that they’re applied more selectively (grouping sensitive vectors with non-sensitive vectors) and that difference results in a change. If anyone has any ideas please feel free to enlighten me.

And that’s it! To sum it up, group size means quantizing in groups rather than individually, resulting in smaller models that are quantized faster, and act order means to quantize in order of activation magnitude to try to preserve as much of the important information as possible.

If you stuck through that wall of text, thanks! I hope it was insightful (and accurate)

Sources:

https://arxiv.org/abs/2206.09557 (group size explanation)

https://arxiv.org/abs/2306.02272 (act order explanation)
