LocalLLaMA

3208 readers
1 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

Rules:

Rule 1 - No harassment or personal character attacks on community members, i.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

founded 2 years ago
MODERATORS

I took a practice math test and would like to have it graded by an LLM, since I can't find the answer key online. I have 20GB of VRAM, but I'm on Intel Arc so I can't run Gemma 3. I would prefer models from ollama.com, 'cause I'm not deep enough down the rabbit hole to try Hugging Face stuff yet and don't have time right now.
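
Not a model recommendation, but for the grading loop itself, here's a minimal sketch using the official ollama Python client. The model tag and prompt wording are placeholder assumptions, not tested picks:

# pip install ollama -- requires a running ollama server with the model pulled
import ollama

MODEL = "qwen2.5:14b"  # placeholder tag; substitute whatever fits your 20GB VRAM

def grade(question, student_answer):
    # Ask the model to work the problem itself, then judge the student answer.
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a strict math grader. "
                "Solve the problem yourself first, then mark the student answer "
                "correct or incorrect and show your working."},
            {"role": "user", "content": f"Question: {question}\n"
                                        f"Student answer: {student_answer}"},
        ],
    )
    return response["message"]["content"]

print(grade("What is the derivative of x^2?", "2x"))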

submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/localllama
 
 

This fork introduces a Radio Station feature where AI generates continuous radio music. The process involves two key components:

  • LLM: generates the lyrics for the songs.
  • ACE: composes the music for the generated lyrics.

Due to the limitations of slower PCs, the demo video includes noticeable gaps (approximately 4 minutes) between the generated songs.

If your computer struggles to stream songs continuously, increasing the buffer size will result in a longer initial delay but fewer gaps between songs (until the buffer is depleted again).
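
That trade-off is the usual bounded producer/consumer one. A minimal sketch of the idea (not the fork's actual code; the function bodies are stand-ins):

import queue
import threading
import time

# Larger buffer = longer initial delay, but fewer gaps until it drains again.
BUFFER_SIZE = 3  # number of pre-generated songs to hold
songs = queue.Queue(maxsize=BUFFER_SIZE)

def generate_song(n):
    # Stand-in for the LLM lyrics + ACE composition step (the slow part).
    time.sleep(2)
    return f"song {n}"

def producer():
    n = 0
    while True:
        song = generate_song(n)
        songs.put(song)  # blocks while the buffer is full
        n += 1

def consumer():
    while True:
        song = songs.get()  # blocks (an audible gap) when the buffer is empty
        print("playing", song)
        time.sleep(1)  # stand-in for playback time

threading.Thread(target=producer, daemon=True).start()
consumer()  # runs forever, like a radio station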

By default the app attempts to load the model file gemma-3-12b-it-abliterated.q4_k_m.gguf from the same directory. However, you can also use alternative LLMs. Note that the quality of the generated lyrics will vary depending on the LLM's capabilities.
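
As a generic sketch of that load-from-the-script's-directory-with-override pattern (again not the fork's actual code), using llama-cpp-python:

# pip install llama-cpp-python
import sys
from pathlib import Path
from llama_cpp import Llama

# Default model sits beside the script; allow an alternative path as argv[1].
default = Path(__file__).parent / "gemma-3-12b-it-abliterated.q4_k_m.gguf"
model_path = Path(sys.argv[1]) if len(sys.argv) > 1 else default

llm = Llama(model_path=str(model_path), n_ctx=4096)
out = llm("Write one line of song lyrics about rain.", max_tokens=64)
print(out["choices"][0]["text"])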

32B olmo-2 03/25 (huggingface.co)
submitted 1 month ago by [email protected] to c/localllama
 
 

Model: OLMo-2 32B (03/25)

https://arxiv.org/abs/2501.00656

"We release all OLMo 2 artifacts openly -- models at 7B and 13B scales, both pretrained and post-trained, including their full training data, training code and recipes, training logs and thousands of intermediate checkpoints. "


Hi, I'm not too informed about LLMs, so I'll appreciate any correction to what I might be getting wrong. I have a collection of books I would like to train an LLM on, so I could use it as a quick source of information on the topics covered by the books. Is this feasible?


Something I always liked about NousResearch is how they seemingly try to understand cognition in a more philosophical, metaphysically symbolic way, and they aren't afraid to let you know it. I think their unique view may allow them to find new perspectives that enable advancement in the field. Check out AscensionMaze in particular; the wording they use is just fascinating.


I'm interested in really leveraging the full capabilities of local AI, for code generation and everything else. Let me know what you people are using.


It's amazing how far open source LLMs have come.

Qwen3-32B recreated the Windows 95 Starfield screensaver as a web app, with the bonus feature of enabling "warp drive" on click. This was generated with reasoning disabled (/no_think), using a 4-bit quant running locally on a 4090.

Here's the result: https://codepen.io/mekelef486/pen/xbbWGpX

Model: Qwen3-32B-Q4_K_M.gguf (Unsloth quant)

Llama.cpp Server Docker Config:

docker run \
-p 8080:8080 \
-v /path/to/models:/models \
--name llama-cpp-qwen3-32b \
--gpus all \
ghcr.io/ggerganov/llama.cpp:server-cuda \
-m /models/qwen3-32b-q4_k_m.gguf \
--host 0.0.0.0 --port 8080 \
--n-gpu-layers 65 \
--ctx-size 13000 \
--temp 0.7 \
--top-p 0.8 \
--top-k 20 \
--min-p 0

System Prompt:

You are a helpful expert and aid. Communicate clearly and succinctly. Avoid emojis.

User Prompt:

Create a simple web app that uses javascript to visualize a simple starfield, where the user is racing forward through the stars from a first person point of view like in the old Microsoft screensaver. Stars must be uniformly distributed. Clicking inside the window enables "warp speed" mode, where the visualization speeds up and star trails are added. The app must be fully contained in a single HTML file. /no_think
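
For anyone who'd rather drive the server from a script than the web UI, a minimal sketch against its OpenAI-compatible endpoint (the port matches the docker config above; the bracketed text stands in for the full user prompt):

# pip install requests -- assumes the llama.cpp server above is running
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful expert and aid. "
                "Communicate clearly and succinctly. Avoid emojis."},
            {"role": "user", "content": "Create a simple web app [full prompt above] /no_think"},
        ],
        "temperature": 0.7,
        "top_p": 0.8,
    },
)
print(resp.json()["choices"][0]["message"]["content"])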

Qwen3 "Leaked" (huggingface.co)
submitted 1 month ago by [email protected] to c/localllama
 
 

Qwen3 was apparently posted early, then quickly pulled from HuggingFace and Modelscope. The large ones are MoEs, per screenshots from Reddit:

Including a 235B (22B active) and a 30B (3B active).

Context appears to 'only' be 32K unfortunately: https://huggingface.co/qingy2024/Qwen3-0.6B/blob/main/config_4b.json

But it's possible they're still training them to 256K, per screenshots from Reddit.

Take it all with a grain of salt; configs could change with the official release, but it appears it is happening today.

submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/localllama
 
 

This is one of the "smartest" models you can fit on a 24GB GPU now, with no offloading and very little quantization loss. It feels big and insightful, like a better (albeit dry) Llama 3.3 70B with thinking, and with more STEM world knowledge than QwQ 32B, but it comfortably fits thanks to the new exl3 quantization!

[chart: quantization loss]

You need to use a backend that supports exl3, like (at the moment) text-generation-webui or (soon) TabbyAPI.


I would like my model to know the code libraries I use and help me write code with them. I use llama.cpp's server and web UI for inference, but I have no clue how to get started with RAG, since it seems it is not natively supported by llama.cpp's server implementation. It almost looks like I would need to code my own agent.

I am not interested in commercial offerings or APIs. If you use RAG, how do you do it?
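
For what it's worth, a hand-rolled retrieval loop over llama-server's OpenAI-compatible endpoint is only a few dozen lines. A minimal sketch, assuming the server is on localhost:8080 and using a small local embedding model (the toy corpus, the foolib names, and the model choices are all placeholders):

# pip install sentence-transformers requests numpy
# Assumes llama-server is already running, e.g.:
#   llama-server -m model.gguf --port 8080
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

# Toy "library docs" corpus; in practice, chunk your real docs and readmes.
chunks = [
    "foolib.connect(url) opens a connection and returns a Session object.",
    "Session.query(sql) runs a query and returns rows as dicts.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question, k=2):
    # Cosine similarity: vectors are normalized, so a dot product suffices.
    q = embedder.encode([question], normalize_embeddings=True)[0]
    best = np.argsort(chunk_vecs @ q)[::-1][:k]
    return [chunks[i] for i in best]

def ask(question):
    context = "\n".join(retrieve(question))
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",  # llama-server's OpenAI-style API
        json={"messages": [
            {"role": "system", "content": "Answer using this documentation:\n" + context},
            {"role": "user", "content": question},
        ]},
    )
    return r.json()["choices"][0]["message"]["content"]

print(ask("How do I run a query with foolib?"))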


I'm currently running Gemma 3. It is really good overall, but one thing that is frustrating is the relentless positivity.

Is there a way to make it more critical?

I'm not looking for it to say "that is a shit idea", but less of the "that is a great observation" or "you've made a really insightful point", etc.

If a human was talking like that, I'd be suspicious of their motives. Since it is a machine, I don't think it is trying to manipulate me; I think the programming is just set too positive.

It may also be cultural; as a rule, New Zealanders are less emotive in our communication, and the LLM (to me) feels like an overly positive American.
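
One thing that sometimes helps is steering hard with a system prompt. A minimal sketch using the ollama Python client; the prompt wording and model tag are just illustrative, and how well it sticks varies a lot by model:

# pip install ollama -- assumes a running ollama server with a Gemma 3 tag pulled
import ollama

SYSTEM = (
    "Be blunt and critical. Do not compliment the user or their questions. "
    "Lead with flaws, risks, and counterarguments. No pleasantries."
)

reply = ollama.chat(
    model="gemma3:12b",  # illustrative tag; use whatever build you actually run
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "I plan to store my app's passwords in plaintext."},
    ],
)
print(reply["message"]["content"])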


Seems there's not a lot of talk about relatively unknown finetunes these days, so I'll start posting more!

OpenBuddy's been on my radar, but this one is very interesting: QwQ 32B, post-trained on OpenBuddy's dataset, apparently with QAT applied (though it's kinda unclear) and context-extended. Observations:

  • Quantized with exllamav2, it seems to show lower distortion levels than normal QwQ. It works conspicuously well at 4.0bpw and 3.5bpw.

  • Seems good at long context. Have not tested 200K, but it's quite excellent in the 64K range.

  • Works fine in English.

  • The chat template is funky. It seems to mix up the <think> and <|think|> tags in particular (why don't they just use ChatML?), and needs some wrangling with your own template.

  • Seems smart, can't say if it's better or worse than QwQ yet, other than it doesn't seem to "suffer" below 3.75bpw like QwQ does.

Also, I reposted this from /r/LocalLLaMA, as I feel the community generally should do going forward; with its open-source spirit, it seems like we should be on Lemmy instead.


Just thinking about making this a monthly post: which model are you using? What are the positives and negatives?


The Trump administration is considering new restrictions on the Chinese AI lab DeepSeek that would limit it from buying Nvidia’s AI chips and potentially bar Americans from accessing its AI services, The New York Times reported on Wednesday.


Let's go! Lossless CPU inference

view more: ‹ prev next ›