this post was submitted on 08 Feb 2025
25 points (96.3% liked)
Asklemmy
you are viewing a single comment's thread
Generally you need somewhere between $8,000 and $10,000 worth of equipment to get comparable responsiveness from a self-hosted LLM.
Anyone downvoting clearly doesn't understand the hardware requirements for running a model large enough to rival ChatGPT. ChatGPT runs on a multi-billion dollar AI cluster...
OP specifically asked what kind of hardware you need to run a similar AI model with comparable responsiveness, and GPT-4 reportedly has around 1.8 trillion parameters... Why would you lie and pretend you can run a model like that on a fucking Raspberry Pi? You're living in a dream world... Offline models like that require 128 GB of RAM or more, which is $900 to $1,200 in RAM alone...
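As an aside, the memory side of that claim is easy to sanity-check: weight memory is roughly parameter count times bytes per parameter. A minimal sketch, taking the rumoured 1.8-trillion-parameter figure at face value and ignoring KV cache and other runtime overhead:

```python
# Back-of-the-envelope weight memory: parameter count x bytes per parameter.
# The 1.8T figure for GPT-4 is a rumour repeated in this thread, not an
# official number; KV cache and runtime overhead are ignored here.

BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for precision in ("fp16", "q8", "q4"):
    gb = weight_memory_gb(1.8e12, precision)
    print(f"GPT-4-sized model at {precision}: ~{gb:,.0f} GB")
# fp16: ~3,600 GB, q8: ~1,800 GB, q4: ~900 GB -- far beyond 128 GB of RAM,
# let alone a Raspberry Pi.
```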
It depends on what you mean by “relative responsiveness”, but you can absolutely get ~4 tokens/sec out of DeepSeek R1 671B (Q4 quantized) on a system costing a fraction of the figure you quote.
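For context on why a few tokens per second is plausible on much cheaper hardware: R1 is a mixture-of-experts model, so only around 37B of its 671B parameters are active per generated token, and decode speed is largely bounded by memory bandwidth. A rough sketch; the 400 GB/s bandwidth figure is an illustrative stand-in for a used multi-channel server board, not a measurement:

```python
# Rough, bandwidth-bound ceiling on decode speed for a local MoE model.
# Assumptions: ~37B active parameters per token (DeepSeek's published figure),
# Q4 ~= 0.5 bytes/parameter, and ~400 GB/s of usable memory bandwidth
# (illustrative value, not measured on any specific machine).

def decode_ceiling_tok_s(active_params: float, bytes_per_param: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound: memory bandwidth / bytes of weights streamed per token."""
    return bandwidth_gb_s * 1e9 / (active_params * bytes_per_param)

total_weights_gb = 671e9 * 0.5 / 1e9            # ~336 GB to hold the Q4 weights
ceiling = decode_ceiling_tok_s(37e9, 0.5, 400)  # ~22 tok/s theoretical ceiling
print(f"Q4 weights: ~{total_weights_gb:.0f} GB, ceiling: ~{ceiling:.0f} tok/s")
# Real systems land well under the ceiling (prompt processing, NUMA, cache
# misses), so ~4 tok/s on a few-thousand-dollar box is in the right ballpark.
```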
This is the point everyone downvoting me seems to be missing. OP wanted something comparable to the responsiveness of chatgpt.com... which is simply not possible without insane hardware. Sure, if you don't care about token generation speed you can install an LLM on incredibly underpowered hardware and it technically works, but that's not at all what OP was asking for. They wanted a comparable experience, which requires a lot of money.
Yeah I definitely get your point (and I didn’t downvote you, for the record). But I will note that ChatGPT generates text way faster than most people can read, and 4 tokens/second, while perhaps slower than reading speed for some people, is not that bad in my experience.
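For a sense of scale, 4 tokens/second works out to roughly 180 words per minute, assuming the common rule of thumb of about 0.75 English words per token, which sits just below typical silent reading speeds of around 200 to 300 words per minute:

```python
# Quick arithmetic: 4 tokens/second expressed as words per minute.
# Assumes ~0.75 English words per token, a common rule of thumb rather
# than a property of any particular tokenizer.

tokens_per_second = 4
words_per_token = 0.75
words_per_minute = tokens_per_second * words_per_token * 60
print(f"~{words_per_minute:.0f} words per minute")  # ~180 wpm
```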