this post was submitted on 21 Feb 2024
88 points (90.0% liked)
Technology
I'm just trying to get my hands on some faster hardware. https://groq.com has been able to do some crazy shit, hitting 500 tokens/sec on their LPUs.
What kind of a website is that? Super slow and doesn't work without WebAssembly. Do you really need that for a simple interface?
It's not about their frontend. They're running custom LPUs that can process LLM tokens at 500/sec, which is insanely impressive.
For reference, with a max size of 2k tokens, my dual Xeon Silver 4114 procs take 2-3 minutes.
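To put those numbers side by side, here's a rough back-of-the-envelope sketch. It assumes "2k" means 2000 tokens and that the whole budget gets used in each run, neither of which is stated above, so treat the result as a ballpark only.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average throughput for a run that handles `tokens` in `seconds`."""
    return tokens / seconds

MAX_TOKENS = 2000   # assumption: "2k" = 2000 tokens, budget fully used
GROQ_TPS = 500      # figure quoted in the thread for Groq's LPUs

slow = tokens_per_second(MAX_TOKENS, 3 * 60)  # 3-minute run
fast = tokens_per_second(MAX_TOKENS, 2 * 60)  # 2-minute run

print(f"CPU: {slow:.1f}-{fast:.1f} tokens/sec")
print(f"LPU speedup: {GROQ_TPS / fast:.0f}x-{GROQ_TPS / slow:.0f}x")
```

Under those assumptions the Xeons land at roughly 11-17 tokens/sec, so the quoted 500/sec would be somewhere around 30x to 45x faster.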
Is that with an fp16 model? Don't be scared to try even a 4-bit quantization. You'd be surprised at how little is lost and how much quicker it is.
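If you want a feel for why so little is lost, here's a toy sketch of symmetric absmax quantization to 4 bits. It's not what any particular runtime actually does (real schemes typically use per-block scales and smarter rounding), and the weight values and function names are made up for illustration.

```python
def quantize_4bit(weights):
    """Toy symmetric absmax quantization to signed 4-bit ints in [-8, 7]."""
    # One shared scale maps the largest-magnitude weight to +/-7.
    scale = max(abs(w) for w in weights) / 7 or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the 4-bit integers back to approximate float weights."""
    return [v * scale for v in q]

# Hypothetical weights, not taken from any real model.
weights = [0.12, -0.53, 0.88, -0.07, 0.31, -0.95, 0.44, 0.02]

q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("4-bit codes:", q)
print(f"max round-trip error: {max_err:.4f} (bound: {scale / 2:.4f})")
```

Each weight moves by at most half a quantization step, while storage drops from 16 bits to 4 bits per weight (plus a little for the scale). On CPU, where inference is usually limited by memory bandwidth, moving a quarter of the data is where most of the speedup comes from.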