this post was submitted on 02 Dec 2024
382 points (99.0% liked)
Technology
IMO it's not really "enough" until the bus is 256-bit. That's when 32B-72B-class models start to look even theoretically runnable at decent speeds.
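Back-of-envelope sketch of why bus width sets the ceiling: generation is memory-bandwidth-bound, so every token has to stream the full set of weights. The specific numbers below (LPDDR5X at 8533 MT/s, a ~40 GB Q4-ish 70B quant) are my assumptions, not anything measured in the thread:

```python
# Rough upper bound on token generation speed for a bandwidth-bound LLM.
# Assumed: 256-bit bus, LPDDR5X at 8533 MT/s, 70B model quantized to ~40 GB.

def peak_bandwidth_gb_s(bus_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return (bus_bits / 8) * transfers_per_s / 1e9

def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Each generated token streams all weights once, so bandwidth / model size
    is a hard ceiling on tokens/s (ignores KV cache traffic and compute)."""
    return bandwidth_gb_s / model_gb

bw = peak_bandwidth_gb_s(256, 8533e6)  # ~273 GB/s with the assumed memory
print(f"{bw:.0f} GB/s -> at most {max_tokens_per_s(bw, 40):.1f} tok/s on a 40 GB quant")
```

Halve the bus to 128-bit and that ceiling halves with it, which is why small buses can't push big quants at decent speeds.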
He was getting 1.4 tokens/s on a 70B model. Not setting the world on fire, but enough to load a 70B and script against it.
https://www.youtube.com/watch?v=xyKEQjUzfAk
Also, that was a very low-context test. A longer context will bog it down, even setting aside the prompt-processing time.
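Part of why longer context bogs things down: the KV cache competes for the same memory and bandwidth as the weights. A minimal sketch, assuming Llama-2-70B-style dimensions (80 layers, 8 KV heads via GQA, head dim 128, fp16 cache) — those shape numbers are my assumption, not something from the video:

```python
# Estimate how the KV cache grows with context length.
# Assumed Llama-2-70B-style shape: 80 layers, 8 KV heads (GQA), head_dim 128, fp16.

def kv_cache_bytes(n_tokens: int, n_layers: int = 80, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """2x for the K and V tensors; one entry per layer per KV head per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

for ctx in (512, 4096, 32768):
    print(f"{ctx:6d} tokens -> {kv_cache_bytes(ctx) / 2**30:.2f} GiB of KV cache")
```

At ~320 KB per token under these assumptions, a 32K context adds roughly 10 GiB of cache on top of the weights, all of which also has to be read every token.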
...On the other hand, you could probably squeeze out a bit more by running OpenVINO instead of llama.cpp, so that's still respectable.
Yeah, it's definitely not good enough for user-facing work, but when I'm developing something like translations, being able to see the 70B output and compare it against other models is super useful before I send the job off to something that costs more money to run.
9 times out of 10, the bigger model isn't significantly better for what I'm trying to do, but it's really nice to be able to confirm that.
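One cheap way to confirm that mechanically is to score how closely the smaller model's output tracks the 70B output. The word-level Jaccard overlap below is a crude stand-in metric I'm inventing for illustration, not anything from the thread — the two output strings would come from whatever backends you're comparing:

```python
# Crude agreement score between two model outputs: word-level Jaccard overlap.
# 1.0 means identical word sets, 0.0 means nothing in common.

def agreement(output_a: str, output_b: str) -> float:
    a, b = set(output_a.lower().split()), set(output_b.lower().split())
    if not (a or b):
        return 1.0  # two empty outputs agree trivially
    return len(a & b) / len(a | b)

big = "The quick brown fox jumps over the lazy dog"
small = "The quick brown fox leaps over the lazy dog"
print(f"agreement: {agreement(big, small):.2f}")
```

If the score stays high across your test prompts, that's a quick signal the smaller model is good enough and the 70B pass can be skipped.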