this post was submitted on 08 Apr 2024
37 points (100.0% liked)
LocalLLaMA
Community to discuss about LLaMA, the large language model created by Meta AI.
This is intended to be a replacement for r/LocalLLaMA on Reddit.
you are viewing a single comment's thread
view the rest of the comments
Very nice speedups for people running CPU inference on supported hardware, but unfortunately it does not help with a CPU+GPU split, according to a comment on one of the PRs. That commenter says that for prompt evaluation, where these kernels would make a difference, llama.cpp already performs all the calculations on the GPU, and during token generation the process is I/O-bound, so the faster CPU calculation becomes negligible.
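
To illustrate why token generation ends up I/O-bound while prompt evaluation does not, here is a rough back-of-envelope sketch in Python. The model size, memory bandwidth, and CPU throughput figures are illustrative assumptions of mine, not numbers from the PR discussion:

```python
# Back-of-envelope estimate (all numbers are illustrative assumptions, not from the PR):
# why batch-1 token generation is memory-bandwidth-bound while prompt evaluation is compute-bound.

model_bytes = 4e9      # e.g. a ~7B-parameter model quantized to ~4 bits -> ~4 GB of weights (assumption)
params = 7e9           # parameter count (assumption)
ram_bandwidth = 50e9   # ~50 GB/s system RAM bandwidth (assumption)
cpu_flops = 1e12       # ~1 TFLOP/s sustained CPU matmul throughput (assumption)

# Token generation: every weight is streamed from RAM once per token,
# but only about 2 FLOPs are performed per parameter.
gen_mem_time = model_bytes / ram_bandwidth
gen_compute_time = (2 * params) / cpu_flops
print(f"token gen:  {gen_mem_time*1e3:.0f} ms memory vs {gen_compute_time*1e3:.0f} ms compute")

# Prompt evaluation: the same weights are reused across the whole batch of prompt tokens,
# so compute grows with prompt length while weight traffic stays roughly constant.
prompt_tokens = 512
prompt_mem_time = model_bytes / ram_bandwidth
prompt_compute_time = (2 * params * prompt_tokens) / cpu_flops
print(f"prompt eval: {prompt_mem_time*1e3:.0f} ms memory vs {prompt_compute_time*1e3:.0f} ms compute")
```

With numbers like these, token generation spends far longer waiting on memory than on arithmetic, while prompt evaluation is the reverse, which lines up with the commenter's point: the faster kernels only matter in the phase that llama.cpp already runs on the GPU in a split setup.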