this post was submitted on 08 Apr 2024

LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

[email protected] · 5 months ago (edited)

Very nice speedups for people running CPU inference on supported hardware, but unfortunately it does not help the CPU+GPU split case, according to a comment on one of the PRs. That commenter explains that for prompt evaluation, where these kernels would make a difference, llama.cpp performs all the calculations on the GPU, and during token generation the process is IO-bound (limited by memory bandwidth rather than compute), so the faster CPU calculation becomes negligible.
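
To make the IO-bound point concrete, here is a minimal back-of-the-envelope sketch. All numbers are illustrative assumptions (a hypothetical 7B-parameter model at ~4-bit quantization on ~50 GB/s dual-channel RAM), not figures from the comment or the PRs:

```python
# Back-of-envelope roofline for token generation: each generated token
# streams essentially all model weights from RAM once, so memory
# bandwidth caps tokens/s no matter how fast the matmul kernels are.

model_params = 7e9        # assumed 7B-parameter model
bytes_per_param = 0.5     # assumed ~4-bit quantization
ram_bandwidth = 50e9      # assumed ~50 GB/s (dual-channel DDR5-ish)

model_bytes = model_params * bytes_per_param      # ~3.5 GB of weights
max_tokens_per_sec = ram_bandwidth / model_bytes  # bandwidth ceiling

print(f"bandwidth ceiling: ~{max_tokens_per_sec:.1f} tokens/s")
# -> ~14.3 tokens/s: even an infinitely fast CPU matmul cannot exceed
#    this, which is why faster compute doesn't help token generation.
```

Prompt evaluation, by contrast, batches many tokens per pass over the weights, so it is compute-bound rather than bandwidth-bound; that is exactly the phase where these kernels could help, and the phase that the split setup already runs on the GPU.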