this post was submitted on 08 Apr 2024
37 points (100.0% liked)
LocalLLaMA
Community to discuss about LLaMA, the large language model created by Meta AI.
This is intended to be a replacement for r/LocalLLaMA on Reddit.
you are viewing a single comment's thread
view the rest of the comments
Very nice speedups for people running CPU inference on supported hardware, but unfortunately it does not help with a CPU+GPU split, according to a comment on one of the PRs. That commenter says that for prompt evaluation, where these kernels would make a difference, llama.cpp already performs all the calculations on the GPU, and during token generation the process is I/O-bound, so the faster CPU calculation becomes negligible.
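
To illustrate why token generation ends up I/O-bound while prompt evaluation does not, here is a rough back-of-envelope sketch in Python. The model size, memory bandwidth, and CPU throughput figures are illustrative assumptions of mine, not numbers from the PR discussion:

```python
# Back-of-envelope estimate (all numbers are illustrative assumptions, not from the PR):
# why batch-1 token generation is memory-bandwidth-bound while prompt evaluation is compute-bound.

model_bytes = 4e9      # e.g. a ~7B-parameter model quantized to ~4 bits -> ~4 GB of weights (assumption)
params = 7e9           # parameter count (assumption)
ram_bandwidth = 50e9   # ~50 GB/s system RAM bandwidth (assumption)
cpu_flops = 1e12       # ~1 TFLOP/s sustained CPU matmul throughput (assumption)

# Token generation: every weight is streamed from RAM once per token,
# but only about 2 FLOPs are performed per parameter.
gen_mem_time = model_bytes / ram_bandwidth
gen_compute_time = (2 * params) / cpu_flops
print(f"token gen:  {gen_mem_time*1e3:.0f} ms memory vs {gen_compute_time*1e3:.0f} ms compute")

# Prompt evaluation: the same weights are reused across the whole batch of prompt tokens,
# so compute grows with prompt length while weight traffic stays roughly constant.
prompt_tokens = 512
prompt_mem_time = model_bytes / ram_bandwidth
prompt_compute_time = (2 * params * prompt_tokens) / cpu_flops
print(f"prompt eval: {prompt_mem_time*1e3:.0f} ms memory vs {prompt_compute_time*1e3:.0f} ms compute")
```

With numbers like these, token generation spends far longer waiting on memory than on arithmetic, while prompt evaluation is the reverse, which lines up with the commenter's point: the faster kernels only matter in the phase that llama.cpp already runs on the GPU in a split setup.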