sh.itjust.works
Have you ever wanted to inference a baby Llama 2 model in pure C? No? Well, now you can!

With this code you can train the Llama 2 LLM architecture from scratch in PyTorch, then save the weights to a raw binary file, then load that into one ~simple 500-line C file (run.c) that inferences the model, simply in fp32 for now. On my cloud Linux devbox a dim 288 6-layer 6-head model (~15M params) inferences at ~100 tok/s in fp32, and about the same on my M1 MacBook Air. I was somewhat pleasantly surprised that one can run reasonably sized models (few ten million params) at highly interactive rates with an approach this simple.

https://twitter.com/karpathy/status/1683143097604243456


There is a discussion on Hacker News, but feel free to comment here as well.
