sh.itjust.works
Have you ever wanted to inference a baby Llama 2 model in pure C? No? Well, now you can!

With this code you can train the Llama 2 LLM architecture from scratch in PyTorch, then save the weights to a raw binary file, then load that into one ~simple 500-line C file (run.c) that inferences the model, simply in fp32 for now. On my cloud Linux devbox a dim 288 6-layer 6-head model (~15M params) inferences at ~100 tok/s in fp32, and about the same on my M1 MacBook Air. I was somewhat pleasantly surprised that one can run reasonably sized models (few ten million params) at highly interactive rates with an approach this simple.

https://twitter.com/karpathy/status/1683143097604243456


There is a discussion on Hacker News, but feel free to comment here as well.
