noneabove1182

joined 2 years ago
[–] noneabove1182 2 points 1 year ago

You could definitely do clever things to obfuscate what you're doing, but it's much easier to replicate the image build since there are no external dependencies: if you have Docker installed, you can build any Docker image.

[–] noneabove1182 2 points 1 year ago (2 children)

When you make a Docker image and push it to Docker Hub, all of the instructions used to build it appear there, so it's very transparent. It's also super easy for anyone to build it themselves, unlike executables: just download the Dockerfile and run a single command.
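To be concrete, here's a minimal sketch of what that looks like, assuming the Dockerfile is sitting in your current directory; the tag name is just a placeholder:

```sh
# Rebuild the image locally from the published Dockerfile (the tag is arbitrary)
docker build -t my-local-build .

# Run it just like you would the image pulled from Docker Hub
docker run --rm my-local-build
```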

[–] noneabove1182 0 points 1 year ago (4 children)

Besides the obvious option of telling your users to build the exe themselves, have you considered alternative distribution methods like Docker?

[–] noneabove1182 1 points 1 year ago

Agreed, it seems quite capable. I haven't tested all the way down to Q2 to verify, but I'm not surprised.

[–] noneabove1182 2 points 1 year ago

There's apparently a pip command to display the leaderboard. If this ends up being of interest to people, I could make a post and just update it every so often with the latest leaderboard.

[–] noneabove1182 1 points 1 year ago

Yeah, it's a step in the right direction at least, though now that you mention it, doesn't lmsys or someone do the same with human evaluation and side-by-side comparisons?

It's such a tricky line to walk between deterministic questions (repeatable but cheatable) and user questions (real world but potentially unfair)

[–] noneabove1182 2 points 1 year ago

I have the snap installed; for what it's worth, it's pretty painless AS LONG AS YOU DON'T WANT TO DO ANYTHING SILLY.

I've found it nearly impossible to alter the base behaviour without it entirely breaking, so if Nextcloud out of the box does exactly what you want, go ahead and install it via snap...

I predict that on Docker you're going to have a bad time if you can't give it host network mode and instead try to just forward ports.
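For reference, a rough sketch of the host-networking setup I mean, assuming the stock nextcloud image from Docker Hub; the volume path is just an example:

```sh
# Host networking instead of forwarding individual ports;
# /var/www/html is where the official nextcloud image keeps its data
docker run -d --network host \
  -v /srv/nextcloud:/var/www/html \
  nextcloud
```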

That said, docker >>>> VM in my books

[–] noneabove1182 3 points 1 year ago* (last edited 1 year ago) (2 children)

Lmao, a reasonable request. I'm pretty disappointed they don't have it hosted anywhere...

here's a link to their latest image of the leaderboard for what it's worth:

https://cdn.discordapp.com/attachments/1134163974296961195/1138833170838589471/image1.png

[–] noneabove1182 4 points 1 year ago

I've managed to get it running in koboldcpp; I had to add --forceversion 405 because it wasn't being detected properly. Even with q5_1 I was getting an impressive 15 T/s, and the code actually seemed decent. This might be a really good candidate for fine-tuning on large datasets and passing massive contexts of basically entire small repos, or at least several full files.
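For anyone wanting to reproduce this, a rough sketch of the launch I mean, assuming the usual python entry point for koboldcpp; the model filename and port are placeholders:

```sh
# Force the model format version (405) since auto-detection misses it, per the comment above
python koboldcpp.py --model ./models/model-q5_1.bin --forceversion 405 --port 5001
```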

Odd that they chose NeoX as their base model; I think only CTranslate2 can offload those? I had trouble getting the GPTQ version running in AutoGPTQ... maybe the Hugging Face TGI would work better.

[–] noneabove1182 2 points 1 year ago

I've been impressed with Vicuna 1.5; it seems quite competent and enjoyable. Unfortunately I'm only able to do 13B at any reasonable speed, so that's where I tend to stay, though funnily enough I haven't tried any 70Bs since llama.cpp added support. I'll have to start some downloads...

[–] noneabove1182 2 points 1 year ago (6 children)

The same thing is happening here that happened to smartphones: we started out treating benchmarks as the be-all and end-all, largely because the devices were all so goddamn different that it was impossible to compare them 1:1 in any meaningful way without some kind of automation like benchmarks.

Then some people started cheating them, and we noticed that the benchmarks, while nice for generating big pretty numbers, don't actually correlate much with real-world performance, and more often than not misrepresent what the product is capable of.

Eventually we'll get to a point where we can strike a balance: benchmarks providing useful metrics and frames of reference for spotting when something is wrong, and real reviews diving into how the actual model behaves in the real world.

[–] noneabove1182 1 points 1 year ago (1 children)

I would love to see more of this, and maybe make it its own post for more traction and discussion. Do you have a link to those pictures elsewhere? I can't seem to get a large version loaded on desktop, haha.

 

Watch6 - Graphite and Cream €319.99 €369.99 (LTE)

Watch6 Classic - Graphite and Silver €349.99 €399.99 (LTE)

 

Took me some time to figure this one out, and unfortunately it requires a significantly larger image (it needs so much more of NVIDIA's toolkit D: I couldn't figure out a way to get around it...)

If people prefer a smaller image, I can start maintaining one with exllama and one without, but for now 1.0 is identical minus exllama support (and I guess also from an older commit), so you can use that one until there's actual new functionality :)

9
submitted 2 years ago* (last edited 2 years ago) by noneabove1182 to c/localllama
 

New models posted by TheBloke, 7B to 65B, something for everyone!

Info from creators:

A stunning arrival! The fully upgraded Robin Series V2 language model is ready and eagerly awaiting your exploration.

This is not just a model upgrade, but the crystallization of wisdom from our research and development team. In the new version, Robin Series V2 has performed excellently among various open-source models, defeating well-known models such as Falcon, LLaMA, StableLM, RedPajama, MPT.

Specifically, we have carried out in-depth fine-tuning based on the entire LLaMA series, including 7b, 13b, 33b, 65b, all of which have achieved pleasing results. Robin-7b scored 51.7 in the OpenLLM standard test, and Robin-13b even reached as high as 59.1, ranking sixth, surpassing many 33b models. The achievements of Robin-33b and Robin-65b are even more surprising, with scores of 64.1 and 65.2 respectively, firmly securing the top positions.

 

Saw this posted over here: https://sh.itjust.works/post/163355

Sounds like a really fun concept that should be shared here too :D

 

Main link is to GPU image, CPU image can be found here:

https://hub.docker.com/r/noneabove1182/text-gen-ui-cpu

The CPU one is built exclusively for running on a CPU. The GPU one is compiled with CUDA support and gets blazing fast ingestion and generation.

Included in each readme is a disclaimer that I am once again not affiliated, along with an example working docker-compose.yml; make sure you change the args to fit your own personal setup! :)
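As a rough idea of the kind of thing the compose file encodes (shown here as a plain docker run sketch rather than the actual file from the readme; the GPU image name, model path, and port are my assumptions based on the Docker Hub link above):

```sh
# The GPU image needs the NVIDIA container runtime; 7860 is text-generation-webui's default UI port
docker run -d --gpus all \
  -p 7860:7860 \
  -v /path/to/models:/app/models \
  noneabove1182/text-gen-ui-gpu
```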

Feel free to ask any questions or let me know if anything doesn't work! Hacked it together by the skin of my teeth, and put a LOT of effort into reducing image size for the GPU one (16GB down to 9GB, still massive..) so please do post if you have any issues!

 

I've been maintaining an image for myself to containerize koboldcpp, so I figured I might as well share it with others :) Updated to 1.30.3.
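If it helps anyone, a minimal sketch of how I'd expect the container to be run; the image name is a guess based on my other repos, and the model path and port mapping are placeholders rather than the actual readme values:

```sh
# Map koboldcpp's default port and mount a directory of GGML models into the container
docker run -d \
  -p 5001:5001 \
  -v /path/to/models:/models \
  noneabove1182/koboldcpp:1.30.3
```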
