overview for noneabove1182

WizardLM/WizardCoder-33B-V1.1 released! in c/localllama

[–] noneabove1182 2 points 10 months ago (1 children)

I don't have a lot of experience with either at this time. I've used them here and there for programming questions, but I usually stick to 7B models because I use them for code completion, and that's only useful if it completes the code before I do lol

That said, I've had good answers overall from either whenever I've decided to pull them out. It feels like WizardCoder should be better since it's so much newer, but overall it hasn't been that different. Wish Phind would release an update :(

WizardLM/WizardCoder-33B-V1.1 released! in c/localllama

[–] noneabove1182 3 points 11 months ago

I run my Nvidia stuff in containers to not have to deal with all the stupid shenanigans

WizardLM/WizardCoder-33B-V1.1 released! in c/localllama

[–] noneabove1182 2 points 11 months ago (2 children)

The 3060 is a nice cheap one for running okay sized models, but if you can find a way to stretch for a 3090 or a 7900 XTX you'll be able to run these 33B models with decent quant levels

WizardLM/WizardCoder-33B-V1.1 released! in c/localllama

[–] noneabove1182 5 points 11 months ago (3 children)

First few quants are up: https://huggingface.co/bartowski/WizardCoder-33B-V1.1-exl2

The 4.25 bpw quant should fit nicely into 24GB (3090, 4090)

Smaller sizes are still being created: 3.5, 3.0, and 2.4
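
For a rough idea of why 4.25 lands comfortably under 24GB, here's a back-of-the-envelope sketch. It assumes the model is exactly 33B parameters and only counts the quantized weights. The context cache and activations need extra room on top, so treat the numbers as a lower bound. The function name is just made up for illustration.

```python
def exl2_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes).

    Assumption: every parameter is stored at the average bits-per-weight,
    and nothing else (cache, activations, overhead) is counted.
    """
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# Quant levels mentioned above, for an assumed 33B-parameter model
for bpw in (4.25, 3.5, 3.0, 2.4):
    print(f"{bpw} bpw -> ~{exl2_weight_gb(33, bpw):.1f} GB of weights")
```

At 4.25 bpw that works out to roughly 17.5 GB of weights, which leaves a few GB of a 24GB card for context.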

Do you have a Heat Pump in a cold climate? in c/[email protected]

[–] noneabove1182 6 points 11 months ago

I live in Ontario where we go down to -30C in the harshest conditions.

We have a heat pump and a furnace and they alternate based on efficiency

Somewhere around -5 to +5 C it switches from the heat pump to the furnace

I think you could get by a bit colder but it really loses out on efficiency vs burning gas unless you invest in a geothermal heat pump

Mistral releases version 0.2 of their 7B model in c/localllama

[–] noneabove1182 3 points 1 year ago

Seems relatively uncensored, willing to answer most questions

Mistral releases version 0.2 of their 7B model in c/localllama

[–] noneabove1182 3 points 1 year ago (1 children)

It's definitely a little odd.. I'm glad they did any kind of official release for 0.2, but yeah information is sorely lacking and would be nice to have more, especially with how revolutionary the previous one was.. is this incremental? Is it a huge change? Is it just more fine tuning? Did they start from scratch? We'll never know 🤷‍♂️

Mistral drops a new magnet download in c/localllama

[–] noneabove1182 3 points 1 year ago

The only concern I had was my god is it a lot of faith to put in this random twitter, hope they never get hacked lol, but otherwise yes it's a wonderful idea, would be a good feature for huggingface to speed up downloads/uploads

I'm having a fantastic time with this model. in c/localllama

[–] noneabove1182 2 points 1 year ago

Yeah this seems less focused on creativity, there's a lot of really good models out there tuned for story telling that will far exceed generalized SoTA models

Unsloth: 80% faster 50% less memory LLM finetuning in c/localllama

[–] noneabove1182 4 points 1 year ago

Better finetuning is such an important factor. I feel like the future is all of us having our own personal tunes for models that work well with our lives, and iterating for learning more basically every day is also really helpful, so the more barriers we can take down the better!

I'm having a fantastic time with this model. in c/localllama

[–] noneabove1182 3 points 1 year ago* (last edited 1 year ago) (1 children)

Hmm had interesting results from both of those base models, haven't tried the combo yet, will start some exllamav2 quants to test

What's it doing well at?

quant link for anyone who may want: https://huggingface.co/bartowski/OpenHermes-2.5-neural-chat-7b-v3-1-7B-exl2

Orca 2: Teaching Small Language Models How to Reason in c/localllama

[–] noneabove1182 2 points 1 year ago (1 children)

Btw there's a 16k tune available now:

https://huggingface.co/bartowski/Orca-2-13B-16k-exl2

10

TensorRT-LLM evaluation of the new H200 GPU achieves 11,819 tokens/s on Llama2-13B (github.com)

submitted 1 year ago by noneabove1182 to c/localllama

0 comments fedilink

H200 is up to 1.9x faster than H100. This performance is enabled by H200's larger, faster HBM3e memory.

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform

9

ExUI - a lightweight web UI for ExLlamaV2 by turboderp (github.com)

submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama

0 comments fedilink

The creator of ExLlamaV2 (turboderp) has released a lightweight web UI for running exllamav2, it's quite nice! Missing some stuff from text-generation-webui, but makes up for it by being very streamlined and clean

I've made a docker image for it for anyone who may want to try it out, GitHub repo here:

https://github.com/noneabove1182/exui-docker

And for finding models to run with exllamav2 I've been uploading several here:

https://huggingface.co/bartowski

Enjoy!

9

Phind V7 subjectively performing at GPT4 levels for coding (news.ycombinator.com)

submitted 1 year ago by noneabove1182 to c/localllama

7 comments fedilink

Phind is now using a V7 of their model for their own platform, as they have found that people overall prefer that output vs GPT4. This is extremely impressive because it's not just a random benchmark that can be gamed, but instead crowd sourced opinion on real tasks

The one place everything still lags behind GPT4 is question comprehension, but this is a huge accomplishment

Blog post: https://www.phind.com/blog/phind-model-beats-gpt4-fast

note: they've only open released V2 of their model, hopefully they release newer versions soon.. would love to play with them outside their sandbox

10

Min P sampler (an alternative to Top K/Top P) has been merged into llama.cpp (github.com)

submitted 1 year ago by noneabove1182 to c/localllama

0 comments fedilink

Very interesting new sampler. It does a better job at filtering out extremely unlikely tokens when the most likely tokens are less confident, and from the results it seems to pretty reliably improve quality with no noticeable downside
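
Here's a minimal sketch of the idea in plain Python, not the actual llama.cpp code. As I understand it, the cutoff is a fraction of the top token's probability, so it tightens when the model is confident and loosens when it isn't. The function name and the `min_p` values below are just illustrative.

```python
def min_p_filter(probs, min_p=0.05):
    """Keep tokens whose probability is at least min_p * (top probability).

    Illustrative sketch only: 'probs' is a list of token probabilities
    that sums to 1, and the survivors are renormalized to sum to 1.
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Confident distribution: the cutoff is high, so the tail is dropped
print(min_p_filter([0.9, 0.05, 0.03, 0.02], min_p=0.1))

# Less confident distribution: the cutoff is low, so more tokens survive
print(min_p_filter([0.3, 0.25, 0.2, 0.15, 0.1], min_p=0.1))
```

Compare that with Top K or Top P, where the cutoff doesn't scale with how confident the top token is.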

35

HUGE dataset released for open source use (together.ai)

submitted 1 year ago by noneabove1182 to c/localllama

4 comments fedilink

30T tokens, 20.5T in English, allegedly high quality, can't wait to see people start putting it to use!

16

I've started uploading quants of exllama v2 models, taking requests (huggingface.co)

submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama

0 comments fedilink

Finally got a nice script going that automates most of the process. Uploads will all be same format, with each bit per weight going into its own branch.

The first two I did don't have great READMEs, but the rest will look like this one: https://huggingface.co/bartowski/Mistral-7B-claude-chat-exl2

Also taking recommendations on anything you want to see included in readme or quant levels

14

Text Generation Web-UI has been updated to CUDA 12.1, and with it new docker images are needed (self.localllama)

submitted 1 year ago by noneabove1182 to c/localllama

0 comments fedilink

For anyone who happens to be using my docker images or Dockerfiles for their text-gen-webui, it all started breaking this week when Oobabooga's work was updated to support CUDA 12.1

As such, I have updated my docker images and fixed a bunch of issues in the build process. It's also been a while since I posted it here.

You can find all the details here:

https://github.com/noneabove1182/text-generation-webui-docker

It requires driver version 535.113.01

Happy LLMing!

20

Single Digit tokenization improves LLM math abilities by up to 70x (twitter.com)

submitted 1 year ago by noneabove1182 to c/localllama

2 comments fedilink

From the tweet (minus pictures):

Language models are bad at basic math.

GPT-4 has right around 0% accuracy rate on 5 digit multiplication.

Most open models can't even add. Why is that?

There are a few reasons why numbers are hard. The main one is Tokenization. When training a tokenizer from scratch, you take a large corpus of text and find the minimal byte-pair encoding for a chosen vocabulary size.

This means, however, that numbers will almost certainly not have unique token representations. "21" could be a single token, or ["2", "1"]. 143 could be ["143"] or ["14", "3"] or any other combination.

A potential fix here would be to force single digit tokenization. The state of the art for the last few years is to inject a space between every digit when creating the tokenizer and when running the model. This means 143 would always be tokenized as ["1", "4", "3"].

This helps boost performance, but wastes tokens while not fully fixing the problem.
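
(Not from the tweet: a minimal sketch of the space-injection trick described above, assuming it's applied as a plain text preprocessing step before the tokenizer sees the input. The function name is made up.)

```python
import re


def split_digits(text: str) -> str:
    """Insert a space between every pair of adjacent digits.

    Illustrative preprocessing step: once digits are separated, a
    whitespace-aware tokenizer can no longer merge them into chunks
    like "14" or "143".
    """
    return re.sub(r"(?<=\d)(?=\d)", " ", text)


print(split_digits("143 + 21"))  # 1 4 3 + 2 1
```

The downside mentioned above is visible here too: a three digit number now costs at least three tokens.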

A cool fix might be xVal! This work by The Polymathic AI Collaboration suggests a generic [NUM] token which is then scaled by the actual value of the number!

If you look at the red lines in the image above, you can get an intuition for how that might work.

It doesn't capture a huge range or high fidelity (e.g., 7.4449 vs 7.4448) but they showcase some pretty convincing results on sequence prediction problems that are primarily numeric.
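
(Not from the tweet: a toy sketch of how I understand the [NUM] idea. Every number in the text is swapped for the same [NUM] token, and that token's embedding is multiplied by the number's value. The function names and the tiny two-dimensional embedding are made up, and the real method does more than this, so treat it as an illustration only.)

```python
import re

# Matches simple integers and decimals such as 43 or 7.4449
NUM_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")


def encode_with_num_token(text):
    """Replace each number with [NUM] and return the values separately."""
    values = [float(m) for m in NUM_PATTERN.findall(text)]
    tokens = NUM_PATTERN.sub("[NUM]", text).split()
    return tokens, values


def scale_embedding(num_embedding, value):
    """Scale the shared [NUM] embedding by the number it stands for."""
    return [value * x for x in num_embedding]


tokens, values = encode_with_num_token("temp 7.4 at lat 43")
print(tokens)   # ['temp', '[NUM]', 'at', 'lat', '[NUM]']
print(values)   # [7.4, 43.0]
print(scale_embedding([0.5, -1.0], values[1]))  # [21.5, -43.0]
```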

For example, they want to train a sequence model on GPS conditioned temperature forecasting

They found a ~70x improvement over standard vanilla baselines and a 2x improvement over really strong baselines.

One cool side effect is that deep neural networks might be really good at regression problems using this encoding scheme!

31

Inside The OnePlus Open – And The Machines That Torture It [Exclusive] - MrMobile (youtu.be)

submitted 1 year ago by noneabove1182 to c/[email protected]

9 comments fedilink

15

Dolphin 2.0 based on mistral-7b released by Eric Hartford (huggingface.co)

submitted 1 year ago* (last edited 1 year ago) by noneabove1182 to c/localllama

1 comments fedilink

Model is trained on his own orca style dataset as well as some airoboros apparently to increase creativity

Quants:

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-GPTQ

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-GGUF

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-AWQ

29

Beginner questions thread (self.localllama)

submitted 1 year ago by noneabove1182 to c/localllama

23 comments fedilink

Trying something new, going to pin this thread as a place for beginners to ask what may or may not be stupid questions, to encourage both the asking and answering.

Depending on activity level I'll either make a new one once in a while or I'll just leave this one up forever to be a place to learn and ask.

When asking a question, try to make it clear what your current knowledge level is and where you may have gaps, which should help people provide more useful, concise answers!

13

Microsoft's latest LLM agent: autogen (microsoft.github.io)

submitted 1 year ago by noneabove1182 to c/localllama

6 comments fedilink

AutoGen is a framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

Git repo here: https://github.com/microsoft/autogen