[-] noneabove1182 6 points 6 months ago

I live in Ontario where we go down to -30C in the harshest conditions.

We have a heat pump and a furnace and they alternate based on efficiency

Somewhere around -5 to +5 C it switches from the heat pump to the furnace

I think you could get by a bit colder but it really loses out on efficiency vs burning gas unless you invest in a geothermal heat pump

10
submitted 8 months ago by noneabove1182 to c/localllama

H200 is up to 1.9x faster than H100. This performance is enabled by H200's larger, faster HBM3e memory.

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform

9
submitted 8 months ago* (last edited 8 months ago) by noneabove1182 to c/localllama

The creator of ExLlamaV2 (turboderp) has released a lightweight web UI for running exllamav2, and it's quite nice! It's missing some stuff from text-generation-webui, but makes up for it by being very streamlined and clean.

I've made a docker image for it for anyone who may want to try it out; GitHub repo here:

https://github.com/noneabove1182/exui-docker

And for finding models to run with exllamav2 I've been uploading several here:

https://huggingface.co/bartowski

Enjoy!

9
submitted 8 months ago by noneabove1182 to c/localllama

Phind is now using V7 of their model on their own platform, as they have found that people overall prefer its output to GPT-4's. This is extremely impressive because it's not just a random benchmark that can be gamed, but instead crowd-sourced opinion on real tasks.

The one place everything still lags behind GPT4 is question comprehension, but this is a huge accomplishment

Blog post: https://www.phind.com/blog/phind-model-beats-gpt4-fast

Note: they've only openly released V2 of their model; hopefully they release newer versions soon.. would love to play with them outside their sandbox

10
submitted 8 months ago by noneabove1182 to c/localllama

Very interesting new sampler. It does a better job of filtering out extremely unlikely tokens when even the most likely tokens are less confident, and from the results it seems to pretty reliably improve quality with no noticeable downside.
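The gist as I understand it, in a rough numpy sketch (not the actual implementation; `p_base` is just my name for the knob): the cutoff for dropping tokens scales with the top token's probability, so a flat, low-confidence distribution keeps more candidates while still dropping the extreme tail.

```python
import numpy as np

def scaled_cutoff_filter(probs: np.ndarray, p_base: float = 0.05) -> np.ndarray:
    """Keep tokens whose probability is at least p_base * (top token's probability).

    The cutoff scales with confidence: a peaked distribution filters aggressively,
    while a flat one keeps its plausible candidates but still drops the extreme tail.
    """
    cutoff = p_base * probs.max()
    kept = np.where(probs >= cutoff, probs, 0.0)
    return kept / kept.sum()  # renormalize over the surviving tokens

flat = np.array([0.25, 0.22, 0.20, 0.15, 0.10, 0.05, 0.02, 0.009, 0.001])
peaked = np.array([0.90, 0.05, 0.03, 0.01, 0.005, 0.003, 0.001, 0.0005, 0.0005])

print(scaled_cutoff_filter(flat))    # cutoff 0.0125: keeps 7 tokens, drops the tail
print(scaled_cutoff_filter(peaked))  # cutoff 0.045: keeps only the top two tokens
```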

35
submitted 8 months ago by noneabove1182 to c/localllama

30T tokens, 20.5T in English, allegedly high quality, can't wait to see people start putting it to use!

Related github: https://github.com/togethercomputer/RedPajama-Data

16
submitted 8 months ago* (last edited 8 months ago) by noneabove1182 to c/localllama

Finally got a nice script going that automates most of the process. Uploads will all be in the same format, with each bits-per-weight level going into its own branch.

The first two I did don't have great READMEs, but the rest will look like this one: https://huggingface.co/bartowski/Mistral-7B-claude-chat-exl2
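So if you want a specific bits-per-weight level you can pull just that branch, e.g. with huggingface_hub (a quick sketch; "4_0" is a placeholder branch name, check each repo for the branches that actually exist):

```python
from huggingface_hub import snapshot_download

# Each quant level lives in its own branch, so point `revision` at the one you want.
# "4_0" is a placeholder here; see the model card for the real branch names.
local_dir = snapshot_download(
    repo_id="bartowski/Mistral-7B-claude-chat-exl2",
    revision="4_0",
)
print("downloaded to:", local_dir)
```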

Also taking recommendations on anything you want to see included in the READMEs or quant levels.

14
submitted 8 months ago by noneabove1182 to c/localllama

For anyone who happens to be using my docker images or Dockerfiles for their text-gen-webui, it all started breaking this week when Oobabooga's work was updated to support CUDA 12.1.

As such, I have updated my docker images and fixed a bunch of issues in the build process. It's also been a while since I posted it here.

You can find all the details here:

https://github.com/noneabove1182/text-generation-webui-docker

It requires NVIDIA driver version 535.113.01.
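Once the container is up, a quick generic sanity check from inside it (nothing specific to my image, just torch) should show the GPU and the matching CUDA runtime:

```python
import torch

# With the right host driver (535.113.01) and the NVIDIA container runtime,
# this should report CUDA 12.1 and at least one visible GPU.
print("torch:", torch.__version__)
print("cuda runtime:", torch.version.cuda)
print("gpu available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```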

Happy LLMing!

20
submitted 9 months ago by noneabove1182 to c/localllama

From the tweet (minus pictures):

Language models are bad at basic math.

GPT-4 has right around a 0% accuracy rate on 5-digit multiplication.

Most open models can't even add. Why is that?

There are a few reasons why numbers are hard. The main one is Tokenization. When training a tokenizer from scratch, you take a large corpus of text and find the minimal byte-pair encoding for a chosen vocabulary size.

This means, however, that numbers will almost certainly not have unique token representations. "21" could be a single token, or ["2", "1"]. 143 could be ["143"] or ["14", "3"] or any other combination.
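A quick way to see this (my toy example, not from the tweet; using GPT-2's tokenizer only because it's easy to grab, and the exact splits will differ between tokenizers):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# The same digits can end up split differently depending on the number and
# its context, so the model never sees one consistent representation.
for text in ["21", "143", " 143", "143000"]:
    print(repr(text), "->", tok.tokenize(text))
```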

A potential fix here would be to force single digit tokenization. The state of the art for the last few years is to inject a space between every digit when creating the tokenizer and when running the model. This means 143 would always be tokenized as ["1", "4", "3"].

This helps boost performance, but wastes tokens while not fully fixing the problem.
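The digit-splitting trick itself is just preprocessing, roughly like this (again my own sketch, not any particular tokenizer's code):

```python
import re

def split_digits(text: str) -> str:
    """Insert a space between consecutive digits so each digit tokenizes on its own."""
    return re.sub(r"(?<=\d)(?=\d)", " ", text)

print(split_digits("The total is 143 plus 21"))
# -> The total is 1 4 3 plus 2 1
```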

A cool fix might be xVal! This work by The Polymathic AI Collaboration suggests a generic [NUM] token which is then scaled by the actual value of the number!

If you look at the red lines in the image above, you can get an intuition for how that might work.

It doesn't capture a huge range or high fidelity (e.g., 7.4449 vs 7.4448) but they showcase some pretty convincing results on sequence prediction problems that are primarily numeric.

For example, they want to train a sequence model on GPS-conditioned temperature forecasting.

They found a ~70x improvement over standard vanilla baselines and a 2x improvement over really strong baselines.

One cool side effect is that deep neural networks might be really good at regression problems using this encoding scheme!
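Mechanically, the [NUM] idea looks something like this (my toy sketch of the scheme, not the paper's code; names like NUM_ID are made up):

```python
import torch

VOCAB_SIZE, DIM = 1000, 16
NUM_ID = 999  # id of the shared [NUM] token (placeholder for this sketch)
embed = torch.nn.Embedding(VOCAB_SIZE, DIM)

def embed_with_num_scaling(token_ids: list[int], values: list[float]) -> torch.Tensor:
    """Embed a sequence where every literal number was replaced by [NUM].

    values[i] holds the original number for position i (1.0 for normal tokens)
    and scales the [NUM] embedding, so magnitude is carried as a continuous
    signal instead of through token identity.
    """
    ids = torch.tensor(token_ids)
    scale = torch.tensor(values).unsqueeze(-1)  # shape (seq_len, 1)
    return embed(ids) * scale

# "temperature is [NUM] degrees", where the number was 7.4449
out = embed_with_num_scaling([12, 45, NUM_ID, 78], [1.0, 1.0, 7.4449, 1.0])
print(out.shape)  # torch.Size([4, 16])
```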

[-] noneabove1182 6 points 9 months ago

Very interesting that they wouldn't let him film the camera bump... it must have some kind of branding on it, like Hasselblad? Or maybe they've secretly found a way to have no bump! One can dream..

31
submitted 9 months ago by noneabove1182 to c/[email protected]
15
submitted 9 months ago* (last edited 9 months ago) by noneabove1182 to c/localllama

The model is trained on his own Orca-style dataset, as well as some Airoboros, apparently to increase creativity.

Quants:

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-GPTQ

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-GGUF

https://huggingface.co/TheBloke/dolphin-2.0-mistral-7B-AWQ

23
Beginner questions thread (self.localllama)
submitted 9 months ago by noneabove1182 to c/localllama

Trying something new: I'm going to pin this thread as a place for beginners to ask what may or may not be stupid questions, to encourage both the asking and the answering.

Depending on activity level I'll either make a new one once in a while or I'll just leave this one up forever to be a place to learn and ask.

When asking a question, try to make it clear what your current knowledge level is and where you may have gaps; that should help people provide more useful, concise answers!

13
submitted 9 months ago by noneabove1182 to c/localllama

AutoGen is a framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.

Git repo here: https://github.com/microsoft/autogen
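The basic pattern is two agents talking to each other until the task is done. Roughly, based on the project's quickstart style (treat the specifics, like the llm_config values and the task, as placeholders):

```python
from autogen import AssistantAgent, UserProxyAgent

# Points at whatever OpenAI-compatible endpoint you use; values are placeholders.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",                       # fully automated back-and-forth
    code_execution_config={"work_dir": "scratch"},  # where generated code gets run
)

# The proxy relays the task, the assistant replies (often with code),
# and the proxy executes that code and feeds the result back until done.
user_proxy.initiate_chat(assistant, message="Plot NVDA and TSLA stock price change YTD.")
```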

[-] noneabove1182 5 points 9 months ago

To start, everything you're saying is entirely correct

However, the existence of emergent behaviours like chain-of-thought reasoning shows that there's more to this than pure text prediction: the model picks up patterns that were never explicitly trained, so it's entirely reasonable to wonder whether they're able to recognize reverse patterns.

Hallucinations are a vital part of understanding the models; they might not be a long-term problem, but getting models to understand what they actually know to be true is extremely important for the growth and adoption of LLMs.

I think there's a lot more to the training and generation of text than you're giving it credit for. The simplest way to explain it is that it's text prediction, but there's way too much depth to the training and the model to say that's all it is.

At the end of the day it's just a fun, thought-inducing post :) but when Andrej Karpathy says he doesn't have a great intuition for how LLM knowledge works (though in fairness, he theorizes the same as you: directional learning), I think we can at least agree none of us knows for sure what is correct!

[-] noneabove1182 6 points 11 months ago

Hey, thanks for the detailed writeup, this is great! It's probably worth including a couple of the Llama 1 models just because they're more mature and ready to be used, even though the licensing is awkward.

Also, if you'd like, I maintain a few docker images for a couple of tools (namely oobabooga, koboldcpp, and lollms-webui) that might be good for beginners getting their feet wet; you can find them pinned at https://github.com/noneabove1182

[-] noneabove1182 5 points 1 year ago

Security patches come out monthly, less than a week after Google's releases. OS updates are slower but have been getting better. The only major downside is the lack of commitment to more than 2 years of OS updates, a real kick in the shins for such an expensive phone, but alas, I'm a sucker for all its other offerings.

[-] noneabove1182 6 points 1 year ago

I'm devoted to Sony, can't switch to anything else... they're the only ones matching all the features I care about.

[-] noneabove1182 7 points 1 year ago

Honestly an interesting thought and worth keeping in mind. I would love to see a lot more examples and more timings, especially for the pythonic ones: are they more efficient, or just more Python-like?

[-] noneabove1182 7 points 1 year ago

For me it's best for the apps where people don't upload to F-Droid but I trust them.

[-] noneabove1182 6 points 1 year ago

This mildly surprised me; it doesn't seem explicit enough. A thumbs up can represent having received a message but not necessarily agreed to it. Strange new world.

[-] noneabove1182 5 points 1 year ago

That said, when I plug my Sony into a dock with DisplayPort, it does forward my display... so is this actually new, or just new to the Pixel?

[-] noneabove1182 5 points 1 year ago

I had a similar curiosity... Like, if I make my own instance but it's just myself, is that even a net positive for the network? Now there's a new instance pulling everything I want to it, rather than another, bigger instance that might have shared those subscriptions..

[-] noneabove1182 7 points 1 year ago

Can you create a dashboard of sorts so we can all see the CPU, RAM, and storage usage? Would be highly interested. Also, if you're accepting donations of storage, I have some spare drives 😅

