noneabove1182

joined 2 years ago
[–] noneabove1182 1 points 1 year ago

The significance is that we have a new file format standard. The bad news is that it breaks compatibility with the old format, so you'll have to update to use newer quants and you can't use your old ones.

The good news is that this is the last time that'll happen (it's happened a few times so far), as this one is meant to be a lot more extensible and flexible, storing a ton of extra metadata for better compatibility.
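
Assuming the new format in question is GGUF (the container llama.cpp moved to around the time Falcon support landed), you can tell old and new quants apart just by peeking at the file header. This is only a rough sketch, not an official tool, and anything past the magic bytes and version field is going off the GGUF spec as I understand it:

```python
import struct

def inspect_quant(path: str) -> None:
    """Print whether a quant file looks like the new GGUF container."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            # Old ggml-era quants use different magics, so this is likely a pre-GGUF file
            print(f"{path}: not GGUF (magic={magic!r}), probably an old-format quant")
            return
        (version,) = struct.unpack("<I", f.read(4))
        print(f"{path}: GGUF version {version}; the extra metadata key/value pairs follow the header")

inspect_quant("model.gguf")  # placeholder path
```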

The great news is that this paves the way for better model support as we've seen already with support for falcon being merged: https://github.com/ggerganov/llama.cpp/commit/cf658adc832badaaa2ca119fe86070e5a830f8f6

[–] noneabove1182 3 points 1 year ago

I hate when they do that so much too lol

[–] noneabove1182 1 points 1 year ago* (last edited 1 year ago) (1 children)

Thanks for the comment! Yes, this is meant more for your personal projects than for use in existing projects.

As for needing a password to get a password: totally understand. My main goal was local encrypted storage. The nice thing about this implementation is that you can keep all your env files saved and shared in your git repo for all devs to have access to, but they can only be decrypted with the master password shared elsewhere (Keeper, Vault, etc.), so you don't have to load all values from a vault, just the master.

100% though, this doesn't cover a large range of usage, hence the name "simple" haha. I wouldn't be opposed to expanding it, but I think it covers my proposed use cases as-is.
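
For anyone curious what the "master password unlocks the committed env files" idea looks like in practice, here's a minimal sketch using Python's cryptography library (PBKDF2 + Fernet). It's not the project's actual implementation, just the general shape; the file names and iteration count are arbitrary:

```python
import base64
import os

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


def derive_key(master_password: str, salt: bytes) -> bytes:
    """Turn the shared master password into a Fernet key."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480_000)
    return base64.urlsafe_b64encode(kdf.derive(master_password.encode()))


def encrypt_env(path: str, master_password: str) -> None:
    """Encrypt an env file so the ciphertext can be committed to the repo."""
    salt = os.urandom(16)
    token = Fernet(derive_key(master_password, salt)).encrypt(open(path, "rb").read())
    with open(path + ".enc", "wb") as out:
        out.write(salt + token)  # store the salt alongside the ciphertext


def decrypt_env(enc_path: str, master_password: str) -> bytes:
    """Recover the plaintext env contents given only the master password."""
    blob = open(enc_path, "rb").read()
    salt, token = blob[:16], blob[16:]
    return Fernet(derive_key(master_password, salt)).decrypt(token)


if __name__ == "__main__":
    encrypt_env(".env", "the-master-password")          # commit .env.enc, not .env
    print(decrypt_env(".env.enc", "the-master-password").decode())
```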

[–] noneabove1182 1 points 1 year ago

Sure, it's a simplistic view; I meant it more that you can guide it towards completing a sentence, but you're right that it's worth recognizing what's actually going on!

[–] noneabove1182 1 points 1 year ago (1 children)

It is interesting how you interpreted the question, though. I think the principle of "rate limiting" is playing in my favour here: typically when you rate limit something you don't throw it into a queue, you deny it and wait for the next request (think APIs).
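
To make the distinction concrete, here's a toy deny-style limiter (sliding window, purely illustrative): when the caller is over the limit it gets refused immediately, the way an API would return a 429, rather than having the call parked in a queue:

```python
import time


class RateLimiter:
    """Deny-style rate limiter: over-limit calls are rejected, not queued."""

    def __init__(self, max_calls: int, per_seconds: float):
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.calls: list[float] = []  # timestamps of recently accepted calls

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have fallen out of the window
        self.calls = [t for t in self.calls if now - t < self.per_seconds]
        if len(self.calls) >= self.max_calls:
            return False  # denied -- the caller retries later, nothing is queued
        self.calls.append(now)
        return True


limiter = RateLimiter(max_calls=5, per_seconds=60)
for i in range(7):
    print(i, "accepted" if limiter.allow() else "denied")
```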

[–] noneabove1182 2 points 1 year ago (2 children)

Your best bet is likely going to be editing the original prompt to add information until you get the right output. However, you can also get clever with it and add to the response of the model itself. Remember, all it's doing is filling in the most likely next word, so you could just add extra text at the end that says "now, to implement it in X way" or "I noticed I made a mistake in Y, to fix that" and then hit generate and let it continue the sentence.
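
A rough sketch of that "continue the model's own response" trick, here using llama-cpp-python purely as an example backend; the model path and prompt template are placeholders, and any chat UI that lets you edit the assistant's last message achieves the same thing:

```python
from llama_cpp import Llama

llm = Llama(model_path="models/your-model.bin")  # placeholder path

prompt = "### Instruction: Write a function that parses a CSV file.\n### Response:"
first = llm(prompt, max_tokens=256)["choices"][0]["text"]

# Steer the model by appending your own words to *its* answer,
# then let it keep filling in the most likely next words from there.
steered = prompt + first + "\n\nNow, to implement it using the standard csv module instead, "
follow_up = llm(steered, max_tokens=256)["choices"][0]["text"]

print(first)
print(follow_up)
```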

[–] noneabove1182 1 points 1 year ago

definitely for sure this time we promise

[–] noneabove1182 2 points 1 year ago

link is broken

but the content in the title is enough. Just sad, especially as an owner of a TicWatch Pro 3 Ultra... it's been gathering dust in my drawer waiting for Wear OS 3...

[–] noneabove1182 3 points 1 year ago (2 children)

cries in TicWatch Pro 3 Ultra

[–] noneabove1182 4 points 1 year ago

still so sad about the death of blobbies :'(

[–] noneabove1182 2 points 1 year ago

oh yeah, definitely didn't mean "no more breaking changes", just that we've had several from GGML file format changes, and so THAT portion of the breakage is going away

[–] noneabove1182 3 points 1 year ago (2 children)

it's a standardization of a universal GGML format, which should mean no more breaking changes going forward when new formats are worked on, and it also brings the same functionality llama.cpp has to all GGML model types (Falcon, MPT, StarCoder, etc.)

 

OpenOrca preview trained on ~6% of the data:

We have trained on less than 6% of our data, just to give a preview of what is possible while we further refine our dataset! We trained a refined selection of 200k GPT-4 entries from OpenOrca. We have filtered our GPT-4 augmentations to remove statements like, "As an AI language model..." and other responses which have been shown to harm model reasoning capabilities. Further details on our dataset curation practices will be forthcoming with our full model releases.
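
The filtering step they describe boils down to dropping any response that contains one of those boilerplate refusal phrases. Here's a minimal sketch of that idea; the phrase list (beyond the quoted "As an AI language model...") and the record layout are my assumptions, not OpenOrca's actual pipeline:

```python
# Phrases that mark low-value "refusal" style responses (assumed list; only the
# first one is quoted in the announcement above)
REFUSAL_MARKERS = [
    "as an ai language model",
    "i'm sorry, but i cannot",
    "as a language model, i",
]


def keep(example: dict) -> bool:
    """Keep an entry only if its response contains none of the markers."""
    response = example.get("response", "").lower()
    return not any(marker in response for marker in REFUSAL_MARKERS)


examples = [
    {"response": "Sure! The answer is 42 because ..."},
    {"response": "As an AI language model, I cannot help with that."},
]
filtered = [ex for ex in examples if keep(ex)]
print(len(filtered), "of", len(examples), "entries kept")
```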

 
 

https://github.com/vllm-project/vllm

vLLM is a fast and easy-to-use library for LLM inference and serving.

vLLM is fast with:

• State-of-the-art serving throughput
• Efficient management of attention key and value memory with PagedAttention
• Continuous batching of incoming requests
• Optimized CUDA kernels

vLLM is flexible and easy to use with:

• Seamless integration with popular HuggingFace models
• High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
• Tensor parallelism support for distributed inference
• Streaming outputs
• OpenAI-compatible API server

YouTube video describing it: https://youtu.be/1RxOYLa69Vw
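
For a feel of how little code it takes, here's a minimal offline-inference sketch following vLLM's documented LLM/SamplingParams API; the model name is just an example:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any supported HuggingFace model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```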

 

 

Nothing Phone 2 Camera specs

𝗥𝗲𝗮𝗿: • 50MP (Sony IMX890) (f/1.9) (1/1.56") (OIS & EIS) Focal length: 24mm

• 50MP (Samsung JN1) (f/2.2) (1/2.7") (EIS) (FoV: 115°) Macro (4cm)

𝗦𝗲𝗹𝗳𝗶𝗲: 32MP (Sony IMX615) (f/2.4) (EIS)

 

I realized that while Microsoft would probably release their LLaMA-13B based model (as of the time of this writing they still haven't), they might not release the dataset. Therefore, I resolved to replicate their efforts, download the data myself, and train the model myself, so that OpenOrca can be released on other sizes of LLaMA as well as other foundational models such as Falcon, OpenLLaMA, RedPajama, MPT, and RWKV.

 

Koboldcpp 1.33 was released, and with it come new docker images :) anything with -gpu works with cuBLAS now!

Released my updates for koboldcpp docker images for v1.33 (CUDA support!):

https://hub.docker.com/u/noneabove1182

There's also a new koboldcpp-gpu-test image where I'm trying to reduce the image size; I've got it down to less than half of the original -gpu (1.58GB vs 3.87GB). Everything seems to be working, but if anyone else is willing to help validate, that would be much appreciated.

Make sure you clear out your docker volume if you're upgrading; it does weird things during upgrades...
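
If you'd rather script the cleanup than do it by hand, something like this with the docker-py SDK works; the volume name and tag here are placeholders, so swap in whatever your run command or compose file actually uses:

```python
import docker
from docker.errors import NotFound

client = docker.from_env()

VOLUME_NAME = "koboldcpp-data"           # placeholder: your actual volume name
IMAGE = "noneabove1182/koboldcpp-gpu"    # placeholder: one of the images from the Docker Hub link above

try:
    client.volumes.get(VOLUME_NAME).remove(force=True)  # clear the old volume before upgrading
except NotFound:
    pass  # nothing to clean up

client.images.pull(IMAGE, tag="v1.33")  # grab the updated image
```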

 

Exciting moving forward; hopefully it leads to DisplayPort being standard on Android.

submitted 2 years ago* (last edited 2 years ago) by noneabove1182 to c/[email protected]