By far the biggest pain point of Sony: their software is clean, stable, and fast, with an acceptable release cadence, but their promise of only 2 years of updates is completely unacceptable in this day and age
Wish there was any way at all to influence them
I wrote that summary; maybe it would help if I knew your knowledge level? Which parts didn't make sense?
There are plenty of smaller projects around that attempt to solve similar problems: MetaGPT, Agent OS, gpt-pilot, gpt-engineer, AutoChain, etc.
Several of them would love a hand, I'm sure; you should check them out on GitHub!
It seems reasonably realistic if you compare it to Code Interpreter: it was able to recognize packages it hadn't installed and go seek them out. I don't think it's outside the scope for it to recognize which module wasn't installed and install it.
Even now, regular models will suggest the packages to install before executing the code they provide.
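To make that concrete, here's a minimal sketch of how a tool could pull that off. `ensure_package` is a made-up helper name, not anything Code Interpreter actually ships:

```python
import importlib
import subprocess
import sys

def ensure_package(module_name, pip_name=None):
    """Import module_name, pip-installing it into this environment if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Recognize the missing module, go get it, then retry the import.
        subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or module_name])
        return importlib.import_module(module_name)

# e.g. make sure `requests` exists before running generated code that uses it
requests = ensure_package("requests")
```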
Sure, I can try to add a couple lines on top of the abstract just to give a super brief synopsis
In this case it would be something like:
This paper discusses a new technique in which we can create a LoRA for an already-quantized model. This is distinct from QLoRA, which quantizes the full model on the fly and trains a LoRA on top of it. With this approach you can take your small quantized model and work with it as-is, saving a ton of resources and speeding up the process massively.
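As a toy illustration of the core idea (my own sketch with made-up names and shapes, not the paper's code): the quantized base weight stays frozen, and only the two small low-rank matrices get trained.

```python
import torch
import torch.nn as nn

class LoRAOnQuantizedLinear(nn.Module):
    """Frozen int4-style base weight plus a trainable low-rank A/B adapter."""
    def __init__(self, w_q, scale, rank=8):
        super().__init__()
        self.register_buffer("w_q", w_q)      # (out, in) ints in [-8, 7], stored as int8
        self.register_buffer("scale", scale)  # (out, 1) per-channel dequant scale
        out_f, in_f = w_q.shape
        # Only these two small matrices receive gradients.
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x):
        w = self.w_q.float() * self.scale  # dequantize the frozen base on the fly
        return x @ w.t() + (x @ self.lora_a.t()) @ self.lora_b.t()
```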
Not a glowing review that this accidentally wasn't posted as a reply to a comment. :p
This is great and comes with a very interesting model!
I wonder if they cleverly slide the window in any way or if it's just a naive slide. It could probably be pretty smart if you discarded tokens that have minimal attention on them anyway, to focus on the important text (rough sketch of that idea below).
For now, this is awesome!
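For what it's worth, the "discard low-attention tokens" idea might look something like this toy eviction policy. This is purely my speculation, not anything this model is confirmed to do:

```python
import torch

def evict_low_attention(keys, values, attn_mass, window):
    """Keep only the `window` cached tokens that have received the most attention.

    keys/values: (num_cached, head_dim) KV-cache entries for one head.
    attn_mass:   (num_cached,) cumulative attention each cached token has received.
    """
    if keys.size(0) <= window:
        return keys, values, attn_mass
    # Pick the heavy hitters, then restore their original order in the sequence.
    keep = torch.topk(attn_mass, window).indices.sort().values
    return keys[keep], values[keep], attn_mass[keep]
```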
The good news is that if you do it wrong, much like regular speculative generation, you will still get the same result the full model would output on its own, so there won't be any loss in quality, just a loss in speed.
It's definitely a good point though: finding the optimal configuration is the difference between a slowdown or minimal speedup and a potentially huge speedup.
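Here's a greedy-decoding sketch of why that holds (real implementations verify the whole draft in one batched forward pass and use an acceptance/rejection scheme for sampling; `draft_model`/`target_model` are stand-in callables that return the next token id):

```python
def speculative_step(draft_model, target_model, prefix, k=4):
    # Draft phase: the small model proposes k tokens greedily.
    ctx = list(prefix)
    draft = []
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    # Verify phase: the full model checks each drafted token in order.
    accepted = []
    ctx = list(prefix)
    for t in draft:
        expected = target_model(ctx)   # what the full model would have emitted
        if t == expected:
            accepted.append(t)         # match: this token cost almost nothing
            ctx.append(t)
        else:
            accepted.append(expected)  # mismatch: keep the full model's token
            break                      # and discard the rest of the draft
    return accepted
```

A bad draft model just hits the `break` more often, so the output is always what the full model would have produced; only the speedup suffers.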
Somehow this is even more confusing, because that code hasn't been touched in 3 months. Maybe it just took them that long to validate? I'll have to read through it, thanks!
Yeah, fair point, I'll make sure to include better links in the future :) I typically post from mobile, so it's annoying, but doable
The abstract is meant to pull in random readers, so it's understandable they'd lay a bit of foundation about what the paper will be about, even if it seems rather simple and unnecessarily wordy
LoRA is still considered the gold standard in efficient fine-tuning, which is why a lot of comparisons are made to it instead of to QLoRA, which is more of a hacky approach. They both have their advantages, but they're pretty distinct.
Another thing worth pointing out is that 4-bit quantization is not actually just converting all 16-bit weights into 4 bits (at least, not in the GPTQ style). A quantization factor is also saved, so there's more information that can be retrieved from the final quantization than just "multiply everything by 4".
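A stripped-down example of what that looks like (GPTQ proper stores more than this, e.g. zero points, and does error-compensating rounding, but the scale factor is the core idea):

```python
import torch

def quantize_4bit(w, group_size=128):
    """Group-wise 4-bit quantization: each group of weights gets its own fp scale."""
    groups = w.reshape(-1, group_size)
    scale = groups.abs().max(dim=1, keepdim=True).values / 7  # signed 4-bit range ~ [-8, 7]
    scale = scale.clamp(min=1e-8)                             # avoid divide-by-zero
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_4bit(q, scale):
    # The recovered weight is scale * int, so each group keeps its own dynamic
    # range; it's not just "truncate every 16-bit weight to 4 bits".
    return q.float() * scale
```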
QA-LoRA vs QLoRA: I think my distinction is the same as what you said; it's just about the starting and ending state. QLoRA also introduced a lot of other techniques, though, like double quantization, the NormalFloat datatype, and paged optimizers, to make it work.
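Those extras are exactly the knobs you toggle in the usual transformers/bitsandbytes setup; something like this (parameter names are from that integration, and paged optimizers are enabled separately on the trainer side):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # the NormalFloat datatype from the QLoRA paper
    bnb_4bit_use_double_quant=True,   # double quantization: quantize the quant constants too
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```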
It's also worth pointing out that not understanding it has nothing to do with intellect; it's just a matter of how much foundational knowledge you have. I don't understand most of the math, but I've read enough of the papers to understand, to some degree, what's going on.
The one thing I can't quite figure out: I know QLoRA is competitive with a LoRA because it trains more layers of the transformer than a LoRA does, but I don't see any specific mention of QA-LoRA following that same method, which I would think is needed to maintain the quality.
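For reference, "trains more layers" in QLoRA-style runs usually means targeting every linear projection rather than just the attention q/v. With peft that's roughly this (the module names are LLaMA-style and vary by architecture):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # QLoRA applied adapters to all linear layers, not just attention q/v
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```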
Overall you're right, though: this paper is a bit on the weaker side. That said, if it works then it works, and it's a pretty decent discovery, but the paper alone doesn't guarantee that.