Yeah so those are mixed, definitely not putting each individual weight to 2 bits because as you said that's very small, i don't even think it averages out to 2 bits but more like 2.56
You can read some details here on bits per weight: https://huggingface.co/TheBloke/LLaMa-30B-GGML/blob/8c7fb5fb46c53d98ee377f841419f1033a32301d/README.md#explanation-of-the-new-k-quant-methods
Unfortunately this is not the whole story either, as they get further combined with other bits per weight, like q2_k is Q4_K for some of the weights and Q2_K for others, resulting in more like 2.8 bits per weight
Generally speaking you'll want to use Q4_K_M unless going smaller really benefits you (like you can fit the full thing on GPU)
Also, the bigger the model you have (70B vs 7B) the lower you can go on quantization bits before it degrades to complete garbage
I use text-generation-webui mostly. If you're only using GGUF files (llama.cpp), koboldcpp is a really good option
A lot of it is the automatic prompt formatting, there's probably like 5-10 specific formats that are used, and using the right one for your model is very important to achieve optimal output. TheBloke usually lists the prompt format in his model card which is handy
Rope and yarn refer to extending the default context of a model through hacky (but functional) methods and probably deserve their own write up