this post was submitted on 02 Aug 2023
LocalLLaMA
Sorry, I didn't find it. If I remember correctly, it was either about using models whose foundation model was trained on fewer (2048?) tokens, or about the measurement/benchmark being too 'synthetic' and not meaningful for real-world scenarios, or something like that.
I read this: https://www.reddit.com/r/LocalLLaMA/comments/155vy0k/llama_2_too_repetitive/ (And maybe also related to this topic: https://arize.com/blog/lost-in-the-middle-how-language-models-use-long-contexts-paper-reading/ and https://github.com/THUDM/LongBench )
Also: I've played around a bit with Llama, and I haven't had any good results with summarization. Maybe it's not the context length but the wrong model for the task? Aren't there other language models out there that are specifically suited for summarization? Llama is a generalist, and maybe it's just not exceptionally good at this particular task.
https://huggingface.co/learn/nlp-course/chapter7/5?fw=tf#models-for-text-summarization and https://www.width.ai/post/bart-text-summarization
Regarding the original question: I'm not sure whether KoboldCPP handles the newer 4k context length correctly. For me it says:
Using automatic RoPE scaling (scale:1.000, base:32000.0)
But is that the correct base value? That's the same as if I were using a LLaMA 1 model with an artificially increased context length.

You are supposed to manually set scale to 1.0 and base to 10000 when using Llama 2 with 4096 context. The automatic scaling assumes the model was trained for 2048. Though, as I say in the OP, that still doesn't work, at least with this particular fine-tune.
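To show why the base value matters, here is a minimal Python sketch of the standard RoPE angle formula, theta_i = base^(-2i/d), with the token position multiplied by a linear `scale` factor. It assumes a head dimension of 128 (the per-head size of the Llama 2 models). It also assumes KoboldCPP's `scale` behaves like llama.cpp's `--rope-freq-scale`, meaning it acts as a multiplier on the position. `rope_angles` and `linear_scale` are hypothetical helper names, not KoboldCPP or llama.cpp functions, and the sketch does not reproduce how KoboldCPP picks its automatic base.

```python
def rope_angles(position: int, head_dim: int = 128,
                base: float = 10000.0, scale: float = 1.0) -> list[float]:
    """Rotation angle (radians) of each dimension pair at one token position.

    Standard RoPE: theta_i = base ** (-2i / head_dim) for i in [0, head_dim/2).
    Assumption: `scale` multiplies the position (linear position
    interpolation); 1.0 leaves positions untouched.
    """
    return [
        (position * scale) * base ** (-2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


def linear_scale(trained_ctx: int, target_ctx: int) -> float:
    """Hypothetical helper: position multiplier that squeezes target_ctx
    into the context length the model was trained on (never above 1.0)."""
    return min(1.0, trained_ctx / target_ctx)


if __name__ == "__main__":
    pos = 3000
    # Llama 2 was trained at 4096 context with base 10000, so no scaling is needed.
    native = rope_angles(pos, base=10000.0, scale=linear_scale(4096, 4096))
    # The automatic setting from the log: same scale, larger base.
    auto = rope_angles(pos, base=32000.0, scale=1.0)
    for i in (0, 16, 32, 63):
        print(f"pair {i:2d}: base 10000 -> {native[i]:.4f}, base 32000 -> {auto[i]:.4f}")
    # A model trained for 2048 stretched to 4096 would need positions halved instead.
    print("linear scale for 2048 -> 4096:", linear_scale(2048, 4096))
```

With scale 1.0 and base 10000, every position up to 4096 gets the same angles Llama 2 saw in training. Raising the base to 32000 leaves the first dimension pair unchanged but slows the rotation of every other pair, so at a given position the model sees angles it was not trained on.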