this post was submitted on 24 Nov 2024
17 points (94.7% liked)

LocalLLaMA

2852 readers
20 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Lets explore cutting edge open source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

founded 2 years ago
MODERATORS
17
Qwen2.5-Coder-7B (self.localllama)
submitted 4 months ago by lynx to c/localllama
 

I've been using Qwen 2.5 Coder (bartowski/Qwen2.5.1-Coder-7B-Instruct-GGUF) for some time now, and it has shown significant improvements compared to previous open weights models.

Notably, this is the first model that can be used with Aider. Moreover, Qwen 2.5 Coder has made notable strides in editing files without requiring frequent retries to generate in the proper format.

One area where most models struggle, including this one, is when the prompt exceeds a certain length. In this case, it appears that the model becomes unable to remember the system prompt when the prompt length is above ~2000 tokens.

top 3 comments
sorted by: hot top controversial new old
[–] [email protected] 3 points 4 months ago (1 children)

Which backend are you using to run it, and does that backend have an option to adjust context size?

I noticed in LM Studio, for example, that the default context size is much smaller than the maximum that the model supports. Qwen should certainly support more than 2000 tokens. I'd try setting it to 32k if you can.

[–] lynx 3 points 4 months ago (1 children)

I have found the problem with the cut off, by default aider only sends 2048 tokens to ollama, this is why i have not noticed it anywhere else except for coding.

When running /tokens in aider:

$ 0.0000   16,836 tokens total
           15,932 tokens remaining in context window
           32,768 tokens max context window size

Even though it will only send 2048 tokens to ollama.

To fix it i needed to add a file .aider.model.settings.yml to the repository:

- name: aider/extra_params
  extra_params:
    num_ctx: 32768
[–] [email protected] 1 points 4 months ago

That's because ollama's default max ctx is 2048, as far as I know.