this post was submitted on 02 Oct 2023
LocalLLaMA
I have two 3090 Turbo GPUs, and it seems like oobabooga doesn't split the load between the two cards when I try to run TheBloke/dolphin-2.7-mixtral-8x7b-AWQ.
Does anyone know how to make text-generation-webui use both cards? Do I need an NVLink bridge between the two cards?
You shouldn't need NVLink. I'm wondering if it has something to do with AWQ, since I know that exllamav2 and llama.cpp both support splitting in oobabooga.
I think you're right. I saw a post on Reddit mentioning basically the same things I'm seeing.
It looks like AutoAWQ supports it, but it might be an issue with how oobabooga implements it or something...
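For anyone who wants to test splitting outside the webui, here is a rough sketch of the per-GPU memory cap that `device_map="auto"` loading in transformers/accelerate accepts. The helper name `build_max_memory` and the 2 GiB headroom are my own assumptions, not anything from oobabooga or AutoAWQ, and I haven't verified that this particular AWQ checkpoint loads this way.

```python
def build_max_memory(num_gpus: int, vram_gib: int, reserve_gib: int = 2) -> dict:
    """Build a per-device memory cap, keyed by GPU index.

    Hypothetical helper: leaves `reserve_gib` of headroom on each card
    for activations and the CUDA context (the 2 GiB default is a guess).
    """
    if reserve_gib >= vram_gib:
        raise ValueError("reserve_gib must be smaller than vram_gib")
    return {i: f"{vram_gib - reserve_gib}GiB" for i in range(num_gpus)}


# Two 3090s with 24 GiB each, as in the original question.
max_memory = build_max_memory(num_gpus=2, vram_gib=24)
print(max_memory)  # {0: '22GiB', 1: '22GiB'}

# Assumed usage (needs transformers, accelerate, and autoawq installed;
# untested with this model):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "TheBloke/dolphin-2.7-mixtral-8x7b-AWQ",
#     device_map="auto",
#     max_memory=max_memory,
# )
```

If the layers end up on both cards this way, that would point at the webui's AWQ loader rather than AutoAWQ itself. Inside the webui, I believe the closest equivalent is the per-GPU memory setting (the `--gpu-memory` flag, if I remember it right).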