this post was submitted on 07 Feb 2024
29 points (96.8% liked)

LocalLLaMA

2325 readers

Community to discuss about LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 2 years ago
[–] noneabove1182 15 points 10 months ago

Stop making me want to buy more graphics cards...

Seriously though, this is an impressive result. "Beating" GPT-3.5 is a huge milestone, and I love that we're continuing the trend. I'll need to try out a quant of this to see how it does in real-world usage. Hope it gets added to the LMSYS arena!

[–] mixtral 1 points 10 months ago (1 children)
[–] [email protected] 2 points 10 months ago* (last edited 10 months ago) (1 children)

I don't get your question. I think their contribution isn't training a model from scratch, but a new DPO loss function for fine-tuning. You can read about it in their paper, which is open-access. The model itself is a fine-tune of MoMo-72B-lora-1.8.7-DPO, which is based on Qwen-72B. The respective models have their own papers and GitHub repos. If your question is about the dataset, that's answered in Appendix D of the paper.

https://github.com/abacusai/smaug

(This is the repo they link with the statement "We release our code and pretrained models [...]". I can't find a ready-made Python script there (yet), but their method and contribution to DPO seem to be described in the paper. Everything looks pretty open to me. They even described their dataset. But it's a scientific paper with a small improvement to fine-tuning, accompanied by a model to show off the statistics... not a software release.)
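For background, the standard DPO objective their fine-tuning builds on fits in a few lines. To be clear, this is a sketch of the *vanilla* DPO loss from the original DPO paper, not the modified loss that is the Smaug paper's contribution; it's written with scalar log-probabilities in plain Python to keep it self-contained:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Vanilla DPO loss for one preference pair.

    Pushes the policy to assign relatively more probability to the
    chosen response than the rejected one, measured against a frozen
    reference model. beta controls how far the policy may drift.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): small when the chosen margin is large
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In a real trainer the log-probabilities would be summed over response tokens and the loss averaged over a batch; their paper modifies this objective, so see it for the actual loss they propose.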

[–] llm 2 points 9 months ago (1 children)

It's awesome to have such models open-sourced and competing with GPT-4, but the main reason people still prefer closed-source ChatGPT is internet access for the model. Is there any model that has that now?

[–] [email protected] 2 points 9 months ago* (last edited 9 months ago)

Sure. I think what you're looking for is "AI agents" or RAG (Retrieval-Augmented Generation). It's not the model itself that does it, but the framework and software around it that provide the internet-search capability.

And it's not unique to ChatGPT; it's been available to open-weight / local models for years. I've lost track of all the frameworks we have and what features they offer, so I don't know what to recommend. But you can look it up: there are several frameworks that provide such capabilities to any model. Maybe your preferred solution even has a plugin available.

The underlying method is probably the same as how ChatGPT works internally, and as all the other ones. It's related to how companies have fed internal data to their chatbots ever since all of this started.
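The basic loop behind all of these is simple: retrieve external text, stuff it into the prompt, and let the model answer from that context. Here's a rough sketch; `web_search` and `llm_generate` are hypothetical placeholders for whatever search backend and local model your framework of choice actually provides:

```python
def web_search(query, k=3):
    # Placeholder: a real framework would call an actual search API here.
    return [f"snippet {i} about {query!r}" for i in range(k)]

def llm_generate(prompt):
    # Placeholder: a real framework would run the local model here.
    return "answer grounded in the retrieved snippets"

def answer_with_rag(question):
    # 1. Retrieve external context.
    context = "\n".join(web_search(question))
    # 2. Stuff it into the prompt.
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
    # 3. The model answers from the supplied context, not its weights.
    return llm_generate(prompt)
```

That's why the capability isn't tied to any particular model: any framework that does the retrieval and prompt assembly can bolt internet access onto any local model.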

You can try one example on hf.co/chat. Another solution would be something like h2ogpt, which can do web search, index all of the PDFs on your hard drive and answer questions about them, and do vision tasks like looking at pictures or generating them.