A list of references to the training data, plus whatever they added on top, would be the bare minimum to call it open source, in my opinion, but a lot of people take a stricter view of this than I do.
None of the flagship models publish their training data because they're all trained on less-than-legal datasets.
It's a little like complaining that Jellyfin doesn't publish any media with its code - not only would that not be legal, it's implied that you're responsible for obtaining your own.
If you're someone who can and does compile and retrain your own 64B-parameter LLM, you almost certainly have your own dataset for that purpose (in fact, Hugging Face hosts many); a minimal example of pulling one is sketched below.
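For anyone who hasn't done this, here's a minimal sketch of what "grab a public dataset from Hugging Face" looks like, using the `datasets` library. The library call is real, but `wikitext` is purely a stand-in corpus chosen for illustration - it's not any vendor's actual training data, and a real pretraining or fine-tuning run would use a much larger corpus and a full training pipeline:

```python
# Minimal sketch: load an openly licensed text corpus from the Hugging Face Hub.
# "wikitext" is only a small stand-in example for illustration.
from datasets import load_dataset

ds = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

print(len(ds))  # number of text records in the split
for record in ds.select(range(5)):
    # peek at the first few records
    print(repr(record["text"][:80]))
```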
Still doesn't magically make it open source.
Debian would probably split the package into a non-free part and an open-source part for this reason.
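As a loose, hypothetical sketch of what that split could look like (the package names are invented, and in practice the two halves would likely be separate source packages, since main and non-free are distinct archive areas): the free inference code would live in main, while the weights ship from non-free with a stanza along these lines:

```
# Hypothetical debian/control stanza for the non-free half only
# (invented names; the DFSG-free inference code would be packaged
# separately and live in main).
Package: examplelm-weights
Architecture: all
Section: non-free/science
Depends: examplelm-runtime
Description: pretrained weights for ExampleLM
 Kept out of main because the weights are derived from training data
 that cannot be redistributed under DFSG-compatible terms.
```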