this post was submitted on 29 Jan 2025
Investing

[–] lurch 1 points 1 day ago (1 children)

a list with references to the training data, plus what they added, would be the bare minimum to call it open source, in my opinion, but a lot of people are stricter about this than i am.

[–] [email protected] 2 points 1 day ago (1 children)

None of the flagship models publish their training data because they're all trained on less-than-legal datasets.

It's a little like complaining that Jellyfin doesn't publish any media with its code: not only would that be illegal, but it's understood that you're responsible for obtaining your own.

If you're someone who can and does compile and re-train your own 64B-parameter LLMs, you almost certainly have your own dataset for that purpose (in fact, Hugging Face hosts many).

[–] lurch 1 points 12 hours ago

still doesn't make it magically open source.

Debian would probably split the package into a non-free part and an open source part, for this reason.