this post was submitted on 22 Nov 2024
749 points (98.1% liked)

submitted 5 days ago* (last edited 5 days ago) by Joker to c/[email protected]
[–] [email protected] 1 points 1 day ago* (last edited 1 day ago)

How useful would the training data be

Open datasets are getting much better (Tulu is a great example of an instruct dataset/recipe), but it's clear the giants still have "secret sauce" that gives them at least a small edge over open datasets.

There also seems to be some vindication for using massively multilingual datasets, as the hybrid Chinese/English models are turning out very well.