Technology
This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.
Typical trope while promoting a "new" technology. A classic example is 1972's AARON https://en.wikipedia.org/wiki/AARON which, despite not being based on an LLM (so not CLIP) nor even on ML, is still creating novel images. So... image generation has existed since at least the 70s, more than half a century ago. I'm not saying it's equivalent to the implementations since DALLE (it is not), but ignoring the history of a research field is not doing it justice. I have also been modding https://old.reddit.com/r/computationalcrea/ for 9 years, so since before OpenAI was even founded, just to give some historical context. Also, 2015 means 6 years before CLIP. Again, this is not to say it's equivalent, solely that generative AI has a long history, and thus anchoring the timeline to grand moments like AlphaGo or Deep Blue (and on this topic I can recommend Rematch from Arte) ... is very much arbitrary and in no way helps to predict what's yet to come, both in terms of what's achievable and even the pace.
Anyway, I don't know what you actually tried, but here is a short list of the 58 (as of today) models I tried https://fabien.benetou.fr/Content/SelfHostingArtificialIntelligence and that's excluding the popular ones, e.g. ChatGPT, Mistral Le Chat, DALLE, etc., which I also tried.
I might be making "the same mistake" but, as I hope you can see, I do keep on trying what I believe is the state of the art on a pretty much weekly basis.
Creating abstract art by moving pixels around is not anywhere close to what we mean by image generation. At no point did this other software generate something from a prompt
I'd normally accept the challenge if you hadn't added that. You did though, and it, namely a system (arguably intelligent), made an image, several images in fact. Whether we like or dislike the aesthetics, or whether the way it was done (without a prompt) differs from how it is done currently, remains irrelevant according to your own criteria, of which there are none. Anyway, my point with AARON isn't about this piece of work specifically, rather that there is prior work, and this one is JUST an example. Consequently the starting point is wrong.
Anyway... even if you did question this, I argued for more, showing that I did try numerous (more than 50) models, including very current ones. It even makes me curious whether you, who are arguing for the capabilities and their progress, tried more models than I did and, if so, where I can read about it and what you learned from such attempts.
It's irrelevant because it wasn't a precursor technique. The precursor was machine learning research, not other image generation technology
So LLMs can trace their origin back to the 2017 paper "Attention Is All You Need". Together with diffusion models, they have enabled prompt-based image generation at an impressive quality.
However, looking at just image generation, you have GANs as far back as 2014, with StyleGANs (ones that you could more easily influence) dating back to 2018. While diffusion models also date back to 2015, I don't see any mention of their use for images until the early 2020s.
That's also ignoring that all of these technologies go back further to LSTMs and CNNs, which in turn go back further to other NLP/CV technologies. So there has been a lot of progress here, but progress also isn't always linear.
You can see that with image generation, progress was extremely quick