[email protected] 30 points 9 months ago (last edited 9 months ago)

> AI / LLM only tries to predict the next word or token

This is not wrong, but also absolutely irrelevant here. You can be against AI, but please make the argument based on facts, not by parroting some distantly related talking points.

Current image generation is powered by diffusion models, whose inner workings are completely different from those of large language models. The part failing here in particular is the text encoder (CLIP). If you learn how it works and think about it, you can deduce why the image generator is forced to draw the elephant anyway.
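To make that concrete, here's a minimal sketch (assuming the Hugging Face `transformers` library and the public `openai/clip-vit-base-patch32` checkpoint, which are illustrative choices, not necessarily what any given service runs) of why negation barely changes what CLIP encodes:

```python
# Minimal sketch: compare CLIP text embeddings for a prompt with and
# without negation. Assumes the `transformers` library and the public
# openai/clip-vit-base-patch32 checkpoint (illustrative choices).
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a room with an elephant in it",
    "a room without an elephant in it",  # negation still contains "elephant"
    "an empty room",
]
inputs = processor(text=prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize for cosine similarity

# The negated prompt lands much closer to the elephant prompt than to
# "an empty room" -- the encoder mostly sees the word "elephant".
print("with vs. without:", (emb[0] @ emb[1]).item())
print("without vs. empty:", (emb[1] @ emb[2]).item())
```

The negated prompt scores much closer to the elephant prompt than to the genuinely elephant-free one, so the diffusion model gets pulled toward drawing an elephant regardless.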

Edit: because this is such an obvious limitation, negative prompts have existed pretty much since diffusion models came out.

[email protected] 3 points 9 months ago

These examples are not using Stable Diffusion directly, though. They use an LLM to write an image generation prompt for DALL-E / SD, which is then executed. In none of these examples are we shown the actual prompt.

If you instead instruct the LLM to first show the text prompt, review it to make sure it does not mention elephants, revise it if necessary, and only then generate the image, you'll get much better results. Now, ChatGPT is horrible at following instructions like these unless you set up the prompt very specifically, but it will still follow more of the instructions internally.
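For illustration, that kind of draft-review-generate workflow could look roughly like this sketch (assuming the official `openai` Python client; the prompt wording and model names are my own illustrative choices, not what ChatGPT actually does internally):

```python
# Sketch of a draft-review-generate loop. Assumes the official `openai`
# Python client; prompt wording and model names are illustrative.
from openai import OpenAI

client = OpenAI()

instructions = (
    "Write a prompt for an image generator depicting a room. "
    "First draft the prompt, then review it and remove any mention of "
    "elephants (including negations like 'no elephants'), then output "
    "only the final revised prompt."
)
draft = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": instructions}],
)
image_prompt = draft.choices[0].message.content

# Hand the reviewed prompt to the image model instead of letting the
# LLM pass its first attempt straight through.
image = client.images.generate(model="dall-e-3", prompt=image_prompt)
print(image.data[0].url)
```

The point is simply that the text prompt gets a revision pass before the image model ever sees it, so stray mentions of elephants can be caught.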

Anyway, the issue in all the examples above does not stem from Stable Diffusion itself, but from the LLM writing an ineffective prompt for it: the LLM tries to exclude elephants with a simple negation ("no elephants"), which the text encoder does not handle well.

[email protected] 2 points 9 months ago

If you prompt Stable Diffusion for "a room without elephants in it", you'll get elephants. You need to put elephants in the negative prompt to get a room without them. I don't think the LLMs have been given the ability to set negative prompts.
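For reference, with the Hugging Face `diffusers` library the negative prompt is a separate argument, roughly like this sketch (the checkpoint is an illustrative choice):

```python
# Sketch of a negative prompt with the Hugging Face `diffusers` library.
# The checkpoint is an illustrative choice; float16 + CUDA for speed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Asking for "a room without elephants" tends to produce elephants: the
# encoder sees the token "elephants" either way. The exclusion belongs
# in the separate negative prompt, which steers generation away from it.
image = pipe(
    prompt="an empty room",
    negative_prompt="elephant, elephants",
).images[0]
image.save("room.png")
```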