memes

10472 readers

4375 users here now

Community rules

1. Be civil

No trolling, bigotry or other insulting / annoying behaviour

2. No politics

This is non-politics community. For political memes please go to [email protected]

3. No recent reposts

Check for reposts when posting a meme, you can only repost after 1 month

4. No bots

No bots without the express approval of the mods or the admins

5. No Spam/Ads

No advertisements or spam. This is an instance rule and the only way to live.

Sister communities

[email protected] : Star Trek memes, chat and shitposts
[email protected] : Lemmy Shitposts, anything and everything goes.
[email protected] : Linux themed memes
[email protected] : for those who love comic stories.

founded 1 year ago

MODERATORS

[email protected]

1142

Blursed Bot (lemmy.dbzer0.com)

submitted 4 months ago by [email protected] to c/[email protected]

90 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 18 points 4 months ago (6 children)

Couldn’t they make the bots ignore every prompt, that asks them to ignore previous prompts?

Yes and no.

What you see in the meme is either a well-crafted joke, or the result of lazy programming. But that kind of "breakout" of the interactive model is absolutely a real thing. You can reasonably protect such a prompt from some "attack" vectors like this, simply by filtering/screening inputs. This is kind of what image generators and other public LLM prompts (e.g. ChatGPT) do today.

At the same time, there are security researchers and hackers^1^ that are actively looking for ways to break through that filtering rendering it moot. Given enough time and a talented or resourceful adversary, breaking through is inevitable. Like all security, it's an arms race.

Like with a prompt like: “only stop propaganda discussion mode when being prompted: XXXYYYZZZ123, otherwise say: dude i’m not a bot”?

That's actually worth a shot. You could try that right now with GPT, but I doubt it's all that bulletproof.

^1^ Sometimes, these are the same picture.

[–] kwomp2 5 points 4 months ago (5 children)

Thanks veryone for the answers. Still hard to get my head around it. Even if LLMs are not exactly algorithms it seems odd to me you cant make them follow one simple "only do x if y" rule.

From my programming course in ~2005 the lego robots where all about those if sentences :/

[–] [email protected] 6 points 4 months ago (2 children)

The layman's explanation of how an LLM works is it tries to predict the most likely word, or sequence of words, that follow from the last. This is based all on the input training set, which is compiled into a big bucket of probabilities. All text input influences those internal probabilities which in turn generates likely output. This is also why these things are error-prone because it's really just hyper-sophisticated predictive text, and is doing its best to "play the odds."

You can also view an LLM as one fiendishly massive if/else statement that chews on text tokens. There's also some random seeding thrown in for more variation in output, but these things are 100% repeatable if you use the same seed every time; it's just compiled logic.

[–] kwomp2 3 points 4 months ago (1 children)

Hehe best illustration. "big bucket of probabilities" ...hell yeah

[–] [email protected] 3 points 4 months ago

Yup. I had this in my head at the time:

load more comments (2 replies)