
Whack yakety-yak app chaps rapped for security crack

[–] kinkles 7 points 3 months ago (1 children)

Is it possible to implement a perfect guardrail on an AI model such that it will never ever spit out a certain piece of information? I feel like these models are so complex that you can always eventually find the perfect combination of words to circumvent any attempts to prevent prompt injection.
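
To illustrate why, here's a minimal Python sketch of a pattern-matching output guardrail and one trivial way around it. Everything in it is hypothetical (the "secret", the filter, and the bypass are made up for illustration, not taken from any real product):

```python
# Hypothetical sketch: a naive guardrail that blocks a protected
# string by matching its surface form in the model's output.

import base64

SECRET = "the launch code is 1234"  # made-up protected information

def naive_guardrail(model_output: str) -> str:
    """Block any output that contains the protected string verbatim."""
    if SECRET in model_output.lower():
        return "[blocked]"
    return model_output

# The direct leak is caught...
print(naive_guardrail(SECRET))  # -> [blocked]

# ...but a trivially re-encoded leak sails straight through, because
# the filter matches the text of the secret, not the information in it.
encoded = base64.b64encode(SECRET.encode()).decode()
print(naive_guardrail(f"Sure! In base64: {encoded}"))  # -> leaks
```

The same information can be expressed in effectively unlimited surface forms (paraphrase, translation, encodings, riddles), so a filter that checks outputs against patterns is playing a losing game of enumeration.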