this post was submitted on 11 Feb 2024

643 points (97.9% liked)

The White House wants to 'cryptographically verify' videos of Joe Biden so viewers don't mistake them for AI deepfakes (www.businessinsider.com)

submitted 1 year ago by [email protected] to c/[email protected]


Biden's AI advisor Ben Buchanan said a method of clearly verifying White House releases is "in the works."


[–] [email protected] 1 points 1 year ago (1 children)

Apple's scrapped on-device CSAM scanning was based on perceptual hashes.

The first collision demo breaking them showed up in hours with images that looked glitched. After just a week the newest demos produced flawless images with collisions against known perceptual hash values.

In theory you could create some ML-ish compact learning algorithm and use the compressed model as a perceptual hash, but I'm not convinced this can be secure enough unless it's allowed to be large enough, as in some % of the original's file size.

[–] [email protected] 1 points 1 year ago (1 children)

you can definitely produce perceptual hashes that collide, but really you’re not just talking about a collision, you’re talking about a collision that’s also useful in subverting an election, AND that’s been generated using ML, which is something that’s still kinda shaky to start with

[–] [email protected] 1 points 1 year ago (1 children)

Perceptual hash collision generators can take arbitrary images and tweak them in invisible ways to make them collide with whichever hash value you want.

[–] [email protected] 1 points 1 year ago (1 children)

from the comment above, it seems like it took a week for a single image/frame though… it’s possible, sure, but so is a collision in a regular hash function… at some point it just becomes too expensive to be worth it, AND the phash here isn’t being used as security, because the security is that the original was posted on some source-of-truth site (e.g. the White House)

[–] [email protected] 1 points 1 year ago (1 children)

No, it took a week to refine the attack algorithm. The collision generation itself is fast.

The point of perceptual hashes is to let you check if two things are similar enough after transformations like scaling and re-encoding, so they aren't designed to resist deliberate collisions and you can't rely on them for security here.
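To make that concrete, here is a minimal sketch of a toy "average hash" in Python. It is not the hash Apple used or any production algorithm; the 8x8 image, the simulated re-encoding noise, and the function names are all made up for illustration. It only shows the design goal: small pixel changes leave the perceptual hash untouched while a cryptographic hash changes completely. That same tolerance is the slack a collision generator works with.

```python
import hashlib

def average_hash(pixels):
    """Toy average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def sha256_of(pixels):
    """Cryptographic hash of the raw pixel bytes, for comparison."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

# Hypothetical 8x8 grayscale image: dark left half, bright right half.
original = [[40] * 4 + [200] * 4 for _ in range(8)]

# Simulated re-encoding noise (an assumption): nudge each pixel by -1, 0 or +1.
reencoded = [
    [p + ((x + y) % 3) - 1 for x, p in enumerate(row)]
    for y, row in enumerate(original)
]

phash_distance = hamming(average_hash(original), average_hash(reencoded))
sha_equal = sha256_of(original) == sha256_of(reencoded)
```

Here `phash_distance` comes out as 0 and `sha_equal` as `False`: the two images count as "the same" perceptually even though their bytes differ.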

[–] [email protected] 1 points 1 year ago (1 children)

oh yup that’s a very fair point then! you certainly wouldn’t use it for security in that case, however there are a lot of ways to implement this that don’t rely on the security of the hash function, but just use it (for example) to point to somewhere in a trusted source so you can manually validate that they’re the same

we already have the trust frameworks; that’s unnecessary… we just need to automatically validate (or at least provide automatic verifiability) that a video posted on some 3rd-party - probably friendly or at least cooperative - platform represents reality
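A rough sketch of that "hash as a pointer" idea, in Python. Everything here is hypothetical: the index contents, the `trusted.example` URLs, the 64-bit hash values, and the distance threshold are placeholders, not anything the White House or any platform has published. The perceptual hash is only used to find a candidate original; the actual trust comes from comparing against the copy on the trusted site.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Hypothetical index published by the trusted source: perceptual hash -> canonical URL.
TRUSTED_INDEX = {
    0x0F0F0F0F0F0F0F0F: "https://trusted.example/videos/address-1",
    0xFFFF00000000FFFF: "https://trusted.example/videos/briefing-2",
}

def find_original(phash: int, max_distance: int = 4):
    """Return the canonical URL of the closest indexed original, or None.

    A hit is only a pointer for a human (or a stricter check) to follow;
    it is not proof that the two videos match.
    """
    best = min(TRUSTED_INDEX, key=lambda known: hamming(known, phash))
    if hamming(best, phash) <= max_distance:
        return TRUSTED_INDEX[best]
    return None

# A re-encoded copy whose hash differs from the first entry by two bits.
candidate = 0x0F0F0F0F0F0F0F0F ^ 0b101
match = find_original(candidate)

# An unrelated hash that is far from every indexed entry.
no_match = find_original(0x123456789ABCDEF0)
```

A forged video that collides with an indexed hash would still get a pointer back, but following that pointer shows the real original, which is where the mismatch becomes visible.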

[–] [email protected] 1 points 1 year ago

I think the best bet is really video formats with multiple embedded streams carrying complementary frame data (these already exist), so you pick the video quality by choosing how many streams to merge during playback.

If you then hashed the streams independently and signed the list of hashes, you would have a video file which can be "compressed" without breaking the signature by stripping out some streams.
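A minimal sketch of that scheme in Python, under loud assumptions: the stream contents are placeholder byte strings, the manifest layout is invented, and HMAC stands in for the signature only because the standard library has no public-key signing. A real publisher would sign with a private key (for example Ed25519) so anyone could verify without a shared secret.

```python
import hashlib
import hmac
import json

# Stand-in key (assumption): HMAC needs a shared secret, unlike a real signature.
SIGNING_KEY = b"hypothetical-publisher-key"

def sign_manifest(streams):
    """Hash each stream independently, then sign the ordered list of hashes."""
    hashes = [hashlib.sha256(s).hexdigest() for s in streams]
    payload = json.dumps(hashes).encode()
    tag = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"hashes": hashes, "signature": tag}

def verify(manifest, present_streams):
    """Check the signature over the full hash list, then every stream still present.

    present_streams maps stream index -> bytes; stripped streams are simply absent.
    """
    payload = json.dumps(manifest["hashes"]).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    return all(
        hashlib.sha256(data).hexdigest() == manifest["hashes"][i]
        for i, data in present_streams.items()
    )

streams = [b"base layer", b"enhancement 1", b"enhancement 2"]
manifest = sign_manifest(streams)

full_ok = verify(manifest, dict(enumerate(streams)))      # all streams present
stripped_ok = verify(manifest, {0: streams[0]})           # enhancement streams removed
tampered_ok = verify(manifest, {0: b"forged base layer"})  # base stream replaced
```

Stripping streams keeps the manifest valid because the signed hash list travels with the file unchanged, while any modified stream fails its own hash check. A fuller version would probably also insist that the base stream is present.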