this post was submitted on 26 Dec 2023

110 points (80.9% liked)

Study shows AI image-generators being trained on explicit photos of children (www.independent.co.uk)

submitted 2 years ago by [email protected] to c/[email protected]

Hidden inside the foundation of popular artificial intelligence image-generators are thousands of images of child sexual abuse, according to a new report that urges companies to take action to address a harmful flaw in the technology they built.

[–] [email protected] 94 points 2 years ago (3 children)

Say it with me folks: regulatory capture.

Google et al. want to make it illegal for the average user to run any kind of AI model locally. It's scaremongering.

Even if popular image generation AIs did have images of children as part of their training dataset, how many people would be able to have them generate anything even remotely like that?

Big tech will use this research, and research like it, to de-democratize artificial intelligence and take it out of the hands of ordinary people while actual pedophiles are training image generation AIs on actual child sexual assault media.

Sooner or later someone with a full-blown CP-generating Stable Diffusion model will be found, and big tech is going to latch on to it like a lamprey on a shark, doing everything in their power to beat the world over the head with it and ensure only they have the tools to make AI.

[–] [email protected] 1 points 2 years ago* (last edited 2 years ago)

I disagree. There's huge money being pumped into the open source models space. These players are connected and wouldn't bother if the writing was on the board room wall.

Look at who finances "thebloke" on huggingface for example.

[–] [email protected] 1 points 2 years ago* (last edited 1 year ago)

deleted

[+] [email protected] -12 points 2 years ago (1 children)

How? I can just go to civitai and download tons of models, clone the stable diffusion and lora tools, and buy better hardware now, and they'll never be able to do shit about it.

[–] [email protected] 31 points 2 years ago (1 children)

I think he's pointing out that in the future, this could lead to regulatory measures by the government because they get pressured by the big corps that AI locally is dangerous, but AI with big corps is all good and the right way. Which is an understandable concern. It's not about you using whatever model you're using; it's about the broader philosophy of how AI should be integrated into our world. He's saying the big corps are trying to monopolize the AI market, which is valid because that's what's happening right now.

[–] [email protected] 8 points 2 years ago* (last edited 2 years ago)

Yeah. You got it.

And sure, you can hop on civitai and download a model or LoRA or LyCORIS or ComfyUI workflow or whatever. But we're only at the beginning stages of AI.

Like, face swapping is mainly done with the inswapper.onnx model, which was pulled by InsightFace after it started making the rounds in the face-swapping application roop. It's all well and good for hobbyist face-swapping image gen, but it's a 128-pixel model and low-res, so it kind of makes a blurry mess on larger images. InsightFace has higher-resolution models available, but they're not public, and to my knowledge there aren't any viable alternatives to this model that can match the same speed and accuracy. So everyone is out here playing with sticks and rocks while those who can pay have shiny new things.

There are very valid concerns about the harmful potential of deepfakes, and I can understand how the model's creator didn't want to take responsibility for enabling that. But if, say, Google wanted to use that or a similar closed-source in-house model to deepfake CSAM for propaganda purposes, or do the same to political leaders or celebrities, not only does the public not have access to those models to understand how it's being done and identify artifacts of that process, they lack the ability to "fight back" in any meaningful way.

To be clear, I don't think the above is happening, or is inevitably going to happen, but it highlights the asymmetric nature of AI that big tech wants. It doesn't even have to be such high stakes. If you wanted to, say, swap your son's face onto Luke Skywalker in the Star Wars movie for a Christmas present or something, that would be challenging to do locally and convincingly without the right model. Without access to that model, you could instead be forced to pay an absurdly high price to a private company, or be denied entirely due to fear of copyright infringement, even though I'm relatively certain doing that and not releasing it publicly falls purely in the realm of fair use.

And then there's text, speech, audio generation. What happens if the tech gets good enough for someone to spend a few hours setting up some parameters for some pop songs with vocals, hit go, and generate music as consistently appealing as what we hear on the radio? And when no one else can access that tech? They're able to pay artists nothing and basically produce free content we'd have to pay for. If the public had access to that same tech, then artists would still have a role in making popular music, even if the landscape had shifted totally. Either way the music business as we know it dies, but there's one option where creative people can still make money independently without getting on big tech's dick to do so.

It's a complicated issue and the ethics of it are fraught no matter where you look, but take one look at how cynically terrible all of Google's products are getting and I think it's painfully obvious we can't trust them and their ilk with sole access to this kind of tech.

[–] [email protected] 38 points 2 years ago

3200 images is 0.001% of the dataset in question, obviously sucked in by mistake. The problematic images ought to be removed from the dataset, but this does not "contaminate" models trained on the dataset in any plausible way.

[–] [email protected] 35 points 2 years ago

Let me guess: it's a minuscule percentage that was included by mistake. And it's being used as justification to prevent you running your own model, but it's safe for large corporations to run them?

[–] [email protected] 24 points 2 years ago (1 children)

All of our protect the children legislation is typically about inhibiting technology that might be used to cause harm and not about assuring children have access to places of safety, adequate food and comfort, time with and access to parents, freedom to live and play.

Y'know, all those things that help make kids resilient to bullies and challenges of growing up. Once again, we leave our kids cold and hungry in poverty while blaming the next new thing for their misery.

So I call shenanigans. Again.

[–] [email protected] 9 points 2 years ago (2 children)

It's still abhorrent, but if AI generated images prevent an actual child from being abused...

It's a nuanced topic for sure.

[–] [email protected] 5 points 2 years ago (1 children)

We need to better understand what causes pedophilic tendencies, so that the environmental, social and genetic factors can someday be removed.

Otherwise children will always be at risk from people who have perverse intentions, whether that person is responsible or not for those intentions.

[–] [email protected] 2 points 2 years ago

I don't think it'll ever be gotten rid of. At its core, pedophilia is a fetish, not functionally different from being into feet. And like some fetishes, it doesn't mean a person will ever act on it.

I'm sure that many of them hate the fact that they are wired wrong. What really needs to happen is for them to have the ability to seek professional help without worrying about legal repercussions.

[–] [email protected] 6 points 2 years ago (2 children)

Honest question, why is this a problem rather than a solution?

If these kids don't exist, and having those fake pictures makes some people content, what's the harm?

Kinda reminds me of furries getting horny over wolf drawings, who cares?

[–] [email protected] 2 points 2 years ago* (last edited 2 years ago)

I agree with you in instances where it's not generating a real person. But there are cases where people use tools like this to generate realistic-looking but fake images of actual, specific real-life children. This is of course abusive to that child. And it's still bad when it's done to adults too, it's sort of a form of defamation.

I really do hope legislation around this issue is narrowly tailored to actual abuse similar to what I described above, but given the "protect the children" nonsense they constantly moan about just about every technology including end to end encryption I'm not very optimistic.

Another thing I wonder about is whether AI could get so realistic that it becomes impossible to prove beyond a reasonable doubt that anyone with actual CSAM (where the image/victim isn't known, so they can't prove it that way) is guilty, since any image could plausibly be fake. This of course is an issue far beyond just child abuse. It would probably discredit video footage for robberies and that sort of thing too. We really are venturing into the unknown, and government isn't exactly known for adapting to technology...

But I think you're mostly correct, because the moral outrage on social media seems to be about the entire concept of fake sexual depictions of minors existing at all, rather than only about abusive ones.

[–] Nommer -4 points 2 years ago (1 children)

Did you just compare furries to pedophiles? One of those is harmless, the other is not.

[–] [email protected] 8 points 2 years ago (1 children)

Ironically, I was giving them as an example of something OK. My point just went over your head.

[–] mindbleach 5 points 2 years ago

False headlines about this took no time at all.

One dataset found suspected images, comprising approximately 0% of examples. Out of a bajillion. And immediately called the cops. But the headline acts like "scientific proof all AIs are fed these images!!!" Which has been the fantasy peddled by people who know less than nothing about this technology, and god fucking dammit, we're gonna be explaining this forever.

An AI does not need pictures of Shrek riding a unicycle, to combine the concept of "Shrek" and "unicycle." Satisfying multiple arbitrary labels is kinda the whole point. The fact it can combine "child" and "porn" is never going to stop being a thing, unless you completely scrub all examples of both those unrelated concepts.

And even that might not work.

[–] [email protected] 5 points 2 years ago

This is the best summary I could come up with:

Hidden inside the foundation of popular artificial intelligence image-generators are thousands of images of child sexual abuse, according to a new report that urges companies to take action to address a harmful flaw in the technology they built.

Those same images have made it easier for AI systems to produce realistic and explicit imagery of fake children as well as transform social media photos of fully clothed real teens into nudes, much to the alarm of schools and law enforcement around the world.

Until recently, anti-abuse researchers thought the only way that some unchecked AI tools produced abusive imagery of children was by essentially combining what they've learned from two separate buckets of online images — adult pornography and benign photos of kids.

It’s not an easy problem to fix, and traces back to many generative AI projects being “effectively rushed to market” and made widely accessible because the field is so competitive, said Stanford Internet Observatory's chief technologist David Thiel, who authored the report.

LAION was the brainchild of a German researcher and teacher, Christoph Schuhmann, who told the AP earlier this year that part of the reason to make such a huge visual database publicly accessible was to ensure that the future of AI development isn't controlled by a handful of powerful companies.

Google built its text-to-image Imagen model based on a LAION dataset but decided against making it public in 2022 after an audit of the database “uncovered a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.”

The original article contains 1,221 words, the summary contains 256 words. Saved 79%. I'm a bot and I'm open source!

[–] [email protected] 2 points 2 years ago

That is bound to happen if what has been used is images from the open web. What's the news?

[–] [email protected] -1 points 2 years ago

AWW HELL NAH

[–] [email protected] -2 points 2 years ago

These kinds of things are inevitable unfortunately. AI is just a tool and it just really depends on how it's used. Just like a gun. I think the biggest issue here is that AI was adopted so quickly. It's been used by tons of companies that actually don't understand what it is and does so it's the wild west right now without any regulations. I'm not entirely sure what kind of regulations you can even put on AI if any.