this post was submitted on 07 Oct 2023
992 points (97.7% liked)


Previous posts: https://programming.dev/post/3974121 and https://programming.dev/post/3974080

Original survey link: https://forms.gle/7Bu3Tyi5fufmY8Vc8

Thanks for all the answers! Here are the results of the survey, in case you were wondering how you did.

Edit: People working in CS or a related field have a 9.59 avg score, while people who aren't have a 9.61 avg.

People who have used AI image generators before got a 9.70 avg score, while people who haven't have a 9.39 avg.

Edit 2: The data has changed slightly! Over 1,000 people have submitted results since I posted this image; check the dataset to see live results. Be aware that many people saw the image and comments before submitting, so they've been spoiled on some results, which may be pushing the recent average higher: https://docs.google.com/spreadsheets/d/1MkuZG2MiGj-77PGkuCAM3Btb1_Lb4TFEx8tTZKiOoYI
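
If you want to pull the live responses into your own tooling, something along these lines should work as a rough sketch (it assumes the sheet stays publicly viewable and that the first tab holds the responses; check the actual column names before doing anything with them):

```python
# Sketch: fetch the live survey results from the shared Google Sheet as CSV.
# Assumes the sheet is viewable by anyone with the link; the column layout is
# whatever the sheet actually contains, so inspect it before analyzing.
import pandas as pd

SHEET_ID = "1MkuZG2MiGj-77PGkuCAM3Btb1_Lb4TFEx8tTZKiOoYI"
CSV_URL = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/export?format=csv"

df = pd.read_csv(CSV_URL)
print(df.shape)              # responses so far x number of columns
print(df.columns.tolist())   # see what's actually in the sheet
print(df.head())
```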

[–] [email protected] 153 points 10 months ago (5 children)

So if the average is roughly 10/20, which is about the same as answering randomly each time, does that mean humans are completely unable to distinguish AI images?
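
For reference, here's a rough sketch of what pure coin-flip guessing would look like, assuming each of the 20 images is an independent 50/50 call:

```python
# Score distribution if every answer on the 20-image quiz were a coin flip.
from math import comb

N, P = 20, 0.5  # 20 images, 50% chance of guessing each one correctly

mean = N * P                # expected score: 10/20
std = (N * P * (1 - P)) ** 0.5   # binomial std dev, about 2.24

# Probability of getting exactly k right by chance
pmf = [comb(N, k) * P**k * (1 - P)**(N - k) for k in range(N + 1)]

print(f"expected score: {mean}, std dev: {std:.2f}")
print(f"P(score >= 14 by pure chance): {sum(pmf[14:]):.3f}")
```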

[–] [email protected] 88 points 10 months ago (2 children)

In theory, yes. In practice, not necessarily.

I found that the images were not very representative of the typical AI art styles I've seen in the wild. So not only would that render preexisting learned cues useless, it could actually turn them into obstacles to guessing correctly, pushing the score below random guessing (especially if the images in this test were not chosen randomly, but were instead actively chosen to look unlike typical AI images).

[–] [email protected] 42 points 10 months ago (1 children)

I would also think it depends on what kinds of art you're familiar with. If you don't know what normal pencil art looks like, how are you supposed to recognize the AI version?

As an example, when I’m browsing certain, ah, nsfw art, I can recognize the AI ones no issue.

[–] [email protected] 16 points 10 months ago

Agreed. For the image that was obviously an imitation of Guts from Berserk, the only reason I called it AI-generated was that his right eye was open. I don't know enough about illustration in general, which led me to guess wrong on quite a few of those.

I found the photos much easier, and I guessed each of those correctly just by looking for the sort of "melty" quality you get with most AI-generated photos, along with line continuity across intersections (e.g. an AI probably would have messed up this image where the railing overlaps the plant stems; the continuity made me accept it as real). But with illustrations, I don't know nearly enough about composition and technique to tell whether a particular element is "supposed" to be there or not. I would like to say that the ones I correctly called as AI were still moderately informed guesses, because some of them had that sort of AI-generated color balance/bloom to them, but most of them were still just a coin toss to me.

[–] [email protected] 22 points 10 months ago (1 children)

Maybe you didn't recognize the AI images in the wild and assumed they were human-made. It's survivorship bias: the bad AI pictures are easy to spot, but we might be surrounded by good ones and not even know it.

Same as green screens in movies. They're so prevalent we don't notice them, yet we love to complain about bad green screens. Every time you see a busy street, there's a 90+% chance it's a green screen. People just don't recognize the good ones.

[–] [email protected] 12 points 10 months ago (1 children)

Isn't that called the toupee fallacy?

[–] [email protected] 9 points 10 months ago

It is!!

Someone might assert, "All toupees look fake. I've never seen a good toupee." That's an example of neglecting the base rate, because if they had seen good toupees, they wouldn't have known it.

Thanks, I love learning names for these things when they come up!

[–] [email protected] 21 points 10 months ago (1 children)

If you look at the ratios for each picture, you'll notice that there are roughly two categories: hard and easy pictures. Based on information like this, OP could fine-tune a more comprehensive questionnaire to include some pictures that fall clearly in between. I think it would be interesting to use this data to figure out what makes a picture easy or hard to identify correctly.

My guess is that a picture is easy if it has fingers or logical structures such as text, railways, buildings, etc., while illustrations and drawings could be harder to identify correctly. Also, some natural structures such as coral, leaves, and rocks could be difficult to identify correctly; when an AI makes mistakes in those areas, humans won't notice them very easily.

The numbers of easy and hard pictures were roughly equal, which brings the mean and median close to 10/20. If you want to move that value up or down, just change the number of hard-to-identify pictures.
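
As a rough illustration (with made-up accuracy numbers, not the actual survey data), an even mix of easy pictures and deceptive ones whose accuracies mirror each other around 50% averages out to about 10/20:

```python
# Toy simulation: 10 easy images most people get right, 10 deceptive images
# most people get wrong. The per-image accuracies below are assumptions.
import random

EASY_ACC = 0.85   # assumed chance of answering an easy image correctly
HARD_ACC = 0.15   # assumed chance on a deceptive image (well below a coin flip)
N_EASY, N_HARD = 10, 10
N_RESPONDENTS = 10_000

scores = []
for _ in range(N_RESPONDENTS):
    score = sum(random.random() < EASY_ACC for _ in range(N_EASY))
    score += sum(random.random() < HARD_ACC for _ in range(N_HARD))
    scores.append(score)

# Mean lands near 10/20 because (0.85 + 0.15) / 2 = 0.5
print(f"mean score: {sum(scores) / len(scores):.2f} / 20")
```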

[–] [email protected] 4 points 10 months ago (1 children)

The number of easy and hard pictures was roughly equal, which brings the mean and median values close to 10/20.

That's true if "hard" means "it actively leads you to the wrong answer," as opposed to "it's so hard to tell that I'm just going to guess."

[–] [email protected] 2 points 10 months ago (1 children)

That's a very important distinction. "Hard" wasn't the clearest word for it; I guess I should have called it something like deceptive or misleading. The idea is that some pictures got a below-50% ratio, which means people were really bad at categorizing them correctly.

There were surprisingly few pictures that were close to 50%. Maybe it's difficult to find pictures that make everyone guess randomly; there are always a few people who know what they're doing because they generate pictures like this on a weekly basis, and their answers will push that ratio higher.

[–] [email protected] 3 points 10 months ago* (last edited 10 months ago)

A great example of the below-50% situation is the picture of the avocado and the tomato. I was confident that it was AI-generated because I was pretty sure I'd seen that specific picture used as an example of how good DALL-E 3 was at rendering text. However, most people who had used other models were probably used to butchered text and expected that one to be real.

If they did this quiz again with only pictures that were sketches, I bet the standard deviation would be much smaller.

[–] [email protected] 20 points 10 months ago (1 children)

It depends on whether these were hand-picked as the most convincing. If they were, this can't be used as a representative sample.

[–] [email protected] 11 points 10 months ago (1 children)

But you will always hand-pick generated images. It's not like you hit the generate button once and call it a day; you hit it dozens of times, tweaking until you get what you want. This is a perfectly representative sample.

[–] [email protected] 4 points 10 months ago* (last edited 10 months ago)

As a personal example, this is what I generated initially, and after a few hours of tweaking, regenerating, and inpainting, this was the final result. And here's another: initial generation, the progress animation, and end result.

Are they perfect? No, but the really obvious bad AI art comes from people who expect it to spit out perfect images on the first try.

[–] [email protected] 14 points 10 months ago

Personally, I’m not surprised. I thought a 3D dancing baby was real.

[–] [email protected] 3 points 10 months ago

From this particular set, at least. Notice how people were better at guessing certain images.

Stylized and painterly images seem particularly hard to differentiate.