this post was submitted on 03 Feb 2025
888 points (98.5% liked)
Technology
how tf did it take 6 years to analyze 8000 posts
I'm pretty sure they selected posts from a 6 year period, not that they spent six years on the analysis.
I can’t even fathom how they would go about testing if it’s an AI or not. I can’t imagine that’s an exact science either.
In that case, how/why did they only choose 8000 posts over 6 years? Facebook probably gets more than 8000 new posts per second.
I was wondering how far I'd have to scroll before getting to someone who doesn't understand statistics complaining about the sample size...
Every study uses sampling. They don't have the resources to check everything. I have to imagine it took a lot of work to verify conclusively whether something was or was not generated. It's a much larger sample size than a lot of studies.
The study is by a company that creates software to detect AI content, so it's literally their whole job
(it also means there's a conflict of interest, since they want to show how much content their detector can detect)
It's an extremely small proportion of the total number of Facebook posts though. Nowhere near enough for statistical significance.
The proportion of the total population is almost irrelevant when you use random sampling. Random sampling doesn't rely on examining a large portion of the population; rather, it becomes increasingly unlikely for the sample to deviate dramatically from the population as the number of samples rises. That is a function of the number of samples you take, decoupled from the population size.
https://en.wikipedia.org/wiki/Sampling_(statistics)
Usually if you see a major poll in a population, it'll be something like 1k to 2k people who get polled, regardless of the population size.
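To put rough numbers on that, here's a quick sketch of the textbook margin-of-error formula for a proportion. It assumes a simple random sample, the worst case p = 0.5, and a 95% confidence level (z = 1.96); none of that comes from the study itself, and the function name and population figures are made up for illustration.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample of size n (assumption: the sample is random).

    p=0.5 is the worst case; z=1.96 corresponds to ~95% confidence.
    If a population size is given, the finite population correction
    is applied.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: only matters when n is a
        # noticeable fraction of the population.
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error

if __name__ == "__main__":
    # Hypothetical population sizes, chosen only to show the effect.
    for n in (1000, 2000, 8000):
        for population in (1_000_000, 1_000_000_000):
            moe = margin_of_error(n, population)
            print(f"n={n:>5}  N={population:>13,}  +/- {100 * moe:.2f} points")
```

Under those assumptions, 8000 samples gives roughly +/- 1.1 percentage points and 1000 samples gives roughly +/- 3.1, and going from a million to a billion in the population barely moves either figure. This only covers random sampling error; it says nothing about whether the posts were actually picked at random or whether the detector is accurate.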