this post was submitted on 05 Feb 2025
306 points (82.0% liked)

[–] [email protected] 24 points 1 day ago (2 children)

"My hammer is not well suited to cut vegetables" 🤷

There is so much to say about AI — can we move on from "it can't count letters and do math"?

[–] [email protected] 6 points 21 hours ago (2 children)

But the problem is more "my do-it-all tool randomly fails at arbitrary tasks in an unpredictable fashion," making it hard to trust as a tool in any circumstances.

[–] [email protected] 1 point 17 hours ago

Answer: you're using it wrong. /stevejobs

[–] [email protected] 2 points 21 hours ago (1 children)

It would be like complaining that a water balloon isn't useful because it isn't accurate. LLMs are good at approximating language; numbers are too specific and have more objective answers.

[–] [email protected] 8 points 1 day ago (1 children)

I get that it's usually just a dunk on AI, but it is also still a valid demonstration that AI has pretty severe and unpredictable gaps in functionality, in addition to failing to properly indicate its confidence (or lack thereof).

People who understand that it's a glorified autocomplete will know how to disregard or prompt around some of these gaps, but this remains a litmus test because it succinctly shows you cannot trust an LLM response even in many "easy" cases.