this post was submitted on 23 Jun 2024
33 points (97.1% liked)

TechTakes

1435 readers
118 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago
MODERATORS
 

Need to make a primal scream without gathering footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh facts of Awful you'll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post Xitter web has spawned soo many “esoteric” right wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be)
Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 4 points 5 months ago (14 children)

I tried using Claude 3.5 sonnet and .... it's actually not bad. Can someone please come up with a simple logic puzzle that it abysmally fails on so I can feel better? It passed the "nonsense river challenge" and the "how many sisters does the brother have" tests, both of which fooled gpt4.

[–] [email protected] 14 points 5 months ago

I tried using Claude 3.5 sonnet and … it’s actually not bad. Can someone please come up with a simple logic puzzle that it abysmally fails on so I can feel better?

It's still very good at autocompleting an answer to look like its training data (because that's what it's doing) and very bad at logic (which it is not doing.)

"I have two chickens and a sack of corn I need to get across a river, but the boat can only carry me and one item. How do I do it?"

This is a classic river-crossing puzzle. Let's think through it step-by-step:

  • First, note that if left alone, a chicken will eat the corn, and one chicken might eat the other.
  • Start by taking a chicken across the river.

...and we'll stop there, since it came up with the wrong constraints, failed to realize that these constraints are impossible to fulfill, and also violated the actual constraints in its first step.

Will more detail help? No. "I have two vegetarian chickens and a sack of corn I need to get across a river, but the boat can only carry me and one item. How do I do it?"

First, we need to consider the constraints:

  • You can only take one item (chicken or corn) across at a time.
  • You can't leave a chicken alone with the corn, as the chicken would eat it.
  • The chickens are vegetarian, so they won't eat each other.

Here's a solution:

  • Step 1: Take one chicken across the river.
  • Step 2: Return alone.
  • Step 3: Take the second chicken across.
  • Step 4: Bring the first chicken back.
  • Step 5: Take the corn across.
  • Step 6: Return alone.
  • Step 7: Take the remaining chicken across.

Explanation:

  • By never leaving a chicken alone with the corn, we prevent the corn from being eaten.
  • We use one chicken as a "guard" for the corn on the original shore while transporting the other items.

Because it's not actually doing logic it got the first step wrong, violated the constraints multiple times, and also included an irrelevant idea (the guard) from a variant that it was trained on. I only got a correct, efficient answer one time out of ten trials of trivial variations on this classic puzzle.

load more comments (13 replies)