
I similarly have a small, simple spatial reasoning problem that only reasoning models get right (and not all of them); Qwen3 on max reasoning still gets it wrong.

> I put a coin in a cup and slam it upside-down on a glass table. I can't see the coin because the cup is over it. I slide a mirror under the table and see heads. What will I see if I take the cup (and the mirror) away?



Sonnet 3.7 non-reasoning got it right.

I'll think this through step by step. When you place a coin in a cup and slam it upside-down on a glass table, the coin will be between the table surface and the cup. When you look at the reflection in the mirror beneath the table, you're seeing the bottom side of the coin through the glass. Since the mirror shows heads, you're seeing the heads side of the coin reflected in the mirror. In a mirror, the image is reversed left-to-right, but the "heads" and "tails" sides of a coin would not be flipped in this reflection - you'd still recognize heads as heads. If you take away both the cup and the mirror, you'll see the opposite side of the coin compared to what you saw in the mirror. Since the mirror showed heads, you'll see tails when looking directly at the coin from above the table.


Not reasoning mode, but I struggle to call that “non-reasoning”.


one-shot mode?


> In a mirror, the image is reversed left-to-right

lol


My first try (omitting chain of thought for brevity):

    When you remove the cup and the mirror, you will see tails. Here's the breakdown:

    Setup: The coin is inside an upside-down cup on a glass table. The cup blocks direct view of the coin from above and below (assuming the cup's base is opaque).
    Mirror Observation: A mirror is slid under the glass table, reflecting the underside of the coin (the side touching the table). If the mirror shows heads, this means the underside of the coin is heads.
    Coin Orientation: A coin has two sides: heads and tails. If the underside (facing the table) is heads, the top side (facing away from the table) must be tails.
    Result: Removing the cup reveals the coin on the table. Since the top side was tails all along, you will see tails.

    Key Insight: The mirror reveals the hidden underside of the coin, but the top side (visible after removing the cup) is its opposite. The mirror’s reflection does not alter the physical orientation of the coin’s sides—it simply reflects the existing state.

    Answer: Tails.


Huh, for me it said:

Answer: You will see the same side of the coin that you saw in the mirror: heads.

Why?

The glass table is transparent, so when you look at the coin from below (using a mirror), you're seeing the top side of the coin (the side currently facing up). Mirrors reverse front-to-back, not left-to-right. So the image is flipped in depth, but the orientation of the coin (heads or tails) remains clear. Since the coin hasn't moved during this process, removing the cup and mirror will reveal the exact same face of the coin that was visible via the mirror, which was heads.

Final Answer: You will see heads.


The question doesn't define which side you're going to look from at the end, so either looking down or up is valid.


While technically true, one could argue that because they described sliding a mirror under the table specifically so they could see from that angle, any reasonable answer should assume that at the end they are looking down at the table.


The way the question is phrased, I'm definitely in the "ok, what's the trick, what do you want to hear" mode. I'm not surprised LLMs answer in a weird way with extra details like "this is how mirrors work, is this what you're confused about?"


Manus just answered this correctly for me in one crack:

Me: I put a coin in a cup and slam it upside-down on a glass table. I can't see the coin because the cup is over it. I slide a mirror under the table and see heads. What will I see if I take the cup (and the mirror) away?

Manus: This is a classic riddle! Let's think it through. You placed a coin in a cup, turned it upside down on a glass table, and then slid a mirror underneath the table. Looking into the mirror from below, you saw 'heads'. Since the mirror reflects the side of the coin facing downwards through the glass table, the 'heads' side must be facing down. Therefore, the side facing up, hidden inside the cup, must be 'tails'. When you remove the cup, you will see the side facing up, which is tails.


Except it's wrong about this being a classic riddle.


Yup, it flunked that one.

I also have a question that LLMs always got wrong until ChatGPT o3, and even then it has a hard time (I just tried it again and it needed to run code to work it out). Qwen3 failed, and every time I asked it to look again at its solution it would notice the error and try to solve it again, failing again:

> A man wants to cross a river, and he has a cabbage, a goat, a wolf and a lion. If he leaves the goat alone with the cabbage, the goat will eat it. If he leaves the wolf with the goat, the wolf will eat it. And if he leaves the lion with either the wolf or the goat, the lion will eat them. How can he cross the river?

I gave it a ton of opportunities to notice that the puzzle is unsolvable (assuming, as it did, that this is the standard puzzle where the boat holds one passenger; if it had pointed out that I never said that, I would also have been happy). I kept trying to get it to notice that it was failing the same way again and again, and asked it to step back and think about the big picture, but each time it would confidently start solving it again. Eventually I ran out of free messages.


4o with thinking:

By systematic (BFS) search of the entire 32-state space under these rules, one finds no path from to that stays always safe. Thus the puzzle has no solution—there is no way for the man to ferry all four items across without at least one of them being eaten.
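The 4o answer describes a breadth-first search over a 32-state space (each of the man and the four items is on one of two banks). Below is a minimal Python sketch of that check, assuming the man rows and can carry up to `capacity` items per crossing, and that a bank is only dangerous when he is not on it. The names `solve`, `is_safe` and `FORBIDDEN` are hypothetical and not taken from any model's output.

```python
from collections import deque
from itertools import combinations

ITEMS = ("cabbage", "goat", "wolf", "lion")
# (eater, eaten) pairs that cannot be left together without the man.
FORBIDDEN = {("goat", "cabbage"), ("wolf", "goat"), ("lion", "wolf"), ("lion", "goat")}


def is_safe(bank):
    """An unattended bank is safe if it holds no forbidden pair."""
    return not any(a in bank and b in bank for a, b in FORBIDDEN)


def solve(capacity=1):
    """BFS over (man_side, items_on_left_bank) states.

    Returns the shortest list of states from start to goal, or None if no
    safe schedule exists. `capacity` counts items in the boat besides the man.
    """
    everything = frozenset(ITEMS)
    start = ("left", everything)
    goal = ("right", frozenset())
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk parent links back to the start to rebuild the path.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        side, left = state
        here = left if side == "left" else everything - left
        # Try every cargo of 0..capacity items from the man's current bank.
        for k in range(capacity + 1):
            for cargo in combinations(sorted(here), k):
                cargo = frozenset(cargo)
                if side == "left":
                    new_left = left - cargo
                    new_state = ("right", new_left)
                    unattended = new_left  # man just left the left bank
                else:
                    new_left = left | cargo
                    new_state = ("left", new_left)
                    unattended = everything - new_left  # man left the right bank
                if is_safe(unattended) and new_state not in parent:
                    parent[new_state] = state
                    queue.append(new_state)
    return None


if __name__ == "__main__":
    print(solve(capacity=1))  # None
    print(len(solve(capacity=2)) - 1)  # 5 crossings
```

With `capacity=1` the search returns `None`, because every possible first trip leaves a forbidden pair behind, which matches the o3 caveat about a one-passenger boat. With `capacity=2` it finds a five-crossing schedule; one such schedule is: take the goat and wolf over, bring the goat back, take the goat and cabbage over, bring the goat back, then take the goat and lion over. With `capacity=4` the man simply takes everything in one trip.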


You go with the cabbage, goat, wolf and lion all together!


o3 gave me basically that solution: "Below is the shortest safe schedule that really works ‒ but it assumes the boat can hold the man plus two passengers (three beings total). If your version of the puzzle only lets him move one passenger at a time, the puzzle has no solution: at the very first trip he would always leave at least one forbidden pair alone."


I tried Grok 3 with Think and it was also right, with pretty good thinking.


I don't have access to Think, but I tried Grok 3 regular, and it was hilarious, one of the longest answers I've ever seen.

Just giving the headings, without any of the long text between each one where it realizes it doesn't work, I get:

    Solution
        [... paragraphs of text omitted each time]
    Issue and Revision
    Revised Solution
    Final Solution
    Correct Sequence
    Final Working Solution
    Corrected Final Solution
    Final Correct Solution
    Successful Solution
    Final answer
    Correct Final Sequence
    Final Correct Solution
    Correct Solution
    Final Working Solution
    Correct Solution
    Final Answer
    Final Answer

Each time it's so confident that it's worked out the issue, and now, finally, it has the correct, final, working solution. Then it blows it again.

I'm surprised I didn't start seeing heading titles such as "Working solution-FINAL (3) revised updated ACTUAL-FINAL (2)"


I tried with the thinking option on and it ran into network errors; if you don't turn on thinking, it gets the answer right.

> Summary:

- Mirror shows: *Heads* → That's the *bottom face* of the coin.
- So actual top face (visible when cup is removed): *Tails*

Final answer: *You will see tails.*


The only thing I don't like about this test is that I prefer test questions that don't have binary responses (e.g. heads or tails): you can see from the responses people got in this thread that the LLMs' success rates are all over the map.


Yeah, same.

I had a more complicated prompt that failed much more reliably - instead of a mirror I had another person looking from below. But it had some issues where Claude would often want to refuse on ethical grounds, like I'm working out how to scam people or something, and many reasoning models would yammer on about whether or not the other person was lying to me. So I simplified to this.

I'd love another simple spatial reasoning problem that's very easy for humans but LLMs struggle with, which does NOT have a binary output.


I always feel that if you share a problem here where LLMs fail, it will end up in their training set and they won't fail on that problem anymore. Future models will still have the same underlying errors, but you'll have lost your ability to detect them.


My favorite part of the genre of “questions an LLM still can’t answer because they’re useless!” is all the people sharing results from different LLMs where they clearly answer the question correctly.


I use LLMs extensively and probably should not be bundled into that genre as I've never called LLMs useless.


I think it's pretty random. qwen3:4b got it right once; on a re-run it told me the coin is actually behind the mirror, and then did this brilliant maneuver:

  - The question is **not** asking for the location of the coin, but its **identity**.
  - The coin is simply a **coin**, and the trick is in the riddle's wording.

  ---

  ### Final Answer:

  $$
  \boxed{coin}
  $$


ChatGPT free gets it right without reasoning mode (it still explained some steps): https://chatgpt.com/share/6810bc66-5e78-8001-b984-e4f71ee423...


Tried it with o1-pro:

> You’ll find that the actual face of the coin under the cup is tails. Seeing “heads” in the mirror from underneath indicates that, on top, the coin is really tails‑up.


Simple Claude 3.5 with no reasoning gets it right.


Neat, because 3.7 doesn't.


3.7 works for me too. Of course it's going to depend on temperature a bit.



