
A simple non-mathematical resolution is that no information has been gained by selecting the first envelope. Since no information about the contents was available before selecting the first envelope, switching envelopes is a neutral act.

While this sounds like just a lazy intuitive explanation, it calls out the key component - a lack of information about the outcome - that can help avoid the paradox in more complex situations.

This is in contrast to the three-door problem where information is gained by opening the first door. https://en.wikipedia.org/wiki/Monty_Hall_problem



This is not a resolution; a resolution points out the faulty step in the reasoning that leads to the paradox.


No, he articulated the problem correctly. He's just wrong that the math isn't with him. Game theory solves imperfect information games using information sets. The math is with him.

In imperfect information games you are in an information set, not a subgame. When the value is 2A and you have to make a keep-or-switch decision, the value A is still part of how you calculate the solution. When the value A is the game you are in, it doesn't matter; the value 2A is still part of how you calculate the solution. It has to be this way, because you have no way of discriminating between the game states you might be in. This is one of the notable and (apparently) counterintuitive results of game theory, and it really matters, because when you don't notice it you run into a contradiction.

Check out Noam Brown's talk on solving imperfect information games at NIPS 2017 for a more detailed treatment of this subject. I'm going to handwave it here and dive straight into the exact logical contradiction.

So 2A and A both still have to exist for the correct calculation. The subgames are still connected. But notice what the failed solution does: it creates two different subgames!

In one they set 2A. In the other they have A/2. But it's still two different subgames. Now why does this seem appropriate? Because this sort of reasoning is okay in perfect information games! You can reason about different subgames independently of each other.

But we're not in a perfect information game. We're in a null information set. So 2A and A coexist.

Since we can't state that we are in the 2A case or the A case - being in an information set prevents us from discriminating between these two things - they err when they try to build their reasoning on the supposition that they can. In particular, they create a logical contradiction.

The dependence means you are required to carry the 2A subgame as the other side of the equation. Required, but they don't do that. Instead they evaluate one subgame as if they were in a perfect information context.

That is the missing equality symbol that people aren't seeing. That dependence. So what is really happening is this: They claim 2A=A; assuming A is not zero, they claim that 2=1. You can see this very clearly when you take the actual correct probabilities for expectation and place them beside the incorrect ones:

Observe.

    They said: 5/4A = (1/2)2A + (1/2)A/2
    Reality  : 3/2A = (1/2)2A + (1/2)A
    Equality : (1/2)2A + (1/2)A/2 = (1/2)2A + (1/2)A
                         (1/2)A/2  = (1/2)A
                              A/2  =  A
                              A    = 2A
                              1    = 2
It's a bit worse than that, because they omit that expectation is a function of both policy and information set. Trying to make the decision based on expected value, when expected value is not well defined without a policy choice, is another very fundamental error. I mean seriously, ask yourself if you get the same result by slamming your head into a wall as opposed to doing something productive. You don't. Policy choice matters. In this case, literally the only policy choice that has 0 EV is the one they claim is the best. It's silly. When you look at the game tree, when you actually evaluate the proper graph, it's literally the only bad decision you can make. There are quite literally an infinite number of strategies that give the same EV solution. All except the one they chose.

The reward for the game is only defined for R(KEEP), and it is defined as {0.5: A, 0.5: 2A}. R(SWITCH) gets its valuation from a Bellman equation: R(SWITCH) = R(SWITCH) * P(SWITCH) + R(KEEP) * P(KEEP). This is obviously an infinite sequence if P(SWITCH) = 1. We actually tend to use a discount factor here, like 0.9999999999*R(some_choice), because for this infinite sequence it lets us use limits to define that case as 0.
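To make that concrete, here's a quick sketch of the discounted fixed point (the Python and the A = 1 normalization are mine, just for illustration):

    # Closed-form fixed point of the discounted Bellman equation above:
    #   R(SWITCH) = gamma * (P(SWITCH)*R(SWITCH) + P(KEEP)*R(KEEP))
    # Assumption (mine): A = 1, so R(KEEP) = 0.5*1 + 0.5*2 = 1.5.
    def r_switch(p_switch, gamma=0.9999999999):
        r_keep = 0.5 * 1 + 0.5 * 2      # {0.5: A, 0.5: 2A} with A = 1
        p_keep = 1 - p_switch
        return gamma * p_keep * r_keep / (1 - gamma * p_switch)

    print(r_switch(0.5))    # ~1.5: drains into R(KEEP)
    print(r_switch(0.99))   # ~1.5: still drains into R(KEEP)
    print(r_switch(1.0))    # 0.0: the degenerate always-switch policy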

But this is like, an extension of why this is wrong - because game theory with information sets already solves for probabilities over action space. It already acknowledges it. So when you point to the information set problem? You point to game theory. And you point to the root of why their calculations were wrong: in an imperfect information game, you play both the game you're in and the game you're not in at the same time.


Yeah I was looking for someone saying this to double down. I think the "paradox" comes down to information loss. The tricky bit is this framing: "when you're holding an envelope, the other envelope contains >= $MONEY, so there's no reason not to switch".

But this omits (loses) the information that "both envelopes contain >= $MONEY". That is, as soon as you switch to the other envelope, the same statement is still true: the previous envelope contains >= $MONEY.

So if you forget that your current envelope also contains >= $MONEY you're stuck in an infinite loop, which you break out of only because you can only do a single iteration.


The right resolution is: suppose the two amounts are $50 and $100. If the envelope you have contains $50 and you switch, your gain is $50. If it contains $100 and you switch, your gain is -$50. So your expected gain from switching is (0.5)(50) + (0.5)(-50) which is zero.

The way that they fool you is by saying that you either double your money or you halve it, leading you to think that your expected ratio is 2.5/2. But that isn't correct. That ratio doesn't mean anything. That's the wrong step.
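A throwaway simulation backs this up (code mine; the $50/$100 pair is just this example's numbers):

    # Two envelopes with $50 and $100; pick one uniformly at random and
    # measure the average gain from switching. It hovers around zero.
    import random

    def gain_from_switching():
        envelopes = [50, 100]
        random.shuffle(envelopes)
        mine, other = envelopes
        return other - mine

    trials = 1_000_000
    print(sum(gain_from_switching() for _ in range(trials)) / trials)  # ~0.0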


Yep! You're in both subgames at once. You're playing both the A state and the 2A state at the same time, because the information set I you are in contains both of them.

This is the key difference between perfect information and imperfect information. In perfect information you can make the assumption of just one subgame and it works, because you know what game you are in - it's the one you are in. So you are playing over states and actions. In imperfect information you have to assume you are in both at the same time - you can't assume one subgame, because you don't know which one you are in. You are playing over information sets - every state contained within that information set, not just one state.

I have a feeling that saying it like this, in the general way, is a lot more confusing to people than your phrasing. So thanks for sharing your phrasing.


I think you nailed it. If the question were posed as "Two envelopes with $A and $(A+1)" then there would be no talk of ratios and no "paradox" would ever emerge.


Exactly. The other envelope might be the good one or the bad one. The "paradox" is basically just sophistry.


Oh, I just realized what I missed. You can use the ratio, but if you do you need to take the geometric mean, since it is a multiplying factor. The geometric mean of 2 and 0.5 is sqrt(2 * 0.5) or 1. So the expected ratio is 1.


That’s a different problem though. In the original problem you don’t know the two amounts ahead of time. All you see is $50 in your envelope - you don’t know if the other one is $100 or $25 and you’re back to square one.


But the two choices aren't equally likely, because someone has chosen the amounts in the envelopes through a non-random process. You could make inferences based on what you expect the person to spend on prizes, but it's no longer a simple mathematical problem at that point.

Which is why the problem as stated doesn't let you look in the envelope before deciding whether to switch or not. Then it's obvious switching makes no difference.


I think you're confusing yourself. Watch something like Noam Brown's NIPS 2017 lecture on subgame solving in imperfect information games and you'll realize you should be very wary of anyone playing around with solutions to imperfect information games that use state-based reasoning - not information-set-based reasoning.

If you have a tree:

         0.5
        /   \
       A     2A
  
In a perfect information game you can do this thing where you're like "assume A" and think ahead, and then "assume 2A" and think ahead. But in the imperfect information world you don't know whether you have A or 2A. So you have to assume "A and 2A" and deal with your solution space using counterfactual reasoning. Picture a chain running from the A to the 2A. They are married. They cannot be parted.

It is really counterintuitive, but the way I think about it is "you are playing every game at once" and "you are playing a game over information sets and strategy choices", not "state space and actions". Since you are already playing every game at once, when they come in with that other game - well, you were already playing another game. They're overwriting that part of your solution's memory. Which in math terms declares an equality. In this case that equality is 2=1, because they say A = A/2 or 2A = A, depending on which framing you use to build the real game tree.

The actual recurrence relationship is obviously

    R(KEEP) = {0.5: A, 0.5: 2A} = 3/2A. 

    R(SWITCH) = P(KEEP)*R(KEEP) + P(SWITCH)*R(SWITCH)
Solving the game from there is really trivial - you're only allowed one strategy selection per information set, and you have only one information set: the null set. You literally get to make one decision, a scalar between 0 and 1 defining P(KEEP), with P(SWITCH) = 1 - P(KEEP).

`R(SWITCH)` becomes `R(KEEP)` because of the way the Markov chain drains into it. Literally the only thing you can't pick is `P(SWITCH)=1`; everything else is an equivalent EV solution. `P(SWITCH)=1` is undefined or else 0 (in practice we take the limit after a horizon by multiplying by some term like 0.9999999 to avoid the infinite regress).

And since the problem didn't make this clear: your policy __influences your expected value__, so their reasoning process is broken at a fundamental level. They literally declared 2=1, and that the undefined solution is the best answer despite infinite solutions that aren't undefined. It's... really bad reasoning. Don't listen to the idea that switching is better. It can't be. Look at the actual relationship. Look at the actual graph. They skip some really important steps, like using the same game tree for the entire calculation.
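If you want to see the policy dependence directly, here's a rough simulation of the game tree I'm describing (assumptions mine: A = 1, and a finite horizon standing in for the limit):

    # Play the keep-or-switch game under a fixed policy P(SWITCH) = p.
    # Each round you either keep (the game ends and you realize your
    # envelope) or switch and face the same decision again.
    import random

    def play(p_switch, horizon=10_000):
        a = 1
        mine, other = random.choice([(a, 2 * a), (2 * a, a)])
        for _ in range(horizon):
            if random.random() < p_switch:
                mine, other = other, mine   # switch, decide again
            else:
                return mine                 # keep: the only defined reward
        return 0                            # never kept: the infinite regress

    for p in [0.0, 0.3, 0.7, 1.0]:
        n = 100_000
        print(p, sum(play(p) for _ in range(n)) / n)
    # ~1.5 for every p < 1; 0.0 for p = 1, the only bad choice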


Disregard the above; you basically said the same thing I did, but used a phrasing that made it hard for me to realize you had done so.


The goalposts here are different from a normal "paradox".

The actual solution is explained right there on the page, the expectation for both envelopes (one has x, one has 2x) is x*3/2, so switching changes nothing.

The "solution" demanded is supposed to be pointing out the specific logical error in the erroneous calculation that says switching should win. Citing the correct logic is not considered sufficient.


It seems like this "solution" is saying that the error is in step 1:

> Denote by A the amount in the player's selected envelope.

which is information that we have not actually gained.


Variant:

I tell you "These two envelopes contain money. One of the envelopes contains twice as much money as the other one. Pick one.". You pick one. I tell you the one you picked contains $60. You've now determined the amount in your selected envelope -- should you switch?


Now you know x is either 30 or 60. The expectation is either 45 or 90. Switching has equal likelihood of increasing or decreasing your take. The correct calculation will show the expected value of the other envelope as 60.


Inserting A=60 into the original line of reasoning (just to show that it doesn't immediately resolve the paradox):

1. Denote by A=60 the amount in the player's selected envelope.

2. The probability that A=60 is the smaller amount is 1/2, and that it is the larger amount is also 1/2.

3. The other envelope may contain either 2A=120 or A/2=30.

4. If A=60 is the smaller amount, then the other envelope contains 2A=120.

5. If A=60 is the larger amount, then the other envelope contains A/2=30.

6. Thus the other envelope contains 2A=120 with probability 1/2 and A/2=30 with probability 1/2.

7. So the expected value of the money in the other envelope is: 1/2(120) + 1/2(30) = 75

8. This is greater than A=60 so, on average, the person reasons that they stand to gain by swapping.

9. ...

edit: If anything, to me this obscures things further, because it makes it non-obvious that we are actually talking about ratios (i.e. 2A, A/2) rather than absolute numbers, and so the expectation calculated in step 7 is inappropriate - a geometric mean recovers the correct answer.


A can't collapse to sixty. It could still be 30, 60 or 120. The expected value calculation includes the value of A as defined in two different worlds: the world in which the other envelope holds A/2 (30) and the world in which it holds 2A (120).

But these two worlds never exist together. They are different realities. We can't determine which of them we are in without more information.

Trying to do so is an error, because we are in multiple subgames simultaneously. In imperfect information games you are already in more than one world. You are in the reality A and counterfactual 2A and you are in the reality 2A and counterfactual A. You can't tell which. You're in both.

In perfect information, you're not in both, so you can do this case-based reasoning and not run into as much trouble. But when you're already in both worlds? Doing the case-based reasoning implies that the counterfactual world equals the case-based world. It's overwriting that part of the equations. Which means you just accidentally declared that either A/2=A or that 2A=A. That 2=1. You do this because you're still in the counterfactual reality, but you're pretending you are not.


In the variant under discussion A is defined as 60: https://news.ycombinator.com/item?id=31574239


You don't know what A is even if he tells you the first envelope is 60. A and 2A exist together counterfactually, if you have A, you have 2A, if you have 2A, you have A. This is a fundamental property of imperfect information games. You don't have State=A, ever, and you are lying to yourself when you pretend that you do. It is /why/ the reasoning error happens. State based reasoning only works in perfect information games because in that framing you don't have counterfactual subgames forcing them to depend on each other - forcing you to play them both simultaneously, because you don't know which subgame you are in.

A and 2A are together, so you can't propose A/2 without violating the A and 2A codependency imposed by not knowing which subgame you are in.

There are multiple A when you get told 60. They are 30, 60, and 120. This leads to three potential solutions for the expected value: 3/2(30), 3/2(60), 3/2(120). You can't tell which of these solutions you are in, because you don't know. Regardless, since the terminating R(KEEP) condition is the only one that is defined, P(SWITCH) = 1 still yields either an undetermined value or 0, depending on how you solve the Bellman equations. Your policy decides your expected value. So earlier I was a little loose when I claimed the 3/2A relationship.

Honestly, the wikipedia article is really terrible. It demands you stick to its formalism and declares you a no true Scotsman if you don't, but its approach is fundamentally wrong. Throw off its chains and consider the framing where you remove the requirement of thinking about imperfect information. People are bad at it, and if this is confusing you, it is /because/ you are struggling with the imperfect information.

The way you convert between the two game types is this: instead of states -> information sets. So you only have one move in this game. You always have the null information set. You always have the same situation. So you're only allowed one choice, and that's it, every time, forever. You always have to make the same decision. Secondly, instead of actions -> probability vectors over actions. To simplify and make it tractable, assume [0.0, 0.1, 0.2, ... 1.0], so the action space is small. You are choosing 'an action' to take, and it's going to create multiple subgames. Focus on the recurrence relationship between those games. In particular, look at the base case: R(KEEP) = {0.5: A, 0.5: 2A}.

Everything heads toward that base case except one thing: P(SWITCH)=1. It is a Markov process, and R(SWITCH) 'drains' into R(KEEP), because any probability mass in it inevitably becomes R(KEEP) after enough iterations.

Notice that {0.5: A, 0.5: 2A} is always true! It is true when you have A and keep it, and it is true when you have 2A and keep it, because notice - you never ever "have A". You have the null information set. You can't ever see a difference between these two things.

The moment you force in the idea that you can actually define A to be a particular thing, you smash all over that recurrence relationship. You destroy the relationship. You claim there is no relationship between the policy function and the expected value even though /there is/. There is so much wrong with replacing R(SWITCH) with a fake reality where you know you're actually in the A/2 case when you can't know that. And that is what the equations are doing. They're saying you can replace the dependence on each other with a subgame that exists in a different reality. Say the pair was (60, 120) and you got 60 - there is no 30 in existence, but your equation calls for replacing the recurrence relationship with the idea that there is one. It doesn't make sense.
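Here is the 'drain' as literal iteration (a sketch under my A = 1 normalization from above):

    # Iterate R(SWITCH) <- P(KEEP)*R(KEEP) + P(SWITCH)*R(SWITCH) and
    # watch it converge to R(KEEP) for any P(SWITCH) < 1.
    r_keep = 0.5 * 1 + 0.5 * 2          # the base case {0.5: A, 0.5: 2A}

    for p_switch in [0.0, 0.5, 0.9]:
        r_switch = 0.0
        for _ in range(1000):
            r_switch = (1 - p_switch) * r_keep + p_switch * r_switch
        print(p_switch, r_switch)       # all converge to 1.5
    # With p_switch = 1.0 the update is r_switch = r_switch: nothing
    # ever drains in from the base case.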


You are conflating the quantity A, which is a label for the (unknown in the original formulation) amount in the chosen envelope, with the "smaller amount" - labelled x in the wikipedia article.

A=60 is the amount in our chosen envelope - we are given this information in the variant in this thread. What we still don't know is if that is the larger amount (and thus the other envelope contains 30, and therefore x=30), or the smaller amount (and the other envelope contains 120, and therefore x=60).

The error is (in step 7) to calculate the arithmetic expectation of those absolute values, because they do not exist "together". The correct arithmetic mean can be obtained by considering the different conditions in which those values do exist, as described in [0]. However, the ratios do exist "together" - the other envelope contains either double or half of 60 - so we could instead calculate the geometric mean, of either the ratios or the corresponding absolute values, and obtain the correct result:

    (2 * 0.5) ** 0.5 = 1
    (120 * 30) ** 0.5 = 60
[0] https://en.wikipedia.org/wiki/Two_envelopes_problem#Other_si...

edit: changed “first envelope” to “chosen envelope” for clarity.


On further reflection, you're right, I'm conflating it.

My correction is still valid though. You're not handling step ten properly. You didn't work over the information sets, didn't solve the actual graph that is the game, didn't handle the under-specified policy function.

To try and show you that your solution isn't the actual solution: well, both options have the same EV. So I choose switch every time, because why not. As you are no doubt aware I never get to have EV because I'm constantly swapping. The sixty is a mirage. For my policy choice, the answer was undefined or zero depending on how you write it down. But you told me they had the same EV. So if they did, why did my choice not produce that EV? Ergo, your solution only appears to be giving you the EV.

Think about that for a while and you'll start to realize why I homed in on specifying a recurrence relationship with the terminal keep node and why I'm so eager to escape the trap of their flawed problem model.


Am I really the one who is getting things conflated? Go back to the problem and look at step ten. The problem allows infinite swapping. You are doing an EV calculation, but your analysis is fundamentally flawed. Assign the policy of always switching. Your analysis is claiming that EVs can be calculated, but they can't. The EV is either zero or undefined for switching, because the recurrence relationship is an infinite sequence. Since 60 != 0 and 60 != undefined, but you claim that they are, something is very wrong with your calculations. You're using the wrong formalism. The policy is supposed to be able to vary, but you're treating EV as a concept that isn't dependent on policy.

Let's take a step back and learn the important lesson for more complex situations. Your policy influences your expected value. Not keeping that in mind is going to destroy the ability to correctly calculate expected value. You aren't trying to search for the best thing to do on the basis of expected value. You are searching for the right policy to produce a high expected value. The difference is subtle, but essential.

How do we correct it? Well, the right formalisms that allow you to search for the correct policy come from several fields, but one of them is game theory. In game theory, when dealing with imperfect information, it is considered incorrect to do state-based reasoning. This is because you aren't in a state - you are in an information set. When you are playing the game you have to consider every game you could be in, because you don't know which you are in.

This is a second problem with the analysis, but I think you corrected this one.

The problem asks for something that translates to more complex situations. The general lesson here is about considering counterfactuals in your analysis. An example of this in practice is Jeff Bezos talking about his decision to found Amazon on the basis of regret minimization, on account of his theory about how he would feel in various counterfactual futures. He didn't consider one EV, the founding of Amazon, but also other EVs, like founding Amazon and failing, and not founding Amazon and doing other things.

I think I get why you think I'm conflating A, but I'm actually trying to point out that the wikipedia article is conflating A, so it's hard to have a productive discussion due to our inheritance of their misuse of terms. I don't want to conflate A, but the Wikipedia article defined A in their expected value calculation, and in that equation it ends up taking on a different meaning from what it means when it is defined to be 60. And their meaning ends up claiming things like 1=2 in practice, because of the properties of the hidden counterfactual parts of their equations - just because they neglect to show them doesn't mean they don't exist in the correct mathematics. So the logical contradiction is there - which is exactly the thing the problem asks us to identify.


I find your exposition difficult to follow, perhaps you could write out your proposed solution without the accompanying explanatory text, so that we can clearly see how it resolves the paradox.


Okay, let's start with the easy analysis: the version of the game that terminates after only one switch.

I[null] = {0.5: A, 0.5: 2A}

We get the expected change in EV for switching like this:

    (
        # The expected gain of switching if we are in subgame A
        (2A-A) * 0.5
    
        +

        # The expected loss of switching if we are in subgame 2A
        (A-2A) * 0.5
    )
See how we had to consider two different subgames? The sum works out to 0.5A - 0.5A = 0, so switching is worth nothing. We didn't know whether we had A or 2A. We only knew we had I[null].

You had to consider the benefit of switching from A to 2A and the cost of switching from 2A to A. You had to consider the factual reality you were in, but also the counterfactual reality you were not in.

Here is the reality of the first part of the game tree:

   ChanceNode(0.5)
  /           \ 
 A            2A
In a perfect information game, A and 2A are disconnected because they are in two different subtrees, but watch what happens when we convert from the perfect information view to the imperfect information view of the world we are actually dealing with:

        0.5
      /    \
    I[null]  I[null]

We have the two branches as information sets now. One is actually A, but when we reason about it we need to counterfactually consider the other parts of the information set.

                      0.5
      /                                    \
    {Factually A, Counterfactually 2A}     {Factually 2A, Counterfactually A}
So I hope you're starting to realize that A and 2A are fundamentally connected. You can't reason about A without including 2A. You can't reason about 2A without including A. This is really, really important to one of the several reasons that their analysis is wrong. You don't know which branch you are in. So you can't condition on being in A, like they do in the wikipedia article.

Look at the trouble they run into when they /do/ condition on A. I just showed you these counterfactuals exist, and they still exist when they condition. The conditioning treats the situation as if you can just deal with A and just deal with 2A. So they get a 2A case and an A/2 case, both of which have their own counterfactuals. Notice what just happened when they did that.

In the A/2 case, since we are in imperfect information, there is a counterfactual associated with it. What is that counterfactual? It is A! So they don't just have A/2, they also have counterfactual A.

And now look at the other case. 2A has a counterfactual associated with it too. What is it? A.

So they have this:

[F: A, CF: 2A], [F: A/2, CF: A]

And what I'm trying to point out to you is that they just declared A=A/2 and 2A=A, because they neglected the counterfactual relationships.

You can't condition on A in the subgame; you don't have perfect information - you don't have A. You have I[null]. Even when you get told 60 you still have I[null].

/A/ isn't 60, and it can't be, because you don't know which subgame you are in. A isn't given. If you knew A, if it were possible to know A, you wouldn't be in an imperfect information game. A is defined in point one, yes - but it is also used in point seven, and when it is used there we get the logical contradiction.


The paradox is supposed to be in the contradiction between this (incorrect) reasoning and the correct reasoning, not in the counterintuitive conclusion of the incorrect reasoning.


There actually is a paradox:

Select a power of 2 according to the probability

P(2^k) = 0.2 * 0.8^k

i.e. p(1) = 20%, p(2) = 16%, p(4) = 12.8%, etc.

Then place this amount in one envelope and twice that in the other envelope.

Then, randomly shuffle the envelopes and select one and open it. Say you see the number 16. It could have landed there in two ways: Either 16 was generated and the other envelope has 32. This happens with probability 0.2*0.8^4 ~ 8.192%. Alternatively 8 was generated and the envelope you have is already the largest one. This happens with probability 0.2*0.8^3 ~ 10.24%. Given that you see 16 already, the relative probability becomes 4/9 vs 5/9. In other words the chance that you have the largest envelope already is 5/9. The expected value of the other envelope is then 5/9*8 + 4/9*32 = 18.67 > 16 so you should switch.

In fact, try this for different initial envelopes and you'll see that you should always switch.
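A quick numerical check (code mine, following the distribution above exactly):

    # For an observed amount 2^k, weigh the two ways it can occur and
    # compute the expected value of the other envelope.
    def p_gen(k):                       # P(2^k was the generated amount)
        return 0.2 * 0.8 ** k if k >= 0 else 0.0

    for k in range(8):
        seen = 2 ** k
        w_small = p_gen(k)              # we hold the smaller; other = 2*seen
        w_large = p_gen(k - 1)          # we hold the larger; other = seen/2
        total = w_small + w_large
        ev_other = (w_small * 2 * seen + w_large * seen / 2) / total
        print(seen, round(ev_other, 2), ev_other > seen)
    # ev_other > seen for every k (18.67 when seen = 16): always switch.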

But, if you should always switch, why even look inside the envelope?

But, then you're back at the symmetry / no information argument.


This is similar to the St. Petersburg problem. The way I satisfy myself with that answer is to note that it's kind of a spherical cow problem. Nobody in the real world can afford to run this game, since it requires an infinite amount of money -- even a currency issuer would expect to just make their currency worthless by running it, and an offering of physical resources instead of currency would violate thermodynamics in all sorts of ways.

Spherical cow problems often end in counterintuitive results.


Note that the problem requires allowing arbitrarily small amounts of money in the envelopes, otherwise you would know at some point that you have the smallest amount of money possible. So in your formulation, you have to let k be negative as well.


No, k >= 0 in my example. In one case you know for sure that you have the smallest envelope, since you see that it has $1 in it. In this case you should absolutely switch. In all other cases you should still switch, although you can't be completely sure.

That being said, even if you let k go negative I think it's still a paradox.

If you're curious, the resolution to this is that before you look in the envelope, the expected value of both envelopes is infinite. Only after you look inside one of them does that collapse to two finite values that you can subtract.
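You can check the infinite prior expectation numerically with the 0.2*0.8^k distribution from upthread (code mine):

    # Prior EV of an envelope that holds 2^k or 2^(k+1) with equal chance:
    # each term is 0.2 * 0.8**k * 1.5 * 2**k = 0.3 * 1.6**k, which grows
    # geometrically, so the partial sums diverge.
    partial = 0.0
    for k in range(61):
        partial += 0.2 * 0.8 ** k * 1.5 * 2 ** k
        if k % 20 == 0:
            print(k, partial)   # grows without bound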


If you disallow negative values of k then you break the symmetry argument, and you do indeed get quite a bit of information by opening an envelope.

I assume you meant to use an exponential base < 0.5 so that the mean is finite.


The point I'm making is that if you open the envelope you will always conclude that it's advantageous to switch regardless of what you see in that envelope. In the case where you see $1, you're guaranteed that switching is a good idea. If you see anything other than $1 you come to the same conclusion but you have to rely on a probabilistic argument using expected values. Now, if you will decide to switch regardless of what you see in the envelope, then why even bother opening it? But, if you don't open the envelope, then you're back to the original problem where the symmetry argument is valid.


Every envelope you open will have a higher expected value than the previous one. This is not paradoxical, it is a standard property of probability distributions with infinite means.

I believe the problem you are trying to elicit is that you can almost show that either envelope is a priori better than the other. The problem with this is that the means are infinite so the math isn't sound. So there's not an actual contradiction; the strategy is to open either envelope and then switch to the other one. It's unintuitive but only because we're not used to reasoning about distributions with infinite means.


Right, I mentioned above that the resolution to this is that the expected value of both envelopes is infinite before you open anything and thus the difference is undefined. Once you open one, the difference becomes well defined.

I still think it's quite unintuitive that one needs to open an envelope in order to regularize things this way. The fact that you will always decide to switch, regardless of what you see in the first envelope, yet you still have to open it is what I find paradoxical.


So the real predictor is the probability of already holding the larger envelope.

Once that probability is high enough, switching is no longer attractive - e.g. at 7/9 the other envelope is worth 7/9*(0.5X) + 2/9*(2X), which is less than X. The breakeven is p = 2/3, where p*(0.5X) + (1-p)*(2X) = X.


There are an amusing number of probability puzzles with seemingly paradoxical solutions out there, to the point that it's almost more surprising to see one where the intuitive answer is correct:

- https://en.wikipedia.org/wiki/Boy_or_Girl_paradox

- https://en.wikipedia.org/wiki/Penney%27s_game

- Waiting time paradox

- Honorary mention: https://en.wikipedia.org/wiki/St._Petersburg_paradox

(feel free to add more)



> This is in contrast to the three-door problem where information is gained by opening the first door.

This is in contrast to the earlier problem statement of Gardner's wealthy men playing against each other as per the article, actually.



