Two envelopes problem (wikipedia.org)
249 points by niklasbuschmann on May 31, 2022 | 300 comments


A simple non-mathematical resolution is that no information has been gained by selecting the first envelope. Since no information about the contents was available before selecting the first envelope, switching envelopes is a neutral act.

While this sounds like just a lazy intuitive explanation, it calls out the key component - a lack of information about the outcome - that can help avoid the paradox in more complex situations.

This is in contrast to the three-door problem where information is gained by opening the first door. https://en.wikipedia.org/wiki/Monty_Hall_problem
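To make that concrete, here's a minimal simulation of the setup (the amount range and trial count are arbitrary choices): with no information gained, keeping and switching average out to the same value.

    import random

    def play(switch):
        x = random.uniform(1, 100)   # smaller amount; the range is arbitrary
        envelopes = [x, 2 * x]
        random.shuffle(envelopes)
        picked, other = envelopes
        return other if switch else picked

    trials = 1_000_000
    keep = sum(play(False) for _ in range(trials)) / trials
    swap = sum(play(True) for _ in range(trials)) / trials
    print(keep, swap)   # both ~75.75, i.e. 1.5 * E[x]: switching is neutral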


This is not a resolution. A resolution is where you point out the faulty step in the reasoning that leads to the paradox.


No, he articulated the problem correctly. He's just wrong that the math isn't with him. Game theory solves imperfect information games using information sets. The math is with him.

In imperfect information games you are in an information set, not a subgame. When the value is 2A and you have to make a keep or switch decision, the value A is still part of how you calculate the solution. When A is the game you are in, it doesn't matter: the value 2A is still part of how you calculate the solution. It has to be this way, because you don't have a way of discriminating between which game state you are in. This is one of the notable and (apparently) counterintuitive results of game theory, and it really matters, because when you don't notice it you run into a contradiction.

Check out Noam Brown's talk on solving imperfect information games at NIPS 2017 for a more detailed treatment of this subject. I'm going to handwave past the details here and dive into the exact logical contradiction.

So 2A and A both still have to exist for the correct calculation. The subgames are still connected. But notice what happens: the failed solution creates two different subgames!

In one they set 2A. In the other they have A/2. But it's still two different subgames. Now why does this seem appropriate? Because this sort of reasoning is okay in perfect information games! You can reason about different subgames independently of each other.

But we're not in a perfect information game. We're in a null information set. So 2A and A coexist.

Since we can't state whether we are in the 2A case or the A case - being in an information set prevents us from discriminating between the two - they err when they try to build their reasoning on the supposition that they can. In particular, they create a logical contradiction.

The dependence means you are required to have the subgame from the 2A side as the other side of the equation. Required - but they don't do that. Instead they evaluate one subgame as if they were in a perfect information context.

That is the missing equality that people aren't seeing. That dependence. So what is really happening is this: they claim 2A=A; assuming A is not zero, they claim that 2=1. You can see this very clearly when you take the actual correct probabilities for the expectation and place them beside the incorrect ones:

Observe.

    They said: 5/4A = (1/2)2A + (1/2)A/2
    Reality  : 3/2A = (1/2)2A + (1/2)A
    Equality : (1/2)2A + (1/2)A/2 = (1/2)2A + (1/2)A
                         (1/2)A/2  = (1/2)A
                              A/2  =  A
                              A    = 2A
                              1    = 2
The error is a bit worse than that, because they omit that expectation is a function of both the policy and the information set. Trying to make the decision based on expected value, when expected value is not well defined without a policy choice, is another very fundamental error. I mean seriously, ask yourself if you get the same result by slamming your head into a wall as by doing something productive. You don't. Policy choice matters. In this case, literally the only policy choice that has 0 EV is the one they claim is best. It's silly. When you look at the game tree, when you actually evaluate the proper graph, it's literally the only bad decision you can make. There are quite literally an infinite number of strategies that give the same EV solution - all except the one they chose.

The reward for the game is only defined for R(KEEP), and it is defined as {0.5: A, 0.5: 2A}. R(SWITCH) gets its valuation from a Bellman equation: R(SWITCH) = R(SWITCH) * P(SWITCH) + R(KEEP) * P(KEEP). This is obviously an infinite sequence if P(SWITCH) = 1. We tend to use a discount factor here, like 0.9999999999 * R(some_choice), because for this infinite sequence it lets us use limits to define that case as 0.
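For what it's worth, that discounted fixed point is easy to check numerically. A minimal sketch (A = 1 and the exact discount are my choices here): solving R = gamma * (P(SWITCH) * R + P(KEEP) * R(KEEP)) in closed form gives R(KEEP) for every P(SWITCH) < 1, and 0 in the limit for P(SWITCH) = 1.

    def r_switch(p_switch, r_keep=1.5, gamma=0.9999999999):
        # Fixed point of R = gamma * (p_switch * R + (1 - p_switch) * r_keep)
        return gamma * (1 - p_switch) * r_keep / (1 - gamma * p_switch)

    for p in [0.0, 0.5, 0.9, 0.999, 1.0]:
        print(p, r_switch(p))
    # p < 1: essentially r_keep (3/2 * A with A = 1); p = 1: 0, the value "drains" away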

But this is like, an extension of why this is wrong - because game theory with information sets already solves for probabilities over action space. It already acknowledges it. So when you point to the information set problem? You point to game theory. And you point to the root of why their calculations were wrong: in an imperfect information game, you play both the game you're in and the game you're not in at the same time.


Yeah, I was looking for someone saying this so I could double down. I think the "paradox" comes down to information loss. The tricky bit is this framing: "when you're holding an envelope, the other envelope contains >= $MONEY, so there's no reason not to switch".

But this omits (loses) the information that both envelopes contain >= $MONEY. That is, as soon as you switch to the other envelope, the same statement is still true: the previous envelope contains >= $MONEY.

So if you forget that your current envelope also contains >= $MONEY, you're stuck in an infinite loop, which you only break out of because you're limited to a single iteration.


The right resolution is: suppose the two amounts are $50 and $100. If the envelope you have contains $50 and you switch, your gain is $50. If it contains $100 and you switch, your gain is -$50. So your expected gain from switching is (0.5)(50) + (0.5)(-50) which is zero.

The way that they fool you is by saying that you either double your money or you halve it, leading you to think that your expected ratio is 2.5/2. But that isn't correct. That ratio doesn't mean anything. That's the wrong step.
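You can verify this by enumerating the only two equally likely cases:

    # Hold $50 and switch: gain +50. Hold $100 and switch: gain -50.
    cases = [(50, 100), (100, 50)]   # (held, other), equally likely
    expected_gain = sum(other - held for held, other in cases) / len(cases)
    print(expected_gain)             # 0.0 = (0.5)(+50) + (0.5)(-50)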


Yep! You're in both subgames at once. You're playing both the A state and the 2A state at the same time, because the information set I you are in contains both of them.

This is the key difference between perfect information and imperfect information. In perfect information you can make the assumption of just one subgame and it works, because you know what game you are in - it's the one you are in. So you are playing over states and actions. In imperfect information you have to assume you are in both at the same time - you can't assume one subgame, because you don't know which one you are in. You are playing over information sets - every state contained within that information set, not just one state.

I have a feeling that saying it like this, in the general way, is a lot more confusing to people than your phrasing. So thanks for sharing your phrasing.


I think you nailed it. If the question were posed as "Two envelopes with $A and $(A+1)" then there would be no talk of ratios and no "paradox" would ever emerge.


Exactly. The other envelope might be the good one or the bad one. The "paradox" is basically just sophistry.


Oh, I just realized what I missed. You can use the ratio, but if you do you need to take the geometric mean, since it is a multiplying factor. The geometric mean of 2 and 0.5 is sqrt(2 * 0.5) or 1. So the expected ratio is 1.


That’s a different problem though. In the original problem you don’t know the two amounts ahead of time. All you see is $50 in your envelope - you don’t know if the other one is $100 or $25 and you’re back to square one.


But the two choices aren't equally likely, because someone has chosen the amounts in the envelopes through a non-random process. You could make inferences based on what you expect the person to spend on prizes, but it's no longer a simple mathematical problem at that point.

Which is why the problem as stated doesn't let you look in the envelope before deciding whether to switch or not. Then it's obvious switching makes no difference.


I think you're confusing yourself. Watch something like Noam Brown's NIPS 2017 lecture on subgame solving in imperfect information games and you'll realize you should be very wary of anyone playing around with solutions to imperfect information games using state-based reasoning rather than information-set-based reasoning.

If you have a tree:

         0.5
         /  \
       A     2A
  
In a perfect information game you can do this thing where you're like "assume A" and think ahead, and then "assume 2A" and think ahead. But in the imperfect information world you don't know whether you have A or 2A. So you have to "assume A and 2A", and you have to deal with your solution space using counterfactual reasoning. Picture a chain coming out from the A to the 2A. They are married. They cannot be parted.

It is really counterintuitive, but the way I think about it is "you are playing every game at once" and "you are playing a game over information sets and strategy choices", not "state space and actions". So, since you are already playing every game at once, when they come in with that other game - well, you were already playing another game. They're overwriting that part of your solution's memory. Which in math terms declares an equality. In this case that equality is 2=1, because they say A = A/2 or 2A = A depending on which framing you use to build the real game tree.

The actual recurrence relationship is obviously

    R(KEEP) = {0.5: A, 0.5: 2A} = 3/2A. 

    R(SWITCH) = P(KEEP)*R(KEEP) + P(SWITCH)*R(SWITCH)
Solving the game from there is really trivial - you're only allowed one strategy selection per information set, and you only have one information set: the null set. You literally get to make only one decision, a scalar between 0 and 1 defining P(KEEP), with P(SWITCH) = 1 - P(KEEP).

`R(SWITCH)` becomes `R(KEEP)` because of the way the Markov chain drains into it. Literally the only thing you can't pick is `P(SWITCH)=1`. Everything else is an equivalent-EV solution. P(SWITCH) must be < 1, because P(SWITCH)=1 is undefined or else 0 (in practice we take the limit after a horizon by multiplying by some term like 0.9999999 to avoid the infinite regress).

And since the problem didn't make this clear: your policy __influences your expected value__, so their reasoning process is broken at a fundamental level. They literally declared 2=1 and therefore that the undefined solution is the best answer, despite an infinite number of solutions that aren't undefined. It's... really bad reasoning. Don't listen to the idea that switching is better. It can't be. Look at the actual relationship. Look at the actual graph. They skip some really important steps, like using the same game tree for the entire calculation.
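If it helps, here is a sketch of that game tree as a simulation (the horizon, amounts, and trial count are arbitrary choices of mine): every policy with P(SWITCH) < 1 earns the same EV, and always-switching is the lone policy that earns nothing.

    import random

    def play(p_switch, horizon=1_000):
        envelopes = [1.0, 2.0]
        held = random.randrange(2)
        for _ in range(horizon):
            if random.random() < p_switch:
                held = 1 - held          # switch: same null information set again
            else:
                return envelopes[held]   # keep: the only defined terminal reward
        return 0.0                        # never kept: the infinite regress case

    trials = 100_000
    for p in [0.0, 0.3, 0.7, 0.99, 1.0]:
        ev = sum(play(p) for _ in range(trials)) / trials
        print(p, round(ev, 3))   # ~1.5 for every p < 1; 0.0 for p = 1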


Disregard the above; you basically said the same thing I did, but used a phrasing that made it hard for me to realize you had done so.


The goalposts here are different from a normal "paradox".

The actual solution is explained right there on the page, the expectation for both envelopes (one has x, one has 2x) is x*3/2, so switching changes nothing.

The "solution" demanded is supposed to be pointing out the specific logical error in the erroneous calculation that says switching should win. Citing the correct logic is not considered sufficient.


It seems like this "solution" is saying that the error is in step 1:

> Denote by A the amount in the player's selected envelope.

which is information that we have not actually gained.


Variant:

I tell you "These two envelopes contain money. One of the envelopes contains twice as much money as the other one. Pick one.". You pick one. I tell you the one you picked contains $60. You've now determined the amount in your selected envelope -- should you switch?


Now you know x is either 30 or 60. The expectation is either 45 or 90. Switching has equal likelihood of increasing or decreasing your take. The correct calculation will show the expected value of the other envelope as 60.


Inserting A=60 into the original line of reasoning (just to show that it doesn't immediately resolve the paradox):

1. Denote by A=60 the amount in the player's selected envelope.

2. The probability that A=60 is the smaller amount is 1/2, and that it is the larger amount is also 1/2.

3. The other envelope may contain either 2A=120 or A/2=30.

4. If A=60 is the smaller amount, then the other envelope contains 2A=120.

5. If A=60 is the larger amount, then the other envelope contains A/2=30.

6. Thus the other envelope contains 2A=120 with probability 1/2 and A/2=30 with probability 1/2.

7. So the expected value of the money in the other envelope is: 1/2(120) + 1/2(30) = 75

8. This is greater than A=60 so, on average, the person reasons that they stand to gain by swapping.

9. ...

edit: If anything, to me this obscures things further, because it makes it non-obvious that we are actually talking about ratios (i.e. 2A, A/2) rather than absolute numbers, and so the expectation calculated in step 7 is inappropriate - a geometric mean recovers the correct answer.


A can't collapse to sixty. It could still be 30, 60 or 120. The expected value calculation includes the value of A as defined in two different worlds: the world in which the other envelope holds A/2 (30) and the world in which it holds 2A (120).

But these two worlds never exist together. They are different realities. We can't determine which of them we are in without more information.

Trying to do so is an error, because we are in multiple subgames simultaneously. In imperfect information games you are already in more than one world. You are in the reality A and counterfactual 2A and you are in the reality 2A and counterfactual A. You can't tell which. You're in both.

In perfect information, you're not in both, so you can do this case-based reasoning and not run into as much trouble. But when you're already in both worlds? Well, doing the case-based reasoning implies that the counterfactual world = the case-based world. It's overwriting that part of the equations. Which means you just accidentally declared that either A/2=A or that 2A=A. That 2=1. You do this because you're still in the counterfactual reality, but you're pretending you are not.


In the variant under discussion A is defined as 60: https://news.ycombinator.com/item?id=31574239


You don't know what A is even if he tells you the first envelope is 60. A and 2A exist together counterfactually, if you have A, you have 2A, if you have 2A, you have A. This is a fundamental property of imperfect information games. You don't have State=A, ever, and you are lying to yourself when you pretend that you do. It is /why/ the reasoning error happens. State based reasoning only works in perfect information games because in that framing you don't have counterfactual subgames forcing them to depend on each other - forcing you to play them both simultaneously, because you don't know which subgame you are in.

A and 2A are together, so you can't propose A/2 without violating the A and 2A codependency imposed by not knowing which subgame you are in.

There are multiple A when you get told 60. They are 30, 60, and 120. This leads to three potential solutions to the expected value: 3/2(30), 3/2(60), 3/2(120). You can't tell which of these solutions you are in, because you don't know. Regardless, since the terminating R(KEEP) condition is the only one that is defined, P(SWITCH) = 1 is still either undetermined or 0 depending on how you solve the Bellman equations. Your policy decides your expected value. So earlier I was a little loose when I claimed the 3/2A relationship.

Honestly, the wikipedia article is really terrible. It demands you stick to its formalism and declares you a no true Scotsman if you don't, but its approach is fundamentally wrong. Throw off its chains and consider the framing where you remove the requirement of thinking about imperfect information. People are bad at it, and if this is confusing you it is /because/ you are struggling with the imperfect information.

The way you convert between the two game types is this: first, instead of states, use information sets. You only have one move in this game. You always have the null information set. You always have the same situation, so you're only allowed one choice, and it is the same every time, forever. Second, instead of actions, use probability vectors over actions. To simplify and make it tractable, assume [0.0, 0.1, 0.2, ... 1.0] so the action space is small. You are choosing 'an action' to take, and it's going to create multiple subgames. Focus on the recurrence relationship between those games. In particular look at the base case: R(KEEP) = {0.5: A, 0.5: 2A}.

Everything heads toward that base case except one thing: P(SWITCH)=1. It is a Markov process, and R(SWITCH) 'drains' into R(KEEP), because any probability in it inevitably becomes R(KEEP) after enough iterations.

Notice that {0.5: A, 0.5: 2A} is always true! It is true both when you have A and keep it and when you have 2A and keep it, because notice - you never ever have A. You have the null information set. You can't ever see a difference between these two things.

The moment you force in the idea that you can actually define A to be a particular thing, you smash all over that recurrence relationship. You destroy the relationship. You claim there is no relationship between the policy function and the expected value even though /there is/. There is so much wrong with replacing R(SWITCH) with a fake reality where you know you're actually in the A/2 case when you can't know that. And that is what the equations are doing. They're saying you can replace the dependence on each other with a subgame that exists in a different reality. Let's say the pair was 60 and 120 when you got 60 - there is no 30 in existence, but the equation calls for replacing the recurrence relationship with the idea that there is one. It doesn't make sense.


You are conflating the quantity A, which is a label for the (unknown in the original formulation) amount in the chosen envelope, with the "smaller amount" - labelled x in the wikipedia article.

A=60 is the amount in our chosen envelope - we are given this information in the variant in this thread. What we still don't know is if that is the larger amount (and thus the other envelope contains 30, and therefore x=30), or the smaller amount (and the other envelope contains 120, and therefore x=60).

The error is (in step 7) to calculate the arithmetic expectation of those absolute values, because they do not exist "together". The correct arithmetic mean can be obtained by considering the different conditions in which those values do exist, as described in [0]. However, the ratios do exist "together" - the other envelope contains either double or half of 60 - so we could instead calculate the geometric mean, of either the ratios or the corresponding absolute values, and obtain the correct result:

    (2 * 0.5) ** 0.5 = 1
    (120 * 30) ** 0.5 = 60
[0] https://en.wikipedia.org/wiki/Two_envelopes_problem#Other_si...

edit: changed “first envelope” to “chosen envelope” for clarity.


On further reflection, you're right, I'm conflating it.

My correction is still valid though. You're not handling step ten properly. You didn't work over the information sets, didn't solve the actual graph that is the game, didn't handle the under-specified policy function.

To try and show you that your solution isn't the actual solution: well, both options have the same EV. So I choose switch every time, because why not. As you are no doubt aware I never get to have EV because I'm constantly swapping. The sixty is a mirage. For my policy choice, the answer was undefined or zero depending on how you write it down. But you told me they had the same EV. So if they did, why did my choice not produce that EV? Ergo, your solution only appears to be giving you the EV.

Think about that for a while and you'll start to realize why I homed in on specifying a recurrence relationship with the terminal keep node and why I'm so eager to escape the trap of their flawed problem model.


Am I really the one who is getting things conflated? Go back to the problem and look at step ten. The problem allows infinite swapping. You are doing an EV calculation, but your analysis is fundamentally flawed. Assign the policy of always switching. Your analysis is claiming that EVs can be calculated, but they can't. The EV is either zero or undefined for switching, because the recurrence relationship is an infinite sequence. Since 60 != 0 and 60 != undefined, but you claim that they are, something is very wrong with your calculations. You're using the wrong formalism. The policy is supposed to be able to vary, but you're treating EV as a concept that isn't dependent on policy.

Let's take a step back and learn the important lesson for more complex situations. Your policy influences your expected value. Not keeping that in mind is going to destroy the ability to correctly calculate expected value. You aren't trying to search for the best thing to do on the basis of expected value. You are searching for the right policy to provoke a high expected value. The difference is subtle, but essential.

How do we correct it? Well, the right formalisms that allow you to search for the correct policy comes from several fields, but one of them is game theory. In game theory when dealing with imperfect information, it is considered incorrect to do state-based reasoning under imperfect information. This is because you aren't in a state - you are in an information set. When you are playing the game you have to consider every game you could be in, because you don't know which you are in.

This is a second problem with the analysis, but I think you corrected this one.

The problem asks us to be able to translate this into more complex situations. So the general lesson here is about considering counterfactuals in your analysis. An example of this in practice is Jeff Bezos talking about his decision to found Amazon on the basis of regret minimization, on account of his theory about how he would feel in various counterfactual futures. He didn't consider just one EV, the founding of Amazon, but also other EVs, like founding Amazon and failing, and not founding Amazon and doing other things.

I think I get why you think I'm conflating A, but I'm actually trying to point out that the wikipedia article is conflating A, so it's hard to have a productive discussion due to our inheritance of their misuse of terms. I don't want to conflate A, but the Wikipedia article defined A in their expected value calculation, and in that equation it ends up taking on a different meaning from what it means when it is defined to be 60. And their meaning ends up claiming things like 1=2 in practice, because of the properties of the hidden counterfactual parts of their equations - just because they neglect to show them doesn't mean they don't exist in the correct mathematics. So the logical contradiction is there - which is exactly the thing the problem asks us to identify.


I find your exposition difficult to follow, perhaps you could write out your proposed solution without the accompanying explanatory text, so that we can clearly see how it resolves the paradox.


Okay, let's start with the easy analysis: the version of the game that terminates after only one switch.

I[null] = {0.5: A, 0.5: 2A}

We get the expected change in EV for switching like this:

    (
        # The expected gain of switching if we are in subgame A
        (2A-A) * 0.5

        +

        # The expected loss of switching if we are in subgame 2A
        (A-2A) * 0.5
    )
    = 0.5A - 0.5A = 0
See how we had to consider two different subgames? We didn't know whether we had A or 2A. We only knew we had I[null].

You had to consider the benefit of switching from A to 2A and the cost of switching from 2A to A. You had to consider the factual reality you were in, but also the counterfactual reality you were not in.

Here is the reality of the first part of the game tree:

    ChanceNode(0.5)
     /         \
    A           2A
In a perfect information game, A and 2A are disconnected because they are in two different subtrees. But watch what happens when we convert from the perfect information view to the imperfect information view of the world we are actually dealing with:

        0.5
      /    \
    I[nil] I[nil]

We now have two information sets as the branches. One is actually A, but when we reason about it we need to counterfactually consider the other members of the information set.

                      0.5
      /                                    \
    {Factually A, Counterfactually 2A}     {Factually 2A, Counterfactually A}
So I hope you're starting to realize that A and 2A are fundamentally connected. You can't reason about A without including 2A. You can't reason about 2A without including A. This is really, really important to one of the several reasons that their analysis is wrong. You don't know which branch you are in. So you can't condition on being in A, like they do in the wikipedia article.

Look at the trouble they run into when they /do/ condition on A. I just showed you these counterfactuals exist, and they still exist when they condition. Conditioning treats the situation as if you can just deal with A and just deal with 2A. So they get a 2A case and an A/2 case, both of which have their own counterfactuals. Notice what just happened when they did that.

In the A/2 case, since we are under imperfect information, there is a counterfactual associated with it. What is that counterfactual? It is A! So they don't just have A/2, they also have counterfactual A.

And now look at the other case. 2A has a counterfactual associated with it too. What is it? A.

So they have this:

[F: A, CF: 2A], [F: A/2, CF: A]

And what I'm trying to point out to you is that they just declared A=A/2 and 2A=A, because they neglected the counterfactual relationships.

You can't condition on A in the subgame; you don't have perfect information - you don't have A. You have I[null]. Even when you get told 60 you still have I[null].

/A/ isn't 60, and it can't be, because you don't know which subgame you are in. A isn't given. If you knew A, if it were possible to know A, you wouldn't be in an imperfect information game. A is defined in point one, yes - but it is also used in point seven, and when it is used there we get the logical contradiction.


The paradox is supposed to be in the contradiction between this (incorrect) reasoning and the correct reasoning, not in the counterintuitive conclusion of the incorrect reasoning.


There actually is a paradox:

Select a power of 2 according to the probability

P(2^k) = 0.2 * 0.8^k

i.e. p(1) = 20%, p(2) = 16%, p(4) = 12.8%, etc.

Then place this amount in one envelope and twice that in the other envelope.

Then, randomly shuffle the envelopes and select one and open it. Say you see the number 16. It could have landed there in two ways: Either 16 was generated and the other envelope has 32. This happens with probability 0.2*0.8^4 ~ 8.192%. Alternatively 8 was generated and the envelope you have is already the largest one. This happens with probability 0.2*0.8^3 ~ 10.24%. Given that you see 16 already, the relative probability becomes 4/9 vs 5/9. In other words the chance that you have the largest envelope already is 5/9. The expected value of the other envelope is then 5/9*8 + 4/9*32 = 18.67 > 16 so you should switch.

In fact, try this for different initial envelopes and you'll see that you should always switch.

But, if you should always switch, why even look inside the envelope?

But, then you're back at the symmetry / no information argument.
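This is easy to check numerically. A minimal simulation of the setup above (the trial count is an arbitrary choice of mine):

    import random
    from collections import defaultdict

    def draw_pair():
        k = 0                          # P(2^k) = 0.2 * 0.8^k
        while random.random() < 0.8:
            k += 1
        small = 2 ** k
        pair = [small, 2 * small]
        random.shuffle(pair)
        return pair

    seen = defaultdict(lambda: [0, 0.0])   # observed value -> [count, sum of other]
    for _ in range(2_000_000):
        a, b = draw_pair()
        seen[a][0] += 1
        seen[a][1] += b

    for v in sorted(seen)[:6]:
        n, total = seen[v]
        print(v, round(total / n, 2))      # e.g. 16 -> ~18.67 > 16: always switch

The conditional expected value of the other envelope beats the observed value every time, matching the 4/9 vs 5/9 calculation above.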


This is similar to the St. Petersburg problem. The way I satisfy myself with that answer is to note that it's kind of a spherical cow problem. Nobody in the real world can afford to run this game, since it requires an infinite amount of money -- even a currency issuer would expect to just make their currency worthless by running it, and an offering of physical resources instead of currency would violate thermodynamics in all sorts of ways.

Spherical cow problems often end in counterintuitive results.


Note that the problem requires allowing arbitrarily small amounts of money in the envelopes, otherwise you would know at some point that you have the smallest amount of money possible. So in your formulation, you have to let k be negative as well.


No, k >= 0 in my example. In one case you know for sure that you have the smallest envelope, since you see that it has $1 in it. In this case you should absolutely switch. In all other cases, you should still switch although you can't be completely sure.

That being said, even if you let k go negative I think it's still a paradox.

If you're curious, the resolution to this is that before you look in the envelope the expected value of both envelopes is infinite. Only after you look inside one of them does that collapse to two finite values that you can subtract.
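You can see the infinite prior mean from partial sums: each term of the sum over k of 0.2 * 0.8^k * 2^k equals 0.2 * 1.6^k, which grows without bound.

    # Partial sums of E[smaller envelope] = sum over k of 0.2 * 0.8^k * 2^k
    partial = 0.0
    for k in range(61):
        partial += 0.2 * (0.8 ** k) * (2 ** k)   # each term is 0.2 * 1.6^k
        if k % 15 == 0:
            print(k, round(partial, 1))
    # The partial sums explode: the expected value of an unopened envelope is infinite.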


If you disallow negative values of k then you break the symmetry argument, and you do indeed get quite a bit of information by opening an envelope.

I assume you meant to use an exponential base < 0.5 so that the mean is finite.


The point I'm making is that if you open the envelope you will always conclude that it's advantageous to switch regardless of what you see in that envelope. In the case where you see $1, you're guaranteed that switching is a good idea. If you see anything other than $1 you come to the same conclusion but you have to rely on a probabilistic argument using expected values. Now, if you will decide to switch regardless of what you see in the envelope, then why even bother opening it? But, if you don't open the envelope, then you're back to the original problem where the symmetry argument is valid.


Every envelope you open will have a higher expected value than the previous one. This is not paradoxical, it is a standard property of probability distributions with infinite means.

I believe the problem you are trying to elicit is that you can almost show that either envelope is a priori better than the other. The problem with this is that the means are infinite so the math isn't sound. So there's not an actual contradiction; the strategy is to open either envelope and then switch to the other one. It's unintuitive but only because we're not used to reasoning about distributions with infinite means.


Right, I mentioned above that the resolution to this is that the expected value of both envelopes is infinite before you open anything and thus the difference is undefined. Once you open one, the difference becomes well defined.

I still think it's quite unintuitive that one needs to open an envelope in order to regularize things this way. The fact that you will always decide to switch, regardless of what you see in the first envelope, yet you still have to open it is what I find paradoxical.


So the real predictor is the probability.

Once the probability of already holding the larger envelope gets high enough - say, to the point where (7/9)(X/2) + (2/9)(2X) < X - switching is no longer attractive.


There's an amusing number of probability puzzles with seemingly paradoxical solutions out there, to the point that it's almost more surprising to see one where the intuitive answer is correct:

- https://en.wikipedia.org/wiki/Boy_or_Girl_paradox

- https://en.wikipedia.org/wiki/Penney%27s_game

- Waiting time paradox

- Honorary mention: https://en.wikipedia.org/wiki/St._Petersburg_paradox

(feel free to add more)



> This is in contrast to the three-door problem where information is gained by opening the first door.

This is in contrast to the earlier problem statement of Gardner's wealthy men playing against each other as per the article, actually.


I love this problem because it is so simple, and the false line of reasoning is so compelling that it would hardly raise an eyebrow if you saw it in an academic paper, and yet the conclusion is so obviously wrong. Decision problems are tricky, and in non-intuitive ways.


Well, the simple way to put it is that "A" isn't fixed. The "expected value" argument in steps 6-7 acts like "A" is a single value, when "A" will be larger or smaller depending on which envelope you picked.


Yeah, that's obvious when they say "one envelope contains 2*A and the other A/2". Well no, one doesn't contain one fourth of the other.


It doesn't actually say that. It says the other envelope will either contain 2A or A/2. The envelope you're holding is A. The other envelope is either twice that, or half that.


the "writing" doesn't say it but the written formula at bullet point 7 says it. That's the mistake, the formula is wrong and does not describe reality.


This is incorrect. The written formula in bullet point 7 does not refer to a corresponding pair of envelopes, as you imply. Instead, it refers to "the other" envelope only: a 1/2 chance that "the other" envelope contains 2A, and a 1/2 chance that "the other" envelope contains A/2. Notice that both references refer to the same object.


That's precisely what's wrong, because A is not the same quantity in both "A" parts of the equation.

The right formula replaces either the 0.5A with A or the 2A with A.

Try it with A being a real number, like 100 USD, to see it.

Basically you can't have "two realities" in the same equation. You have to pick one.


> The right formula replaces either the 0.5A with A or the 2A with A.

No it's not, that's not how you make EV calculations. As I already said in my previous post, the parts in the EV calculation do NOT refer to the pair of envelopes. Both parts refer to the same object, not to different objects. You are trying to change the EV calculation such that the different parts refer to different envelopes. This does not make any sense.

> Try it with A being a real number, like 100 USD, to see it.

...and that's not a way to prove / demonstrate the correctness of math. Yes, if you change these numbers in this way, you get EV 0, which is the correct answer. Just because the answer is correct does not mean the calculation is correct. In this case it isn't.


My explanation is correct and I can't say much more to convince you than I've already said... yours is wrong in precisely the way that confuses people about this problem, and I'm kinda sorry you don't see it.

Since you seem to be making some authority arguments, just know that I have a PhD in ML / stats and was top 30 in my country in olympiad-level competitive maths, so I DO know what I'm doing, and in this case with absolute certainty.

Look for the other guy's answer if you want more details.


> Since you seem to be making some authority arguments, just know that I have a PhD in ML / stats and was top 30 in my country in olympiad-level competitive maths, so I DO know what I'm doing, and in this case with absolute certainty.

Let's review what you're claiming with absolute certainty. This is the first claim that I'm contesting:

> they say "one envelope contains 2A and the other A/2".

When pointed out that they actually don't say this, you clarified your point:

> the "writing" doesn't say it but the written formula at bullet point 7 says it. That's the mistake, the formula is wrong and does not describe reality.

The second part of that quote is correct: bullet point 7 has the mistake, the formula is wrong and does not describe reality.

The first part of that quote is incorrect: the written formula at bullet point 7 does not claim that one envelope contains 2A and the other envelope contains A/2. It claims that the first envelope contains A and the second envelope has a 50% probability of containing 2A and a 50% probability of containing A/2.

You correctly identify where the mistake is, but you misidentify what the mistake is.

If you read bullet point 6, it is very clear: "the other envelope contains 2A with probability 1/2 and A/2 with probability 1/2". Notice that bullet point 6 does not claim "One envelope contains 2A and the other envelope contains A/2".

To illustrate my point, consider a simple coin toss game. If the coin comes up heads, you win $2. If it comes up tails, you win $0.50. How might we construct the EV calculation for this coin toss?

0.5 * $2 + 0.5 * $0.50

Notice how both parts of the formula refer to the same coin. They don't refer to different coins. You have one coin that you flip, and depending on how that one coin lands, there is some probability that you get $2, and some probability that you get $0.50. If you looked at that formula, you wouldn't think that the first part refers to a different coin than the second part.
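A quick simulation of that coin toss confirms the reading (stakes taken from the example):

    import random

    trials = 1_000_000
    # One coin per trial: heads pays $2, tails pays $0.50
    total = sum(2.0 if random.random() < 0.5 else 0.5 for _ in range(trials))
    print(total / trials)   # ~1.25 = 0.5 * $2 + 0.5 * $0.50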

Similarly, here we have one envelope (similar to having one coin) and we have uncertainty about what the envelope contains (similar to having uncertainty about how the coin will land). We have a 50% chance that the envelope contains 2A, and a 50% chance that the envelope contains A/2. Thus, we arrive at the (incorrect) formula:

0.5 * 2A + 0.5 * A/2

It's very obvious that both parts of the formula refer to the same envelope, just like in the coin toss formula both parts of the formula refer to the same coin. To be extremely clear, I am not claiming that the formula is correct. I am claiming that both parts of the formula refer to the same envelope, not to different envelopes. To be specific, I am refuting this claim that you made:

> they say "one envelope contains 2*A and the other A/2".

They don't say that.

Furthermore, you make an additional incorrect claim:

> The right formula replaces either the 0.5A with A or the 2A with A.

Here you are just randomly changing the formula in order to make it output 0EV (which is the correct answer). Just because you get a correct answer does not mean your computation was correct. In this case it is incorrect, because A was defined badly. In order to fix the formula you would have to fix the definition of A. If you keep the incorrect definition of A and just shuffle symbols around until you get the correct result, your computation is still incorrect.


> The first part of that quote is incorrect: the written formula at bullet point 7 does not claim that one envelope contains 2A and the other envelope contains A/2. It claims that the first envelope contains A and the second envelope has a 50% probability of containing 2A and a 50% probability of containing A/2.

But it's wrong! Try with literally any A.

> If you read bullet point 6, it is very clear: "the other envelope contains 2A with probability 1/2 and A/2 with probability 1/2". Notice that bullet point 6 does not claim "One envelope contains 2A and the other envelope contains A/2".

Sure but this bullet point is also dead wrong and possibly the start of the scam. If you say A is lowest value, Either it contain A with probability 1/2, or 2A.

Otherwise you are changing the reality of A midway through the sentence.

To meet you halfway: what you may actually want, or see in your head, is the possibility that A (in the lowest-value sense) may vary in the experiment, and to try to compute the expectation of that new problem.

But that's a different problem. And if you want to compute the expectation of this problem with A varying in a range, you have to write an integral over p(A) around the (corrected) equation 7. But equation 7, even inside this integral, needs to use the same consistent A: either A is the lowest envelope value and you only ever write 2A, or it is the highest value and then you only ever write 0.5A.


> But it's wrong! Try with literally any A.

As I said, I agree that the formula is wrong. I disagree with you regarding where the mistake is and how to fix it.

> Sure but this bullet point is also dead wrong and possibly the start of the scam. If you say A is lowest value, Either it contain A with probability 1/2, or 2A.

I think maybe you have some typos here, because I don't understand what you're trying to say here. In any case our disagreement concerns whether the formula in Wikipedia refers to the same envelope in both parts of the formula, or different envelopes. You kept stating that it refers to different envelopes, and I wrote a long post to refute that. Now you came back saying that the bullet / formula / Wikipedia text is "wrong" and "possibly the start of the scam". Ok, sure, but the EV formula still clearly refers to the same envelope in both parts of the formula, despite the fact that you claimed otherwise with literally "absolute certainty". You were wrong about something that you claimed to know with "absolute certainty", so perhaps you should in the future adjust those estimates downwards with a bit of uncertainty added in?

> Otherwise you are changing the reality of A midway through the sentence.

Nope, A is defined as the "value of the firstly-chosen envelope" in the beginning of the sentence, and A is defined as the "value of the firstly-chosen envelope" in the end of the sentence. The definition for A does not change midway through the sentence.

> But that's a different problem. And if you want to compute the expectation of this problem with A varying in a range, you have to write an integral over p(A) around the (corrected) equation 7. But equation 7, even inside this integral, needs to use the same consistent A: either A is the lowest envelope value and you only ever write 2A, or it is the highest value and then you only ever write 0.5A.

That's not the only way to compute the expectation for this problem. I actually ran some small simulations for this today, representing wagers on this problem. The simulations demonstrate the following claims:

- Steps 1 through 7 in the Wikipedia page are correct, in terms of calculating the "expected value in the other envelope, relative to the value in the firstly-chosen envelope". The expected value for the switch, relative to the firstly-chosen envelope's value, is in fact positive (+25%).

- At the same time, the "expected value for the switch in absolute terms" is zero.

- These claims do not contradict each other.

- The first error in Wikipedia's line of reasoning is step 8 that states "they stand to gain by swapping". This indicates that the player should try to maximize goal "expected value relative to firstly-chosen envelope", which is incorrect. The player should instead maximize goal "expected value in absolute terms".

If you disagree with any of these, let's formulate our disagreement in the form of a wager that can be simulated in code.
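To make the wager concrete, here is a minimal sketch of the kind of simulation I mean (my own reconstruction; the amount range and trial count are arbitrary choices):

    import random

    trials = 1_000_000
    ratio_sum = 0.0   # goal 1: other / held, relative to the firstly-chosen envelope
    gain_sum = 0.0    # goal 2: other - held, in absolute terms

    for _ in range(trials):
        x = random.uniform(1, 100)
        envelopes = [x, 2 * x]
        random.shuffle(envelopes)
        held, other = envelopes
        ratio_sum += other / held
        gain_sum += other - held

    print(ratio_sum / trials)   # ~1.25: the +25% "relative" expectation of steps 1-7
    print(gain_sum / trials)    # ~0: no absolute gain from switching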


I don't think you're wrong exactly, but I just want to try and give you a different way of framing your disagreement.

A man walks to a house with a ladder. He explains that ladders are climbing devices. He tries to set it up against the house, killing plants that he sets it on. The ladder falls over.

One person watches this happen and correctly explains that the ladder is a climbing device and says many very true things about the intentions and proves the genuine thoughtfulness of the ladder user's approach. He isn't wrong.

Another person looks at everything from a different perspective. They are a time traveler and they experience the world in reverse chronological order. They say the ladder isn't a climbing device - it is a thing which falls on the ground. They say it is a flower-killing device. They claim that when the person explains that the ladder was for climbing, they were wrong, because it isn't. They fundamentally disagree about what the ladder is, because they viewed the ladder from a different perspective. He isn't wrong.

They start arguing about what the ladder is. The first person is trying to argue that the ladder isn't a falling device, because it isn't defined to be so. The other person is arguing - this is very critical by the way - not that it is a falling device, but that the way it got used meant it was one, and that this disagrees with the statements about what the ladder's purpose was. Now, because they aren't noticing this distinction, they argue and try to prove that the ladder isn't what the other person claims it is. Perhaps they think they are winning the argument by pointing out the contradiction, but pointing out the contradiction is agreement that the contradiction exists.

You are using a programmatic understanding of what is happening. You are having to proceed forward from statements in order to get meaning. But there are different paradigms in programming type resolution which might help you understand why you get such sharp disagreement. In programmatic terms you can actually have this backward flow too. b [unknown]; foo(int); foo(b) -> b [int] is implied. You can propagate the type backwards from the function call.

Critically, this is extremely common in math. Mathematics has identities. The relationships between operations flow along those identities. When you let the identities flow backwards, the ladder you two are discussing stops being a ladder. When they make the choice to use a known algorithm, getting the expected value for a subgame under imperfect information, we know the relationships which flow backward from a correct usage. We know what was 'supposed to' be there. They had an incorrect usage. Their ladder wasn't intended to be for falling. But we're experiencing the world in a backwards direction. So we don't know that yet, like you know it. We're treating it as it was used, not as it ought to have been used - flowing what it is backwards from what they mean in the equations, not forward from how they are defined.

So he started trying to explain his thinking to you by moving backward and you noticed that he was saying the ladder was a falling over device, when the ladder maker told you it was something else. And you are right. He isn't wrong so much as his understanding of what the terms are was decided by a different process. And he agrees that you are right that the terms are defined differently!

You aren't really disagreeing about as much as you think you are, because you and him both already agree that the ladder maker said the ladder was for climbing and he said the ladder was for falling over. When you find the contradictions, well, I'm not sure you are actually proving him wrong so much as you are agreeing with each other that a contradiction exists.


The person you are responding to is actually right. I drew an ASCII diagram of the various game trees so it will be easier for you to see this:

On Wikipedia they created two trees, one with 2A:

             -> Keep    -> 2A
          2A -> Switch  -> A
    0.5 ->
          A  -> Keep    -> A
             -> Switch  -> 2A
And another with A/2:

                -> Keep    -> A
          A     -> Switch  -> A/2
    0.5 ->
          A/2   -> Keep    -> A/2
                -> Switch  -> A
Notice now that there is no game in which A is both 2A and A/2.

If you don't think this is the error, you're not thinking in imperfect information terms. You're thinking in perfect information terms. This sort of splitting action is allowed in perfect information games! In imperfect information games, you are in multiple subgames at the same time. A has to be the same A in both subgames, because you can't discriminate between which game you are in. So you're not allowed to do this. It is wrong!

This gets at the core of what the question is trying to make people notice. You can't do this with imperfect information. You can only do it with perfect information. You can't do it over information sets and strategy spaces. You can only do it over states and action spaces.

But they do it. And then they combine them back into the same analysis when they say 1/2(Game tree 1 switch case) + 1/2(Game tree 2 keep case). And you can't do that, because game tree one depends on game tree one's keep case and game tree two depends on its switch case! You can't tell which you are in, so you are in both - not one or the other. You can't combine two different subgames from two different trees like you could in perfect information! You have to stick with the same game tree.

And that is what the person you are mistakenly correcting is saying. Pick one or the other, you can't have both.

PS: There are other errors in the problem, like that when you solve it with an MDP or the Bellman equations you actually /do/ end up with 100% Switch having an expected value of zero. So if you spotted some other error, that doesn't mean that the person you are correcting is wrong in their correction. There are multiple issues with the way the wikipedia article approaches the conclusion.

PPS: This exact confusion that people are talking about is actually something that happened in practice in the solving of imperfect information games! IIRC the Pluribus paper author corrected a reasoning error of this type when he introduced subgame reach solving? I might be misremembering here, since it's been a while since I listened to his talk, but my fuzzy memory suggests that confusion about subgames interacting with each other was the root cause of a failure in some of the poker research for subgame solving in a blueprint abstraction.


> I drew an ASCII diagram of the various game trees so it will be easier for you to see this:

This game tree only makes sense if you change the definition of A. In particular, you need to change the definition to "A = 1/3 of the total amount of money in the envelopes". If you define A like this, then your game tree is correct, and it corresponds to the "simple resolution" described on the Wikipedia page (where they use x instead of A, for clarity). This is an unsatisfactory answer, because it provides an "alternate path" to the correct answer, without demonstrating which step went wrong in the "incorrect path" that leads to the wrong answer. It is not particularly difficult to construct these "alternate paths" that lead to the correct answer. The difficulty of the puzzle is in demonstrating what goes wrong in the incorrect path, as stated on the Wikipedia page:

The puzzle is to find the flaw in the very compelling line of reasoning above. This includes determining exactly why and under what conditions that step is not correct, in order to be sure not to make this mistake in a more complicated situation where the misstep may not be so obvious. In short, the problem is to solve the paradox. Thus, in particular, the puzzle is not solved by the very simple task of finding another way to calculate the probabilities that does not lead to a contradiction.

> Notice now that there is no game in which A is both 2A and A/2.

I have no idea what you are referring to here. I never claimed that A = 2A in any circumstance (there is nonzero amount of money in the envelopes).

> If you don't think this is the error, you're not thinking in imperfect information terms. You're thinking in perfect information terms. This sort of splitting action is allowed in perfect information games! In imperfect information games, you are in multiple subgames at the same time. A has to be the same A in both subgames, because you can't discriminate between which game you are in. So you're not allowed to do this. It is wrong!

I agree with you that the crux of the issue is in the definition of A. In particular, "mixing" "two different A's" within the same computation, even though they sort of mean different things. This is a fuzzy explanation that gives a feeling "something about this explains the issue", but it's not a complete explanation that precisely pinpoints what is wrong.

> This gets at the core of what the question is trying to make people notice. You can't do this with imperfect information. You can only do it with perfect information.

This is false. We can easily construct imperfect information games similar to this one, but tweak a few things in such a way that a "naively constructed EV calculation" yields the correct results.

> And that is what the person you are mistakenly correcting is saying. Pick one or the other, you can't have both.

Nope! I was very clear about which very specific claims I was refuting. They were unrelated to the topics you are bringing up here. I posted a long, clarifying comment to a sibling comment here, you can look that up if you want to refute specific claims.

> So if you spotted some other error, that doesn't mean that the person you are correcting is wrong in their correction. There are multiple issues with the way the wikipedia article approaches the conclusion.

I didn't say "I spotted error X in Wikipedia which explains Two Envelope problem", I said "I spotted error X in this other poster's Hacker News comment". Please read the very specific comments I made about very specific claims made by the person.


Basically, when the wiki says 2A because less than and A/2 because greater than, things are okay, but we're in dangerous territory. If we ever introduce uncertainty, the imperfect information scenario we are in is going to force us to acknowledge that we don't know which game we are in. Which means if that happens we don't have just 2A, but an information set {2A, A}. And we don't just have A/2, but an information set {A/2, A}.

In game theory, if you have two subtrees and they share an information set, then the information sets are equal to each other. You can't tell which subtree you are in, so the information set contains both subgames. You can think of one as factual and the other as counterfactual and vice versa, but the elements in that set need to be equal.

When we get to step seven we do a calculation that reintroduces our uncertainty. We multiply by the probability of being in each case. However, we don't actually know we are in one case and we don't know we are in the other case. So combining these two situations leads to the counterfactual part of subgames emerging. So we have {A, 2A} and {A/2, A}. The sets aren't equal. We can choose some parts of this, like grabbing A and comparing it to 2A. So now we have an identity rule that just lets us say 2A=A. 2=1. Our contradiction is found.

> This is an unsatisfactory answer, because it provides an "alternate path" to the correct answer, without demonstrating which step went wrong in the "incorrect path" that leads to the wrong answer.

Eh, I mean, I guess I'm cheating by claiming an interpretation they aren't trying to use. This is why I detest the "no true Scotsman" setup to the question. We can create an identity that maps their wrong step into my formalism, where what they did is wrong in the way I claim it is. Even though they didn't "try to do it", that doesn't mean I can't see that they did do it. But apparently - even though game theory notation neatly avoids these pitfalls - we need to stick to the footgun notation. It just seems stupid to me. If you want to avoid the problem, use the notation that trivializes avoiding the error. I really like the other person's analogy to type errors, because it is such a similar idea to what I'm saying; they just use a different part of math to assert it. There are ways you can just make this decision fail to typecheck. If you don't want to have these types of errors? Be stricter about your typing so as to prevent them.


> the counterfactual terms that get realized into the same position, but which weren't written out, claim that Z = 2Z and that Z = Z/2.

Nobody makes this claim, directly or indirectly. If you got this result by formalizing the plain English statement into a proof assistant, then the error was introduced during the step where you interpret the plain English into a formal statement. If the claim "Z = 2Z" was inherent to the plain English version of the problem, then it wouldn't be possible to run the wager as a simulation, but it is. The entire Two Envelopes problem - including step 7 - is possible to simulate in code: https://pastebin.com/jPyZVrkx

What the simulation demonstrates is:

- It's perfectly possible to define "expected value of the final envelope in relation to the value of the firstly-chosen envelope", as the Wikipedia page describes. No contradiction exists that would prevent this computation.

- It's also perfectly possible to define "expected value of the final envelope in absolute terms".

- These are different goals to maximize. A rational actor should maximize goal 2 ("expected value of the final envelope in absolute terms") in this variant of the game. It's also possible to construct another variant where a rational actor should NOT maximize goal 2, but should instead maximize goal 1 ("expected value of the final envelope in relation to the value of the firstly-chosen envelope").

- The error in Wikipedia's "compelling line of reasoning" appears to be at step 8 where they conclude that "they stand to gain by swapping". This implies that a rational actor should maximize goal 1, when in fact a rational actor should maximize goal 2 instead. This is the error in Wikipedia's line of reasoning.

> When we get to step seven, we do a calculation that reintroduces our uncertainty: we multiply by the probability of being in each case. However, we don't actually know we are in one case, and we don't know we are in the other case. So combining these two situations leads to the counterfactual part of the subgames emerging. So we have {A, 2A} and {A/2, A}. The sets aren't equal. We can choose some parts of this, like grabbing A and comparing it to 2A. So now we have an identity rule that just lets us say 2A = A, hence 2 = 1. Our contradiction is found.

I'm not a mathematician, so I don't understand expressions like "counterfactual part of subgames emerging" or "there is an identity implied by being under imperfect information". I appreciate you writing back at length, but the majority of what you wrote went straight over my head. The impression I have is that you formalized the problem in a manner that led to a contradiction. If this contradiction were inherent to the original English problem statement, it wouldn't have been possible for me to simulate the problem. But it is. So it seems to me that there is no inherent contradiction in the English problem statement; the contradiction was introduced during the formalization step.

> I detest the "no true Scotsman" setup of the question [...] If you want to avoid the problem, use the notation that trivializes avoiding the error.

I wouldn't describe the setup as a "no true Scotsman" scenario, because the setup describes what counts as a Scotsman: determine which step in the line of reasoning is incorrect, why, and under what conditions. I appreciate that you tried to do this when you identified step 7 and explained what you think is wrong with it. This is also what I tried to do in my explanation, with regards to step 8 and the comparison between the two goals.

Besides, it's trivial to conclude that nothing can possibly be gained by switching (in absolute money terms, which should be the goal a rational actor chooses to maximize). It would be boring and unsatisfying to accept a simple answer that explains how to get the "correct answer" without explaining what made it a paradox in the first place. It would be akin to looking at an illusion that displays a man or a cliff depending on how you look at it, then shouting "this is not an illusion! it's just a cliff! there is no man in this picture!"


> It would be boring and unsatisfying to accept a simple answer that explains how to get the "correct answer" without explaining what made it a paradox in the first place

Well, this is just a place where they do the math wrong. We can quibble about why, but it isn't a paradox. We both agree they do it wrong, and we even overlap on the reason why: we both think they aren't comparing the envelopes relative to all the subgames they are in, and we both think that this is required to reason correctly. So let's say we get to the correct relative EV of 0.

If both are worth the same, well, why not always switch? They are the same EV, right?

IMO this is the real paradox. We have a false equivalence: EV(policy, game) is not equal to EV(envelopes). For the policy `ALWAYS_SWITCH` we get the logical contradiction that EV(envelope) = undefined - or, if you take the limit with a discount under a different formalism, that EV(envelope) = 0.

When you tackle the problem starting there, rather than at the other error, the entire algorithm changes, because we need to modify the problem so that we can calculate EV with respect to policy. The algorithm changes enough that it really annoys me: I have this intuitive feeling that I'm changing the solution so much that I'm no longer working within the spirit of the puzzle.

That is why it feels like "no true Scotsman" to me; to use a metaphor, they gave me a coloring book and told me to color something beautiful but that I had to stay in the lines - and the lines are of a skunk, and I want to draw a sunrise. I don't want to correct just the one particular step. I hate the framework that forces me into the confusion by falsely implying that if I get the right envelope EV, I know my EV.


> If both are worth the same, well, why not always switch?

I feel like this is a detour that we can avoid by adding a small cost to switching.

> I hate the framework that forces me into the confusion by falsely implying that if I get the right envelope EV, I know my EV.

I love it. I love it in the same way I love the trick of a magician who fools me. Also, the answer of "two EVs" is particularly satisfying for me, because it resembles a similar confusion in poker tournaments where you need to account for chip EV and dollar EV separately.


> I feel like this is a detour that we can avoid by adding a small cost to switching.

I don't think it's a detour we actually want to avoid; the implications are really fascinating and help us understand how to solve games more generally.

I agree with adding a cost as a way to do it. Actually, that is why I've been saying it is zero or undefined. You know about dynamic programming, right? Well, one of the reasons it was invented is related to what we're talking about right now. There are these things called the Bellman equations. They can look like this when you think of things in terms of a Markov decision process:

https://wikimedia.org/api/rest_v1/media/math/render/svg/0e35...

Since they're recursively defined, you can compute them more quickly by caching results and working back to front than by recomputing front to back. Anyway, putting aside that bit of trivia, do you see the symbol that looks a bit like a y? That's gamma, the discount factor. When we have an infinite sequence, like we would if we kept switching, you can set that term to some constant below one, for example 0.9999999999999. That lets you take the limit, because the discounted sequence is obviously going to converge to zero.

Check out the formula again and notice one of the things it does that the wikipedia article doesn't. Do you see the symbol pi? That refers to the concept of a policy function - a strategy - so you compute what the agent would get if they played the game, rather than the envelope's contents. Under this formalism we're doing an argmax over the policy function in order to get the policy that has the highest expected value.
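
I can't tell exactly which rendering that truncated link points to, but the standard Bellman expectation equation for a Markov decision process looks something like this (gamma is the discount factor, pi is the policy):

    V_pi(s) = sum over a of pi(a|s) * sum over s' of P(s'|s,a) * [ R(s,a,s') + gamma * V_pi(s') ]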

> I love it. I love it in the same way I love the trick of a magician who fools me.

If you're actually interested in this sort of thing, I suggest checking out the poker research papers by Noam Brown. He gave a talk at NIPS 2017 where he won a best paper award. In 2019 he was runner-up for best scientific advancement of the year. His work is about applying game simplification to poker to create a blueprint of the game which is simpler, solving the blueprint using a modified form of counterfactual regret minimization, and then refining the solution during actual play with reach subgame solving. You might have heard of his work, because he was involved in the breakthrough where AI became better than humans at poker, even at no-limit. I have some belief, but not certainty, that his paper contains a correction of someone who made this style of error. He mentions in the research paper that there was a mistake in another paper where they didn't account for the way subgames influence each other. I see that as the problem here, but I haven't read the other paper, so I don't know if it was the same category of error. Find the paper he referenced, read through it, and see if you're tricked. It is potentially a kind of real-world two envelope problem: a test of whether your tools for reasoning about these things are actually helping you avoid the error in more complicated situations. Though, since you disagree with me that this is about the handling of imperfect information (or more precisely, the counterfactuals - there is a variant on wikipedia where they removed the probabilities, but counterfactual reasoning still applied to resolve the paradox), maybe you won't see them as related errors.


> You know about dynamic programming, right? Well, one of the reasons it was invented is related to what we're talking about right now. There are these things called the Bellman equations.

Yes, I've done dynamic programming in competitions. I'm not familiar with the Bellman equations, and the explanation you provided about convergence, policy functions, etc. went over my head, sorry.

> If you're actually interested in this sort of thing, I suggest checking out the poker research papers by Noam Brown [...] He mentions in the research paper that there was a mistake in another paper where they didn't account for the way subgames influence each other. I see that as the problem here, but I haven't read the other paper, so I don't know if it was the same category of error. Find the paper he referenced, read through it, and see if you're tricked. It is potentially a kind of real-world two envelope problem: a test of whether your tools for reasoning about these things are actually helping you avoid the error in more complicated situations. Though, since you disagree with me that this is about the handling of imperfect information (or more precisely, the counterfactuals - there is a variant on wikipedia where they removed the probabilities, but counterfactual reasoning still applied to resolve the paradox), maybe you won't see them as related errors.

This sounds really interesting! It was a huge deal in the poker scene when the poker AI developed by Brown and Sandholm defeated pro human players. I've read the first few pages of the paper now, but it becomes very math-heavy after that. I don't have the necessary background to understand the notation they use. That said, the "mistake" in the way "subgames influence each other" that you referenced - I suspect it was of a far simpler kind, the kind that Brown explains in chapter 2, which he concludes with:

> This shows that a player’s optimal strategy in a subgame can depend on the strategies and outcomes in other parts of the game. Thus, one cannot solve a subgame using information about that subgame alone. This is the central challenge of imperfect-information games as opposed to perfect-information games.

Although Brown says "imperfect-information games" here, he actually means a specific type of imperfect-information games: the type where the opponent's strategy is not fixed. We're talking about games where your opponent can change their strategy in order to exploit weaknesses in your strategy. This property was a key requirement of the "coin toss" example game that he provided in chapter 2. If you modified the coin toss game such that the opponent's strategy was fixed, then the situation would change completely. Crucially, the Two Envelopes game is not one of those games where the "opponent" can adapt their strategy according to your strategy. The strategy of the opponent is fixed in the Two Envelopes game. That's why you can solve a single subgame in the Two Envelopes game independently of other subgames, even though it's an imperfect-information game. If you are skeptical of this claim, we can verify it by simulation.


> the explanation you provided about convergence, policy functions, etc. went over my head, sorry.

I was agreeing with you when I talked about convergence. You said that adding a cost would solve the problem, and I agree that it does. I think you were probably thinking of a literal cost like "one cent". If you choose a cost like that, then switching an infinite number of times has an infinite cost. You could instead think of the cost as a fraction of your expectation: the cost is 1% of whatever you end up getting back. Now if you switch an infinite number of times, the fraction you keep converges to zero, and so does your payoff. That might seem counterintuitive, but recall that 1/3 is .3333 repeating, so when you sum 1/3 + 1/3 + 1/3 you get .9999 repeating. Yet 1/3 + 1/3 + 1/3 is equal to one. Being infinitely close to something is basically being the thing you are infinitely close to. Even though we never get the reward, we know the retained fraction is becoming infinitely close to zero. People call it "converging" when we have an infinite sequence we can sum to a real value. We know before ever seeing the value that we'll be multiplying it by zero, so we can refactor the equation to be 0*ev(switch) and then take advantage of the identity 0x=0 to declare the result to be 0. Thus, the calculation converges to zero.
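
A quick numeric sanity check of that convergence (my own toy numbers): keeping 99% of your expectation per switch means the retained fraction after n switches is 0.99^n.

    gamma = 0.99
    for n in (1, 10, 100, 1_000, 10_000):
        print(n, gamma ** n)
    # 1      0.99
    # 10     ~0.904
    # 100    ~0.366
    # 1000   ~4.3e-05
    # 10000  ~2.2e-44   -> the retained fraction converges to zero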


> Although Brown says "imperfect-information games" here, he actually means a specific type of imperfect-information games: the type where the opponent's strategy is not fixed. We're talking about games where your opponent can change their strategy in order to exploit weaknesses in your strategy.

This isn't really the central problem of imperfect information. Consider that in a perfect information game, your opponent will also adjust their strategy to exploit weaknesses in your strategy. So if it was the central challenge, why does it happen in both? You can rule it out as what he was referring to, because it doesn't discriminate between the two types of games. The central challenge in imperfect information games is that you have to play with respect to your information set, not the subgame you are in. So the policies and outcomes of other subgames influence the expected value of the subgame you are in. In perfect information, the only game in your information set is the subgame you are in, so you only have to play with respect to the subgame. That is what makes perfect information different from imperfect information: it discriminates between the two game types.


> This isn't really the central problem of imperfect information. Consider that in a perfect information game, your opponent will also adjust their strategy to exploit weaknesses in your strategy. So if it was the central challenge, why does it happen in both? You can rule it out as what he was referring to, because it doesn't discriminate between the two types of games.

The type of issue I was referring to does not occur in perfect information games.

As a practical example, consider the concept of "balancing your range" in poker. If you play poker without doing that - if you play with a purely exploitative strategy where you are only trying to maximize your EV for each hand - then your strategy will be very easily exploitable by an adaptive opponent. You will frequently end up in river situations where your opponent can deduce whether you have a strong or weak hand, so they can fold to your strong hands and bluff you out of weak hands. In contrast, if you attempt to "balance your range" - that is, consider all the subgames - then you won't end up in these situations as badly. For example, when you make a particular river bet, your opponent might deduce that you have a strong hand 70% of the time and a bluff 30% of the time (as opposed to 100% and 0%).

This issue does not exist in perfect information games. Yes, my definition of "adjusting your strategy to your opponent's strategy" was overly broad for defining this issue. But if you think about the poker example, where a poker player might make a decision like "I need to bluff with this part of my range, because I need to support my strong hands [other subgames] by having some bluffs in my range in this situation" - you won't find a corresponding example in perfect information games like chess. This class of problems is unique to imperfect-information games in which an opponent is allowed to adapt their strategy to yours. If you fix the opponent's strategy, the issue disappears. If you turn the game into a perfect-information game, the issue disappears. Both requirements must be present for this issue to exist.

I thought that Noam Brown was talking about this issue in chapter 2. He discussed a simple coin toss game where one player took a strategy, and then the other player adapted by taking the optimal (exploitative) strategy against them. Then the first player changed their strategy, and the second again adapted theirs. And then he described a balanced (GTO) strategy. Then he said this as a conclusion:

> This shows that a player’s optimal strategy in a subgame can depend on the strategies and outcomes in other parts of the game. Thus, one cannot solve a subgame using information about that subgame alone. This is the central challenge of imperfect-information games as opposed to perfect-information games.

I thought that this corresponds perfectly to my poker example. If it doesn't, and it means something completely different, ok, sure. I'm not a mathematician. I can't even read the notation that's used in the subsequent parts of the paper.


> I thought that this corresponds perfectly to my poker example. If it doesn't, and it means something completely different, ok, sure. I'm not a mathematician. I can't even read the notation that's used in the subsequent parts of the paper.

I think you understood his point very well. I just think you're making a mistake in trying to recast his claim from "imperfect information games have this property" to "this narrow subset of imperfect information games has this property". Both imperfect-information games with an opponent and without one have the property of needing to play as if you are in multiple subgames, because the definition of the problem is that you don't know which subgame you are in. His claim was that the policy in one subgame could influence the EV of another subgame - and in this problem, it does. I believe you're thinking of the EV of the envelope when you think you are proving by calculation that this game doesn't have that property.

To see that his claim applies to this game, consider the case where P(Switch)=1. Being able to calculate the EV of the envelope's contents is a bit different from solving the subgame. Here, we have two subgames, E1 and E2. We can know a priori what the expected values of the envelopes' contents in E1 and E2 are. But if you change your policy in E1, it changes your expected value in E2. If you doubt this, remember the core of the paradox again: choose to always switch and your EV is no longer the EV of the envelopes. Ergo, the EV of the subgame E1 is dependent on the strategies and policies of another subgame, E2.
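
Here's a minimal sketch of that in code (my own toy formalization of the repeated-switching variant, not anything from the wiki). The policy is "switch with probability p at each decision, otherwise keep and collect":

    import random

    def play(p_switch, x=100):
        envelopes = (x, 2 * x)
        held = random.randrange(2)           # uniformly random starting envelope
        while random.random() < p_switch:    # at p_switch = 1.0 this never ends
            held = 1 - held                  # switching swaps which envelope we hold
        return envelopes[held]

    trials = 100_000
    for p in (0.0, 0.5, 0.9, 0.99):
        mean = sum(play(p) for _ in range(trials)) / trials
        print(p, round(mean, 1))             # ~150.0 for every p < 1

    # At p = 1.0 the loop never terminates: the EV of the *policy* is
    # undefined (or 0 under discounting), even though the EV of each
    # envelope's contents is perfectly well-defined.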


> I think you understood his point very well. I just think you're making a mistake in trying to recast his claim from "imperfect information games have this property" to "this narrow subset of imperfect information games has this property". Both imperfect-information games with an opponent and without one have the property of needing to play as if you are in multiple subgames, because the definition of the problem is that you don't know which subgame you are in.

If we take the poker example and we modify it by "locking" our opponent's strategy, then this property is removed. Suddenly the optimal strategy for us no longer includes any GTO-like thinking such as "balancing our range", we should simply maximize our EV for each hand "in a vacuum" without any thought to other subgames. We no longer care how our range looks to our opponent, because they are no longer able to change their strategy.

> To see that his claim applies to this game, consider the case where P(Switch)=1. The policy you chose in one subgame just changed the EV of another subgame.

Sorry, but I don't understand this. This sounds to me like we fix the probability of switching to 1, which means that we end up in an infinite loop and the outcome cannot be computed. I don't understand this premise, nor its implications for the EV of the other subgame.

> This is a subtle distinction that I mentioned earlier - the EV of the envelope as in the wikipedia problem isn't the EV of the subgame. So you're not solving the subgame if you figure out the EV of the envelope.

The expression you use "EV of the envelope" is ambiguous in this context. I'm not sure if you mean EV relative to the value of the total amount of money in the game, or if you mean EV relative to the value of the firstly-chosen envelope.

When we're talking about "solving a game", we're talking about finding the optimal decisions within the game to maximize the expected value from the game as a whole. In the Two Envelopes game we have just one decision: switch or not. So we need to find out whether EV(switch) > EV(stay), where both EVs are relative to the total amount of money in the game (not relative to the value of the firstly-chosen envelope). We don't have to be able to compute the "EV of the envelope"; we only need to know the relative difference between the EVs of these two actions. There are many ways of computing them, and all of those ways lead to the conclusion that the expected gain of both actions is zero. I'm not aware of any "incorrect" way of computing those EVs such that we would get a nonzero result.


> without any thought to other subgames

I guarantee you that your solution is going to have to incorporate a set somewhere that includes both subgames. You might forget it does, because you simplify to a scalar, but it is going to be there. In perfect information, it isn't there. In imperfect information it is.


> That's why you can solve a single subgame in Two Envelopes game independently of other subgames, even though it's an imperfect-information game.

Obviously the EV of envelope 1 is the contents of envelope 1. If you know the probability of reaching it, you can calculate the expected value of that envelope by multiplying by the probability of reaching it. But why are you multiplying by 1/2? Probability is defined in terms of sets. What is the set that makes it 1/2? Does that set contain only the subgames that are part of the subgame you are in?


> I'm not a mathematician, so I don't understand expressions like "counterfactual part of subgames emerging" or "there is an identity implied by being under imperfect information". I appreciate you writing back at length, but the majority of what you wrote went straight over my head.

# Defining Counterfactuals

Consider a fair coin flip. You have {Heads, Tails}. Let's assume you are going to get heads - take it as a given; that is what actually happens. It is factual. However, for the purposes of analysis, sometimes it doesn't really matter that we know what happened: we need to consider all the cases that didn't happen. Tails in our analysis would be the counterfactual. These two situations, the factual heads and the counterfactual tails, are associated with each other. There is a set {H, T} that contains both of them. E.g. P(Heads) = |{H}|/|{H, T}|.

When I say counterfactual, I'm referring to the events that we didn't assume to be factual, but which we want to keep track of. Probability kind of drops these terms when it says "assume", because |{H}|/|{H, T}| becomes |{H}|/|{H}|, which is equal to one. This is perfectly fine from a math perspective. The thing is, just as moving from 5/x = y to 5 = xy is valid only under a built-in assumption (namely that x can't be zero), our counterfactuals are a bit like that x: they impose an easily hidden constraint on what it is valid to do. If you reintroduce uncertainty about whether or not you are in H, you have to do so in a way that takes you back to having the set {H, T}.

Let me show you a practical example of that to make the point very clear:

Let the value of heads be one and the value of tails be zero. Then:

    P(Heads) = 0.5
    Assume heads on this coin flip:        P(Heads | assumption) = 1.0
    Assume heads again, on the same flip:  P(Heads | assumption) = 1.0
    Now, since heads is actually true with 1/2 probability:
    EV(Heads) = 1/2 * P(Heads | assumption) + 1/2 * P(Heads | assumption)?

Well, it is certainly true that 1/2 * P(Heads | assumption) is a correct term, so you can't say this is wrong on the basis of the assumptions alone. Every term leading up to the equation was technically true, but obviously we just did something really weird, right? Being very precise, the problem is that the neglected counterfactual wasn't handled: we neglected the counterfactual associated with heads.
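
Spelling the weirdness out with those numbers:

    What we computed: EV(Heads) = 1/2 * P(Heads|assume heads) + 1/2 * P(Heads|assume heads)
                                = 1/2 * 1 + 1/2 * 1 = 1
    What is true:     EV        = 1/2 * value(Heads) + 1/2 * value(Tails)
                                = 1/2 * 1 + 1/2 * 0 = 1/2

The neglected counterfactual (tails) is exactly the gap between the two lines.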

# Defining subgames

Most games can be written out as a game tree. I showed one in my previous post. When you move down the tree, you are in a subgame of that tree.

In a perfect information game like chess if you are in a subgame, a portion lower on the tree, the other parts of the tree don't matter anymore. Your results are independent of the rest of the game tree.

In imperfect information though, just because you are in a subtree doesn't mean you know which subtree you are in. Your actual view into the game is through the information you have. Just as in the {H, T} case you have to consider the possibility of both H and T, in a subgame you have to consider the possibility that you are in every subgame that is reachable given the information you've seen.

Check out this to get a better sense of what I'm talking about: https://www.youtube.com/watch?v=EbKmZLp5HvA


> This game tree only makes sense if you change the definition of A.

Switching away from using A to avoid ambiguity.

> This is a fuzzy explanation that gives a feeling "something about this explains the issue", but it's not a complete explanation that precisely pinpoints what is wrong.

Yes it does, actually. Well, to me. I'll go more in depth so it's less fuzzy for you.

The issue is that when you handle imperfect information correctly, you have counterfactuals implied by your state-oriented reasoning, because you can't actually escape into the perfect information world - you're not in it. When you combine the state-derived probabilities, the - I don't know the word for this, so forgive me - superposition of the two counterfactuals implied by those state-derived probabilities ends up occupying the same space. They exist, because they are implied by the imperfect information game. They just aren't written out in the equations on wikipedia.

Edit: My 'spatial' probabilistic reasoning in my head is explained here, but I show you the identity rule that proves this in the next section.

> I never claimed that A = 2A

You aren't, but the counterfactual terms that get realized into the same position, but which weren't written out, claim that Z = 2Z and that Z = Z/2. The two cases were under two different games. So their implied counterfactual components don't agree with each other and we should never have been allowed to combine them.

We agree on this!

You're just more focused on the "they come from different games" part and not noticing the other part of what I'm telling you: you can equate the counterfactual components that they don't show between the two games, because there is an identity implied by being under imperfect information. Look at the set relationship closely across the two subgames and you'll realize that the set is defined to be equal to itself across all subgames where it is used.

That is quite literally one of the logical contradictions. This is where they claim that 2=1. The sets aren't equal to each other after their operations. So by the identity implied by the set, they've claimed 2=1.

And yet when I try to claim this is a core aspect, you throw out this...

> This is false. We can easily construct imperfect information games similar to this one, but tweak a few things in such a way that a "naively constructed EV calculation" yields the correct results.

This is a really dumb nitpick with very poor support. If you're going to be this fallacious, you might as well go all out. Why not say I'm wrong because first graders learning to add numbers aren't learning how to handle imperfect information subgames? After all, your argument for me being wrong is that the pedagogical value of a problem is independent of the subproblems it contains.

> Nope! I was very clear about which very specific claims I was refuting.

You have a more subtle point than I thought you were making. I'll check out your other post.

> I didn't say "I spotted error X in Wikipedia which explains Two Envelope problem", I said "I spotted error X in this other poster's Hacker News comment". Please read the very specific comments I made about very specific claims made by the person.

Well, he does have another error; EV isn't defined over all policy choices. It is a problem with many of the wikipedia solutions too. He does see it - that's why he mentioned other people talking about infinite series - but his correction attempt was more like "halt here because of type error", which is something that I, as a programmer, can respect.


Where does it say that?


The formula at bullet point 7.


I don't think that this is so obvious to the layman, when A has already been defined as "the amount of money in the envelope I am holding".


Of course it's not obvious, that's why it's a "problem".

But if you look at the formula at bullet point 7, there are "2A" and "0.5A" as amounts, but no "A". It just doesn't describe the current state of reality.

Not trying to brag, just reformulating it in the simplest way for everyone to see.


This may be progress, but I don't see how it eliminates the "paradox". The claim in the "paradox" answer is that when you switch envelopes, "A" is not one of the possible outcomes. If "A" isn't a possible outcome then it shouldn't show up in the expectation value.


Because A does not have the same value in both terms of the equation. In one term it means "the lowest" and in the other "the highest".

It's two versions of reality in the same equation, which is wrong.

Use 100 usd / 200 usd instead of A and 2A and you'll see. Whether you never switch or always switch, in both cases the equation is

100×0.5 + 200×0.5


I agree that's true once we know that the two envelopes have 100 and 200 dollars. For me the tricky part is, how do we know it's 100 and 200? Maybe we have two envelopes with 100 and 50 dollars instead.


Doesn't matter. Define A = 'the lowest' and rewrite the equations; you will see the same expectation.

The problem is that they interchange A being the lowest and the highest in the same equation.


I think that subtly misses the point. The problem is that you're implicitly using a distribution that... isn't a distribution. And with this particular not-a-distribution, whether you should switch or not does not depend on the value of A. But with any actual distribution (... I think?) it does, at which point... no paradox. It's true that it's not clear (at least to me, but perhaps more generally) what distribution we should assume, but as long as we avoid treating something that isn't a distribution as if it were one we avoid the worst of it.


No, joe_the_user is exactly correct. The expectation they compute in step 7 requires that A be a constant. The formula seems to just be the regular formula for expectation: the value that B can take, times the probability that it takes that value, summed over all possible values.

But the "possible values" they put in are A/2 and 2A, which makes no sense. A is a random variable.

Don't forget that random variables are really functions on a probability space with an implicit argument, so (1) when multiple random variables appear in a formula, they're implicitly evaluated at the same argument omega, and the formula must hold for all omega from the probability space, and more importantly, (2) there can't be free-standing omegas in a formula for expectation because expectation is an integral (or sum in this case where the probability space is discrete) over all possible omegas.


This comment explains the fundamental reason why the reasoning is incorrect. I offer the same perspective in a different comment (https://news.ycombinator.com/item?id=31569991).

In my opinion, the Wikipedia article is doing a disservice to readers by mentioning unnecessarily complex mathematical arguments involving Bayesian reasoning and infinite distributions. I believe all of these are distractions from the more fundamental typing error that invalidates the switching argument.


They also fail the article's own premise

> in particular, the puzzle is not solved by the very simple task of finding another way to calculate the probabilities that does not lead to a contradiction


Honestly, the more I've thought about this problem the more that statement bothers me. It is just such a nonsense goal.

The reason the calculations are wrong is fundamentally related to why correcting the calculations leads to the correct probabilities. If you understand why the problem gives you the wrong result, you proceed to correcting it, and you get the right result - that doesn't mean you didn't understand. It means you did. You can tell you did, because you have the correct answer.

But if I show the calculations produce the correct answers, well, that is a step too far. We can now dismiss the understanding on the basis that it got the correct answer, apparently?

It's no wonder the author of the wiki writes such a bullshit claim - that no one can agree on a definitive conclusion. Their terms of engagement are self-defeating. Correctness is error, because correctness means you didn't /really/ understand. It's a no true Scotsman fallacy - get the right answer, and you aren't engaging with the 'real' problem.

But it's a flawed no true Scotsman, because it defines something measurable: it tells us the goal, which is to avoid this type of mistake in our thinking. And there are many generalized algorithms for solving decision problems that include this as a case which is successfully solved - and they just so happen to be in the imperfect information setting. And that setting has studied the problem of infinite recursion and has solutions for it. Which I can apply. To get the right answer. And, in general, avoid the problems they claim we aim to avoid.

More importantly, and to the point of this thread: this isn't an intractable debate, because it's been solved and used in production settings for literally decades. So what if some people are going to pretend it isn't? This isn't an unsolved problem. The actual game theory math is /well/ beyond this level of complexity. It's contending with things like environments where you have so much complexity you have to reduce to a blueprint abstraction, not stumbling at a decision problem that is quite literally simpler than rock paper scissors.


You seem to misunderstand the point of that constraint. Correction is not an error or a "step too far"; it's just insufficient.

You can arrive at the correct conclusion either by pinpointing exactly where the original argument is incorrect, or you can come up with a completely different argument that does not have an error.

The puzzle challenges you to pinpoint the error because coming up with the correct solution is trivial (and the puzzle is deliberately set up this way).

This is not a "nonsense goal." If this came up in real mathematical research - two papers coming to contradictory conclusions - and no one could find where either paper went wrong, we would have a real paradox on our hands.


I think this, because I disagree with most people here about what the actual paradox is.

I think the paradox is that the algorithm equates the expected value of the contents of an envelope with the expected value of a policy choice for a player. When I correct what I feel is the root of the paradox, my solution differs so drastically and fundamentally that the way the problem restricts you to pointing out the wrong step feels disingenuous.

The entire structure is wrong, because even if you do correct the error that leads to the wrong EV for the envelope, you still haven't resolved the paradox. The right probabilities don't resolve the paradox, because they still imply that always switching has the same EV as not switching. If they were really equal, I could always choose to switch, but I can't - so the paradox is still there.

My resolution ends up being so critical of their argument that their entire way of going about the solution gets thrown out. I end up seeing not just a specific wrong EV calculation, but a decision problem that is fundamentally using an inappropriate algorithm to determine the policy function.


With all due respect ... this is basic probability theory. It's not really controversial what the solution is. The article's failures are mainly pedagogical.

We can agree that "the entire structure is wrong" because the "entire structure" is giving a wrong formula for the EV and saying "this is the formula for the EV."

Yes, switching and not switching have the same EV and you can always switch.


> We can agree that "the entire structure is wrong" because the "entire structure" is giving a wrong formula for the EV and saying "this is the formula for the EV."

We aren't in agreement about this. I realize we have to fix this, but fixing it doesn't resolve the paradox. It is a red herring.

> Yes, switching and not switching have the same EV and you can always switch.

With all due respect, this isn't true and asserting this doesn't resolve the paradox. See my other reply for why it doesn't resolve the paradox.


Mike, even when you get the correct result using the right rules, the expected value of an envelope is not the expected value of a policy. I'm not agreeing with you that the problem is the expected value calculation; I'm telling you the problem is deeper than that. Fix the expected value calculation and you still have a paradox, because you are making a decision not on your expected value but on the expected value of the envelope. These are two different things, and a rational person shouldn't eliminate the dependency between their policy and the reward they get.

To try to stress how big a problem this is, pretend you were playing the game and you wanted to find the move that would make your EV the highest. Now, if you go with the logic of the problem, you are allowed to select not on the basis of your EV, but on the basis of some other EV. So instead of choosing based on the envelope, why not choose based on something else that has no relationship with our EV? Say, the weather in Alaska. If it is sunny, we like sunny - so switch. If it isn't sunny, we don't like that - so keep. It is crazy to do this, because there is no relationship between the EV you are using as a selection criterion and the EV your policy gets. This is the same situation as using the EV of the envelope. It sounds really crazy in the Alaska example, because the two are so obviously unrelated; it sounds so reasonable in the envelope example, because it isn't as obvious that they are unrelated. Yet for the policy P(switch)=1, the EV of the envelope and the EV of the policy with respect to the game are not the same thing.

Now imagine the wikipedia article for the Alaska variant of the two envelope problem. Do you really think everyone would be so focused on the EV of the envelope as the step that was wrong? How could they? We have the same paradox, but there is no envelope EV calculation included in the problem. If we can remove the EV calculation yet still have the same paradox, it seems to me the paradox is not the expected value calculation.

So what is my solution? Well, to actually find your correct policy function you need to get the argmax of the policy with respect to the game. There are multiple ways to do this:

- Reinforcement learning does it by finding argmax pi with respect to Q_pi(s, a) = R(s') + P(keep)Q_pi(s',keep) + P(switch) Q_pi(s', switch).

- Game theory sets it up a bit differently. You define a similar graph using a different formalism, but simplified to operate over information sets. You can use a thing called regret matching; basically, it turns out that if you play in proportion to your normalized counterfactual regret, the average of those policies is the optimal best response.

In both cases you need to do something about the fact that you're actually on an infinite graph. So the actual solution in the general case looks very, very different from their way of solving the problem. It isn't just simple probability - I mean, it is, but taking the limit of an infinite sequence and taking advantage of the properties of Markov chains aren't usually what I think of when someone tells me that something is simple probability. That is one formalism. In the other, we do have simple probability, but it isn't necessarily obvious that the central limit theorem gives us the optimal policy when we play in proportion not just to our regret, but to our counterfactual regret. So yes, simple probability - but also, most people who know simple probability don't necessarily even know what a counterfactual is. So maybe not that simple after all.
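
To make "EV with respect to the policy" concrete for this toy game, here's a back-of-the-envelope closed form (my own formalization, not from the article): let the policy be "switch with probability p at each decision, otherwise stop and collect", start uniform over {x, 2x}, and apply a discount gamma per switch. By symmetry the expected contents at every round is 3x/2, so:

    EV(p) = sum over k >= 0 of p^k * (1-p) * gamma^k * (3x/2)
          = (3x/2) * (1-p) / (1 - p*gamma)

For gamma = 1 this is 3x/2 for every p < 1, while at p = 1 the game never pays out; with gamma < 1, EV(1) = 0. That's the "zero or undefined" I keep mentioning.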

But let's say we stick to the problem. We are here to learn how to avoid this mistake, right? Nope. If you don't do things like this, you'll just be wrong in more complicated situations, because the EV of the envelope is not the EV of the game with respect to your policy. This becomes increasingly true as your imperfect information games get more complicated; it is very true of complex real-world situations. The value of a wallet with a hundred dollars in the real world is different depending on whether you got that wallet with a policy function of robbing people or of earning it at your work. I feel that sticking to their formalism leaves you conceited about your ability to protect yourself from this paradox: you consider yourself a master of the expected value of the envelope, but you're still vulnerable, because the expected value of the envelope isn't the expected value of the game under your policy. So sticking with the problem is the opposite of protecting yourself.

I'm so far from what they want the problem to focus on, but they are wrong to focus on that. They aren't protecting themselves from making the same mistake; they're dooming themselves to use the wrong tools for solving this problem. So they will make this mistake, and they'll be even more confident in themselves as they do it, because they were clever and did the wrong thing in a better way - calculating the EV correctly, but staying within the land of paradox despite that.


Well, taking the mean of products doesn't make much sense anyway... but if you take a log and average, then you end up with the typical solution. If you take a geometric mean, then you end up with 1, which implies there's no difference either.


Yes.

The big problem is that you can't have a uniform distribution on the natural numbers.

(And by extension, you can't have a uniform distribution on the rounded-to-integer version of a distribution on the real numbers.)


Who said there is a uniform distribution on the natural numbers?


I think the assumption that "for all x, p(x) = p(2x)" implies a uniform distribution on the natural numbers.

Imagine we have a distribution that satisfies that assumption, and then someone tells you they've sampled from that distribution and found that the result is of the form (say) 7*2^k, for some k > 0. That conditional distribution for k would seem to have to be uniform, right?
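
Spelled out (assuming a countably additive distribution):

    p(7) = p(14) = p(28) = ... = c for some constant c >= 0,
    but the sum over k of p(7 * 2^k) = c + c + c + ... must stay <= 1,
    which forces c = 0.

The same argument applies to every such chain, so the total probability can never reach 1 and no such distribution exists.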


We're not dealing with the natural numbers, right? Otherwise, you obviously want to open the envelope and always switch when you see an odd number.

Of course, the lack of a uniform distribution on the reals or rationals presents the same problem.


> We're not dealing with the natural numbers, right?

Well, you can either assume that the envelope's numbers are specified only up to pennies. Or you can just round and say you don't care about sub-pennies.

Or you can go with your suggestion, and look at the lack of a uniform distribution over the reals / rationals.


This reminds me of how chess tactics feel so obvious when I’m on a tactics trainer, yet I can’t identify them in games where they actually happen.

Or code that “obviously” has a bug only after it caused an issue in production.

Would be nice if someone warned me before I do a PR: “ok, fyi there is a subtle but devastating bug in the new_feature.cpp”


Well if debugging is the process of removing bugs from code, then logically the act of writing the code must be called "bugging" and the probability of having a bug in any new piece of code asymptotically approaches 1 with increasing SLOC ;^)


Every program has bugs, every program has inefficiencies. Therefore every program can be eventually reduced down to a single instruction, which will be incorrect.


> Every program has bugs

Where is the bug in this program?

  END


What’s the specification?


It was supposed to return a generated image of Reverend Thomas Bayes in the style of stained glass.


This is a class of "hard to find, easy to verify" solutions.


> Or code that “obviously” has a bug only after it caused an issue in production.

Cognitively, the mind filters out quite a bit of information for performance reasons.

Are we biased in favor of reasoning that "sounds plausible"?

My "hot dice" say "yes".


Step 3 (and thus also 1 and 2 on reflection) stands out to me immediately as describing two different situations for the entire game.

The case where switching gives you 2A and the case where switching gives you A/2 describe two completely different universes, not two different actions in one.


Agreed. Step 6 is where it breaks down, because it redefines A from being one of the two envelopes to being the halfway point between those two envelopes. Until step 6, it's describing two parallel views of the universe (depending on which envelope you first get - hence steps 4 and 5), and then it mashes them together in a way that doesn't work.

Take the practical example of $100 and $200: A is either $100 or $200 depending on which envelope you receive first, so the equation is either 0.5*A + 0.5*2A, or it is 0.5*A + 0.5*(A/2). It can never be 0.5*2A + 0.5*(A/2)


Even after looking at those numbers, it still felt wrong to me. It eventually occurred to me that this was because my intuition was telling me the total should add up to A. And it doesn't; it adds up to 9/8 A. It took me a while to realize that my intuition was wrong, and that there's no reason the total should be A.

The thing that helped me reconcile this was realizing that the As in the two equations are different values, and then replacing them.

    (0.5⋅(0.5⋅A + 0.5⋅2⋅A)) + (0.5⋅(0.5⋅A + 0.5⋅0.5⋅A)) = 9/8 A
^ but the A in the left grouping (where it's 100) is different from the A in the right grouping (where it's 200).

Replacing the As with their actual values

    0.5⋅(0.5⋅100+0.5⋅2⋅100)+0.5⋅(0.5⋅200+0.5⋅0.5⋅200) = 150
And, since one envelope has 100 and the other has 200, an expected outcome across both envelopes is, as calculated, 150.


The breakdown for me was in step 4-5.

> If A is the smaller amount, then the other envelope contains 2A.
> If A is the larger amount, then the other envelope contains A/2.

These are conditional statements, so you can't simply add them up and compute an expected value of switching like they show in the example.


This line of reasoning isn't possible if you have specific amounts (eg £50 and £100) for the envelopes, which suggests that the variable A is being misused somehow.

`B = (2A if A=50, A/2 if A=100)`. Simplifying this to `B = 2A or A/2` loses important information: namely that when B is smaller, you expect A to be larger. Or alternatively, treating A as fixed (say A=100) conflates two different situations: one where the amounts are `100, 200` and one where they are `100, 50`. So you end up thinking B must have (200+50)/2 in it, which is incoherent. If you compute the expected B correctly then `A` is different in each branch.

It is subtle though. If this appeared in a paper, arguing something not obviously wrong, it'd be hard to convince everyone that the reasoning is bad.
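
A tiny worked version of that, with envelopes of 50 and 100 (hypothetical numbers, just to make the conflation visible):

    # The two equally likely worlds, written as (A = chosen, B = other):
    worlds = [(50, 100), (100, 50)]
    E_A = sum(a for a, b in worlds) / 2     # 75
    E_B = sum(b for a, b in worlds) / 2     # 75: switching gains nothing
    # The incoherent version fixes A = 100 in both branches, mixing the
    # {100, 200} world with the {100, 50} world:
    bad_E_B = (2 * 100 + 100 / 2) / 2       # 125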


It depends on how the envelopes are prepared. If the two envelopes have x and 2x in them and one is randomly handed to you, there's a symmetry between them that tells you there is no benefit in switching.

However, suppose x is placed in an envelope and handed to you. Then a fair coin is flipped. If it comes up heads, 2x is placed in another envelope, and if it comes up tails, x/2 is placed in that envelope. Then you should switch.

It's a subtle difference in how the envelopes are prepared, but it makes all the difference.
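
A quick sketch of that difference (my own framing, with x = 100):

    import random

    def symmetric_gain(x=100):
        envs = [x, 2 * x]                   # x and 2x sealed first, then dealt
        random.shuffle(envs)
        return envs[1] - envs[0]            # gain from switching

    def coin_flip_gain(x=100):
        # Coin flipped only after you already hold x.
        other = 2 * x if random.random() < 0.5 else x / 2
        return other - x                    # gain from switching

    n = 100_000
    print(sum(symmetric_gain() for _ in range(n)) / n)   # ~0: no benefit
    print(sum(coin_flip_gain() for _ in range(n)) / n)   # ~+25, i.e. x/4: switch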


Oh, weird.

I’m not sure I’d have naively realized that the fixed amount in your envelope makes so much difference.

For people confused like me, think about the outcomes:

When the two are prepared together, say $2 and $4, then when you exchange your options are +2 and -2 with 50:50 odds.

When the second envelope is prepared based on $2 in your envelope, then when you exchange your options are +2 and -1 with 50:50 odds.

(I had to diagram this all out; unintuitive to me!)


I did a pretty deep analysis of this problem a while back here in case you're interested: https://mindbowling.wordpress.com/2020/09/14/two-envelope-pa...


In some ways it's similar to the Monty Hall problem: https://en.wikipedia.org/wiki/Monty_Hall_problem

The introduction of a second "stage" based on your action changes things in a non-obvious way. Your first action is less impactful than your second one, regardless of what you did.


This is new information which changes the definition of the problem though.


Yeah, I agree. This setup though is the one to which the original "naive" argument applies which is why I think it's interesting to ponder.


This is the “simple resolution” presented in the linked article. What “line of reasoning” are you responding to?


Right. I mean the original, paradoxical line of reasoning – "the switching argument".

I see that the article also says "commonly one writer proposes a solution to the problem as stated, after which another writer shows that altering the problem slightly revives the paradox." But it doesn't seem to elaborate on where the simple resolution falls down, if it does.


The correct answer is described here:

https://www.youtube.com/watch?v=_NGPncypY68

TL;DR (Spoiler alert): the expected value of the amount of money you end up with is an infinite series whose sum changes depending on the order in which you add up the terms, and so you can choose an order that makes this value come out to be positive, negative, or zero.
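
The order-dependence is the classic Riemann rearrangement phenomenon. A textbook demonstration with the alternating harmonic series (not the envelope series itself, but the same effect):

    import math

    N = 30_000
    # Natural order: 1 - 1/2 + 1/3 - 1/4 + ... -> ln(2)
    natural = sum((-1) ** (k + 1) / k for k in range(1, 2 * N + 1))
    # Rearranged: two positive (odd) terms per one negative (even) term
    # -> (3/2) * ln(2), built from exactly the same terms.
    rearranged = sum(1 / (4 * i + 1) + 1 / (4 * i + 3) - 1 / (2 * i + 2)
                     for i in range(N))
    print(natural, math.log(2))             # both ~0.6931
    print(rearranged, 1.5 * math.log(2))    # both ~1.0397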


I do not like this explanation. In my opinion, the fancy mathematical argument involving infinite series is an unnecessary distraction from a much more fundamental and mundane mistake (i.e. incorrectly using a variable out of its scope).

I believe mike_hock provides the clearest and simplest explanation in this thread: https://news.ycombinator.com/item?id=31567251

I also made this argument separately: https://news.ycombinator.com/item?id=31569991


There's nothing wrong with conditional expectation. E[X] = E[E[X | A]], where E[X | A] is a function of the random variable A. See https://en.wikipedia.org/wiki/Conditional_expectation.

When E[X], E[A], and E[X - A] are all well-defined, it is indeed the case that if E[X | A = a] > a for every particular value a, then E[X] > E[A].

What goes awry in this case is that E[X - A] is not well-defined (where X is the value in the unselected envelope and A is the value in the selected envelope). It is given by a conditionally convergent series, as noted.


Right away I was taken aback by the wikipedia article's quick dive into mathematical solutions to the problem, when it seemed to me that what was missing was a consideration/framing of the initial setup.

That initial setup being: does the act of starting the game with someone confer value in and of itself?

So I’m glad the real solution is wonky.


Welcome to Wikipedia. Where people who love using equation editors outnumber those predisposed to writing cogent explanations.


What an interesting trick. A starting point to intuit a resolution: yes, you have a 50% chance of doubling and a 50% chance of halving. But the doubling only happens if you have x dollars, and the halving only happens if you have 2x. So you can see you either gain x or lose x, so your expected return is 0.

When the amount you multiply your money by depends on the amount of money you currently have, you have to factor in your initial money to each case to compute expected return.


Oh, that actually really helped me getting a grasp on the problem. I feel like other comments and sections of the article were trying to get at that idea, but your phrasing of it made it finally click for me. Thanks.


This is a far better explanation than the confused hand waving in the Wikipedia article.


I would argue that this Wikipedia article is misleading and that it confuses more than it clarifies when it comes to resolving the paradox.

In particular, I am disputing the claim that "no proposed solution is widely accepted as definitive" (exact quote). Indeed, the switching argument (or at least the argument as it is presented in the Wikipedia article) makes a clear and precise mistake that I claim any trained mathematician will immediately agree on once it is pointed out.

This mistake is explained in the following comments: https://news.ycombinator.com/item?id=31566226, https://news.ycombinator.com/item?id=31567251 and https://news.ycombinator.com/item?id=31569991.

Once you have identified it, this mistake is anything but subtle. It is not about infinite series or Bayesian reasoning. It does not require a 30-minute rebuttal talk. It is simply about misusing the very definition of an expected value, in a way that could be qualified as a typing error (see linked comments). This error can only go undetected because of the ambiguity of the English language. Most of the fancy mathematical discussions I've been reading are distractions. This paradox is about language, not about any deep mathematical fact of probability theory.


This is not the issue. When expected value is well-defined, it is perfectly fair to compute E[X] = E[E[X | A]], where E[X | A] is a function of random variable A ("conditional expectation"). If E[X | A] > A for every particular value of A, then E[X] > E[A], whenever the expected values E[X], E[A], and E[X - A] are all well-defined.

The issue in this case is that E[X - A] is not well-defined. The expected profit from switching is given by an infinite series whose positive and negative components are each of infinite total magnitude, thus yielding different sums when bracketed differently ("conditional convergence"; just a coincidence that the word "conditional" comes up here again).


This is a good comment and I believe your justification is equally valid.

The reason for the apparent disagreement is that in order to point out a logical flaw in an argument, one must first make the argument formal and explicit enough. There are several plausible ways that the Wikipedia argument can be translated into a machine-checkable formal proof (and this ambiguity is at the core of the paradox). When I make such an attempt in my head, I am not interpreting the expectation as a conditional one, and the problematic step is about conflating a random variable with a fixed constant. In your attempt to formalize the Wikipedia argument, you use the iterated expectation theorem, and the problematic step becomes different. Judging from the comments, several people agree with my interpretation, but clearly there is an ambiguity here that is part of the paradox.

Thanks for prompting me to add nuance to my comment.


The claim is that no solution is accepted as definitive, which is an empirically true fact about the academic literature. To respond to this with "no, my favored solution is definitive" is uninteresting.


I am in fact disputing this empirical fact, or at least the interpretation of it that is suggested in the article.

There is no such thing in mathematics as an unresolved dispute over whether or not a five-line proof built from elementary concepts is flawed. The only possible situations are: 1) the proof can be checked to be correct at the level of axioms, 2) an incorrect reasoning step can be pointed to, or 3) the proof is ambiguous and/or people cannot agree on what is being proved.

There isn't really a mathematical literature about the two envelopes paradox because the paradox is not really interesting from a mathematical standpoint. Or at least, making it interesting from a mathematical standpoint would require presenting a different argument than the one presented in the Wikipedia article. There may be a scholarly debate about different philosophical or linguistic aspects of this paradox. However, there is certainly no debate about where the flaw is in the presented switching argument once you make it precise enough.

(And the answer should not involve infinite series unless you are looking at a different, strong-arm version of the argument.)


The second simple answer from the article is basically this: pointing out that A is conditionally defined in step 1, but both definitions are used in later steps as if they are the same value. It then goes on to over-explain that a bit, but this answer is in there.


I agree with you. What I am taking issue with is the article presenting a multitude of (unnecessarily) complex refutations (as if one wasn't enough) and suggesting that there is no consensus on which one should be believed. This is not how people do mathematics!

A better article would first introduce a more precise version of the switching argument and then precisely identify the flaw in it. The more advanced mathematical and philosophical discussion that explores variations of the basic argument should be separated and contextualized clearly.


> Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?

Normally we would assume there is someone who owns the envelopes of money. Now their incentive could be to offer to switch only if the envelope you chose contains more money (so they have a chance of keeping more). More like Monty Hall.

Even without that, after switching, should you switch back? The equation given argues you should switch back, so obviously the equation must be wrong.

It is easier to think: if I switch I have a 50% chance of making $X and a 50% chance of losing $X. Switching doesn’t change expected $.

I actually think the later problem in the Wikipedia article is more clever:

  Two people, equally rich, meet to compare the contents of their wallets. Each is ignorant of the contents of the two wallets. The game is as follows: whoever has the least money receives the contents of the wallet of the other (in the case where the amounts are equal, nothing happens). One of the two men can reason: "I have the amount A in my wallet. That's the maximum that I could lose. If I win (probability 0.5), the amount that I'll have in my possession at the end of the game will be more than 2A. Therefore the game is favourable to me." The other man can reason in exactly the same way. In fact, by symmetry, the game is fair. Where is the mistake in the reasoning of each man? — "Martin Gardner: Aha! Gotcha"


The latter problem is more interesting, although as written they're meeting to compare wallets with this game, so both wallets should logically be empty.

But assuming a more surprising situation in which the game is suggested*, the reasoning is pretty good for both if they're happy with a 50% chance at doubling their money + X. But I think maybe it's the hidden prize that makes the logic work (or not work)? For one of them the prize is less than double their money (a prize they'd never win, but still) so the deal isn't a good deal. Not sure how the math works for that, but a contest where you might win 2a or less than 2a is presumably a different reasoning.

*Where, similar to the first point, we should assume the suggester's wallet is empty.


Of course, if ever invited to play such a game, what you really want to do is ditch the contents of your wallet on the way there. Can't lose!


> Even without that, after switching, should you switch back? The equation given argues you should switch back, so obviously the equation must be wrong.

Ah, that observation is very clever! Even if you let yourself be bamboozled by the logic for switching, there is no reason then not to switch again by using the same logic. And again.. and again!


The error is that the probability of winning is not 0.5; it's a function of how much they have in their wallet. They probably don't know the true distribution, so they can't calculate the actual expected value. But clearly your chance of winning is better if you have $1 than if you have $100,000.


So add a step: they flip a coin and swap wallets if it comes up heads.

Now what is the probability of each winning? Did adding that step actually change the problem?

Or reword the problem just like I did in my grandparent comment: call the amount in the larger wallet $X. If you lose then you lose $X, and if you win then you win $X. 50% chance and expected outcome $X/2 by playing.


hmm, I don't really see where the probability of 50% comes from? even if we assume that we don't have any contextual knowledge about what amounts would typically be carried in wallets, the probability of winning would surely have to depend in some way on the amount currently in my wallet? in the extreme case, if I had zero cash, my probability of winning (not just drawing) would be zero, since we can't carry negative amounts of cash.


wait - if you had zero cash in your wallet, you'd _never lose_, because you couldn't receive less.


You'd never win, because you can't have more than the other guy. A draw is not winning


You win if you have less


My bad - then your probability of losing would be zero with zero cash, yes. But we still don't have a proof for the constant probability of 0.5 for winning.


Please don't use preformatted text for quoting, it's painful to read.


I'm going to present a critique similar to this one ( https://news.ycombinator.com/item?id=31568047) but I'll try to slightly rephrase.

Suppose the amount in the envelope you choose is $100. If your amount is the smaller one, the other one has $200. If your amount is the larger one, the other one has $50. So the expectation supporting the switching argument is computed over two different scenarios, where the amounts in the envelopes are {50, 100} and {100, 200} respectively.

This calculation seems to violate a hidden logical constraint: the set of amounts in the two envelopes must be the same no matter what conditional branches we consider for the expectation.

Another related issue - if you switch, the argument for switching back assumes the amount in your original envelope is 4x what you originally said it was.


To expand on this, the switching argument is actually correct if you happen to know that the envelopes contain either {x, 2x} or {2x, 4x} dollars (where x = 50 or whatever), and you observe that A=100. These are effectively hidden assumptions of the argument.

If you wanted to, you could explicitly model all possible values of the envelopes, and all possible observations of A – and you'd get the right answer again, including in the case above. You just can't define A in a path-dependent way, and then treat it as having the same value in all paths.
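
For instance, here is a small sketch of that explicit model (the {50, 100} vs {100, 200} prior and the observed A = 100 are illustrative assumptions, not part of the original problem):

    from fractions import Fraction

    # Assumed prior, for illustration: the pair is {50, 100} or {100, 200},
    # each with probability 1/2, and you then observe A = 100.
    half = Fraction(1, 2)
    # Observing 100 is equally likely under both pairs (each contains 100 in
    # exactly one of its two envelopes), so the posterior over pairs stays 1/2.
    ev_other = half * 50 + half * 200
    print(ev_other)        # 125
    print(ev_other / 100)  # 5/4 -- here the switching argument really is valid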


So the issue is assigning the value A to the selected envelope and then doing conditional calculations for only the other envelope.


Sounds like a misuse of relative return (growth rate). If you gain 100% today, then lose 50% tomorrow, you are back where you started. Absolute numbers give you the correct answer: x -> 2x is a gain of x, 2x -> x is a loss of x, so expected value is zero (x is a hard number not a ratio).


Exactly! The logic of "the average of +100% and -50% is positive" only applies if the denominator is the same. In this case it isn't.

The relative percentage "rule of thumb" doesn't even feel like a primitive in math. It seems wrong to reason from that.


Why isn't this listed on Wikipedia as an answer?


Because if the larger envelope is changed to contain four times as many dollars, then /2 vs. *4 points back towards swapping indefinitely.


If one envelope contains 4 times the money of the other envelope, then by swapping you either gain 3x or lose 3x, where x is the amount of money in the smaller envelope. Even in this case, the average return is +3x -3x = 0.


It is, it's the one where you treat both as having the same expected value


That the expected value is the same is an outcome of all solutions, so just stating that is reductive.

What is in question is the reasoning by which we get to the expected value being the same.


https://www.youtube.com/watch?v=_NGPncypY68

this video explains the paradox properly, with a very reasonable explanation.


I don’t think this explanation is very good. We could use a distribution where the expectation is defined and be back to square one. For example you can just tweak his distribution so that the ratio of envelopes over the ratio of successive probabilities is < 1 (he has 10:2 which is >1).


Yeah the Wikipedia explanation is terrible! Someone rewrite that.


that's an excellent treatment. I'm curious: this video picks a particular example of a distribution of the amounts and shows that the expected profit from switching is not defined. Can one somehow prove that for all possible distributions, either the expected profit from switching is 0 or it is undefined?


the point of the video is that, given such a distribution, the total expected profit cannot be defined, because the infinite sum over the individual cases has no fixed order, and its positive and negative parts each add up to infinity.

it doesn't really matter what the distribution of the amounts is, as long as there's an infinite number of possibilities in the distribution (i.e., it's not a finite set of possible envelopes).


That’s not true for all distributions.

For example if instead of halving the probability of each value as he does in the video we take 1/20 then we get the expected value as the sum of 9/20^2 - 9/20^2 + 90/20^3 - 9/20^3 + … Here the series is absolutely convergent since the positive terms sum to 9/10 and the negative terms sum to 9/10.


> halving the probability of each value as he does in the video we take 1/20

but then the probabilities don't all add up to 1.


> as long as there's an infinite number of possibilities in the distribution

how would one prove that claim? it's not obvious to me. it seems like the specific conditions under which the series of terms in the total expectation has a well defined sum is important, but I didn't fully grasp those conditions


Some excellent critiques have been provided in this thread already but I would like to offer a different perspective.

In programming terms, the switching argument (see original link) is incorrect because it does not typecheck. And it does not typecheck because variable A is used out of its scope when writing down the (5/4)*A expected value.

Indeed, variable A is tied to a specific random outcome and so it simply does not make sense to refer to it in an expected value ranging over this same outcome.

Trying to formalize the argument in a proof assistant makes the mistake clear and obvious. However, the ambiguity of the English language makes it possible for such typing errors to sneak in undetected.


> it does not typecheck ... because variable A is used out of its scope when writing down the (5/4)*A expected value

But that's simply not true.

> variable A is tied to a specific random outcome and so it simply does not make sense to refer to it in an expected value ranging over this same outcome

No, it isn't. It is the value of the chosen envelope -- which is known. You can open the envelope and look inside.

> Trying to formalize the argument in a proof assistant makes the mistake clear and obvious.

Have you actually tried this?

Consider the following alternative scenario: I give you $20 and offer you the following wager: we will flip a fair coin. If it lands heads you lose $10, tails you win (an additional) $20. Would you take that wager? If so, how is that different from the envelope problem? (Note that the $20 grant is a no-op, all it does is serve to fix the amount of the payouts after the coin flip.)


There is a related puzzle which I've also heard called the Two Envelopes Problem. The premise is that you are handed two envelopes. Inside each envelope is a piece of paper with a number written on it. I have written down these numbers by sampling from some probability distribution, but you don't know what it is. All you know is that the two numbers are different. You get to pick one envelope and then look at the paper inside. You then need to guess whether or not the number in the other envelope is larger or smaller than the one you are looking at. Can you guess with better than 50% chance?


Is there actually some way to guess with better than 50% chance?

I'm pretty sure this could be reduced to the secretary problem where there's a pool of 2 candidates, which yields an optimal hire with probability 50%.

Maybe you play word games and say, "I guess the other number is not higher," or "I guess the other number is not lower." Since the number you observed was produced once, there is some non-zero probability that it was produced twice, in which case the other number is neither higher nor lower.


The reduction is leaky in that it requires you to assume that there is nothing to learn from looking at the first number.

Here's a hint that is not a full solution. If you knew that the numbers in both envelopes were independently sampled from the same Gaussian (but not necessarily which Gaussian), then there is a simple strategy that wins more than 50% of the time: pick an envelope to peek at uniformly at random and guess that it is the larger one if and only if it shows a positive number.

Why does this work? If both numbers are positive, your strategy is equivalent to "pick a random envelope" and you're left with a uniformly random choice of envelope. If both numbers are negative, you're also left with a uniformly random choice of envelope. But if one is positive and the other is negative, you win 100% of the time.

What are the odds of the third case? Well every Gaussian has at least some mass on positive numbers and some mass on negative numbers. So there is some nonzero chance, even if you don't know what it is. So while the strategy might not always do better than a coin toss, some positive percent of the time you win every time. And so you get a distinct advantage over random guessing.

The problem is now how would one transfer such an approach to the original problem without the Gaussian assumption. Somehow the two problems are less different than they initially seem.
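
A quick Monte Carlo sketch of the strategy above, under the stated assumption that both numbers are iid Gaussian (the standard normal is my arbitrary choice; any Gaussian gives an edge, just a smaller one when little mass straddles zero):

    import random

    # Strategy: peek at one of two iid Gaussian draws and claim it is the
    # larger one iff it is positive.
    wins, trials = 0, 10**6
    for _ in range(trials):
        x, y = random.gauss(0, 1), random.gauss(0, 1)
        mine, other = (x, y) if random.random() < 0.5 else (y, x)
        if (mine > 0) == (mine > other):
            wins += 1
    print(wins / trials)  # ~0.75 for standard normals; > 0.5 for any Gaussian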


I didn't even think of the possibility of negative numbers. I can't ever imagine playing this x, 2x envelope game if there was any chance at all that the numbers would be negative. Like you open an envelope and it says I owe them either one million or two million dollars. Yeah, screw that game.


But couldn't you make the same argument for guessing that it's higher iff the number is greater than 42? Therefore, if the number is 9, both strategies tell you to do different things, yet in each case, your probability of winning is >½. There must be a hole in the logic somewhere…


> But couldn't you make the same argument for guessing that it's higher iff the number is greater than 42?

Yep.

> Therefore, if the number is 9, both strategies tell you to do different things, yet in each case, your probability of winning is >½. There must be a hole in the logic somewhere…

For any nonnegative value C, the probability of Gaussian draw x being larger than (iid) Gaussian draw y is strictly bigger than 1/2 when conditioned solely on x > C.

The particular choice of C might change which specific draws of x and y the strategy succeeds with, but as you noticed for every C a thresholding strategy of the above form does give you some pointwise nonzero advantage over random guessing.


Yes, there is a way to guess better than 50%. But it's a bit weird:

You pick an arbitrary threshold, if your envelope is below the threshold, you switch.

(You can add randomness as necessary.)


A very general solution, which works regardless of how the numbers in the envelope are picked:

Choose a monotonically increasing function f with values between 0 and 1. Choose one of the envelopes at random. Then look at the number x in the envelope. Then, with probability f(x) say that you have the larger envelope.
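
A sketch of why (and that) this works, with the logistic function and the numbers 1 and 2 as arbitrary stand-ins: with probability 1/2 you open the larger number m and are right with probability f(m); with probability 1/2 you open the smaller n and are right with probability 1 - f(n). The total is 1/2 + (f(m) - f(n))/2, which exceeds 1/2 whenever f is strictly increasing.

    import math
    import random

    def f(x):
        # Any strictly increasing function into (0, 1) works; the logistic
        # is just a convenient choice.
        return 1.0 / (1.0 + math.exp(-x))

    n, m = 1.0, 2.0  # the two hidden (different) numbers; adversary's choice
    wins, trials = 0, 10**6
    for _ in range(trials):
        x = n if random.random() < 0.5 else m  # open an envelope at random
        say_larger = random.random() < f(x)
        if say_larger == (x == m):
            wins += 1
    print(wins / trials)  # 1/2 + (f(m) - f(n)) / 2, ~0.575 for these numbers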


Yes. My suggestion is a special case of yours.

Both are equally baffling in how / why they work.


I think that's an ill-posed question without knowing (or making assumptions about, I guess) how the distribution was chosen, isn't it?


I'm inclined to agree with you here. The solutions suggested here seem to depend on heuristics that depend on the type of distribution.

A tangential question: real numbers are uncountably infinite. Are probability distributions over the real numbers likewise uncountably infinite, or do they form a higher infinity?


First, a point of terminology: uncountably infinite refers to any infinite cardinality other than countably infinite, so if the number of probability distributions is at least the cardinality of the real numbers, then it's already uncountable.

As for what uncountable cardinality they form, it's the same as the real numbers [0]. Roughly, a probability distribution is determined by the countable collection of real numbers P(X <= q) with q rational. That means the cardinality is no more than R^Q, which has the same cardinality as R. (The cardinality is at least that of R due to, say, uniform distributions on [0, x] for x real.)

[0] https://math.stackexchange.com/questions/3698864/why-does-th...


If we take any finite interval [0,a], there's a unique uninformed prior, the uniform distribution, and if b is the amount we observe, we get a (a-b)/a probability of gaining by switching under that distribution. If we drag a to infinity, we get with probability 1 that switching is better. Numbers are big, yo.


So, my ONLY guess as to how there can be a strategy is that, if the number you see is positive, you keep it, and if the number you see is negative, you switch. My reasoning is this:

1. If the number you pick is positive, then there are technically (though the edge is arbitrarily small) more negative numbers that could have been chosen for the other envelope. So you have a slightly larger than 50% chance of winning.

2. The inverse is true if you chose negative, however.

That said - I think this would generally imply you should always switch, which is why I'm not convinced I'm right. Consider a purely positive set, for example. If my number shows up as "12" - then I know there are infinitely more numbers larger than it, but only 11 numbers lower than it. But this starts to feel like the original two envelopes paradox.

--

tldr; I'm not sure of the answer, although I think if I saw the answer I would have an "Aha!" moment.


I have to admit I remain unconvinced. Has anyone ever run a real world experiment to verify the math?

For whatever reason I can't get past my intuitive feeling that this problem is just a simple 50/50 and these proofs are just fancy window dressing.


You remain unconvinced that switching is pointless? Unlike most paradoxes, this is one where the intuitive answer is the correct one. It's a paradox because of the flawed, but convincing, reasoning that is given in favor of switching.


Is it?

"No proposed solution is widely accepted as definitive."

I mean for me this is totally a "cannot see the forest because of all the trees" situation and the intuitive solution is the right one.

But I thought the same of the monty hall problem (back in school) and was only convinced after I wrote a small program to simulate it, which confirmed it.

But unlike in the monty hall problem, there is no new information here, when you are given the opportunity to switch. So the discussion seems weird to me.


That means that no explanation of the paradox is accepted as definitive, not that there is any argument about whether to switch.

Switching back and forth obviously doesn't have any positive effect. What they mean by “no solution is definitive” is that there's no agreement on which explanation is the definitive one.


It's much like the Monty Hall problem, in that people often forget to include the portion about the revealed door ALWAYS being a loser. If it were a randomly opened door then the option to switch would actually just be 50/50. This can be shown by just running the simulation over thousands of iterations. The same is true here. It's not actually going to yield better outcomes to switch. Sometimes logicians just need to own up to the fact that empiricism and applied mathematics is where the rubber meets the road.


The difference between this and the Monty Hall problem is there's no new information revealed or pool reduction here.


I’m not sure how literally applying Bayes’ theorem and reducing the sample space accordingly is owning logicians but you do you.


I wrote a pretty detailed analysis about this problem here in case you're interested:

https://mindbowling.wordpress.com/2020/09/14/two-envelope-pa...

It all depends on how the envelopes are prepared.


I must admit I don't find the "compelling line of reasoning" all that compelling; to me it seems the "paradox" has more to do with the meaning of probability than with its calculation.

That is, the probability is 1/2. That's all that matters to the decision making. Calculating an "expected value" at all is completely useless, whether or not you do it "correctly".

Am I missing something?


Calculating an expected value is meaningful and regular and does affect decisions. If you were told that you could pay £1 to flip a coin and heads would pay out 50p and tails would pay out £2, then playing it you would expect to make £1.25 (£0.25 profit) per play, on average. Many people interpret the envelope problem as being the same situation.


> Many people interpret the envelope problem as being the same situation.

It just seems obvious to me that it's not the same situation. That is, I think the "paradox" is ultimately about the temptation to treat these as similar problems in the first place.


Right, the difference here is that the calculation that determines the expected payout is the same as the calculation that determines how much it costs to switch. With the coin, the cost to play is set and is lower than the expected outcome.


I was thinking more along the lines of there being no need to actually calculate anything at all because the only relevant information is the 50-50 probability of choosing the higher envelope. And since you gain no new information after choosing, there's no need to calculate anything afterward either. The "paradox" is in making one think there's something to be calculated beyond the 50-50 chance.


Let's play a game where I wager $10 on each coin toss and you're the casino. Heads, I pay you $5. Tails, you pay me my $10. Just like the envelope game, right? "The only relevant information is the 50-50 probability", right? So if I pay you $1 per toss, you'd be happy to play this game and take my money, right?


> Just like the envelope game, right?

No, because you're not actually wagering anything in the envelope game, so it makes no sense to calculate probabilities as though you are.

An analogous coin toss example would be something like: I'll flip a coin, you call heads or tails. If you call it right, I'll give you $10. If you're wrong, I'll give you $5. (So either way you make $5, so I don't know why anyone would even offer this game!) After I flip the coin, I'll conceal it in my hand and give you a chance to change your choice of heads or tails. Well, of course there's no reason to change your choice, it's completely arbitrary! You've got a 50-50 chance of winning $10 regardless. The paradox makes you think you're wagering your unknown potential winnings, but you're not actually wagering anything.

It has nothing to do with calculating whether or not a specific wager is worth it given some set of probabilities. That's a different problem for a different sort of game.


> No, because you're not actually wagering anything in the envelope game, so it makes no sense to calculate probabilities as though you are.

If we consider the decision point where you can either keep the first envelope or switch it, you are effectively wagering the value of the first envelope. You're just muddying the waters when you're trying to make some kind of point about receiving the first envelope for free. Yes yes, you are only gambling your "winnings" that you won earlier, it doesn't make a difference here.

> An analogous coin toss example would be something like: [...] I don't know why anyone would even offer this game!

Look, you made the claim that expected value doesn't matter for anything at all, the only thing that matters is the probability of winning. I offered you a game where you have a 50% probability of winning on each coin toss and I offered to pay you $1 for each coin toss to incentivize you to play it. Can you please explain why you are refusing to play this game? Yes, you don't like the analogy, I get that, but you also don't like free money? I find that hard to believe. A more plausible explanation is that you don't actually believe the claim you are making. That's why you aren't willing to wager any money on it.


> Yes yes, you are only gambling your "winnings" that you won earlier, it doesn't make a difference here.

It makes all the difference because you don't know what those "winnings" are, and what you're potentially wagering them for depends on what they are. So ultimately you're not really wagering anything. (Again, I think this is the crux of the "paradox".)

> Look, you made the claim that expected value doesn't matter for anything at all

I never meant to claim it doesn't matter for "anything at all", just not for this envelope game. (Apologies if the meaning of "at all" was ambiguous in my original post.)


> I never meant to claim [expected value] doesn't matter for "anything at all", just not for this envelope game.

When you say "this envelope game", I assume you are also including minor variations of this game?

If you mean specifically this exact envelope game, where the expected value for all actions is zero, then it is strictly true that it doesn't matter how we make decisions - using expected value or not - though it's not a particularly interesting point to make.

I will proceed assuming you mean also including minor variations of this game.

You wrote this earlier:

> That is, the probability is 1/2. That's all that matters to the decision making. Calculating an "expected value" at all is completely useless, whether or not you do it "correctly".

We can make a minor variation to the game such that the probability will still remain at 1/2, but the expected value for switching can be changed. We can make the minor variation such that it will be profitable to switch, or such that it will be unprofitable to switch. We can do this while the probability remains at 1/2. According to you the probability is all that matters, and it's useless to calculate the expected value for a game like this? This would lead to making unprofitable decisions in a game with a small variation as described above.


> When you say "this envelope game", I assume you are also including minor variations of this game?

No; it seems obvious to me that the argument for switching presented in the original wikipedia article is meant to apply only to the envelope game as it's presented. Of course introducing variations could easily change the meaningfulness of the article's premise. That is, it wouldn't be considered a "problem" or a "paradox" if calculating an "expected value" were actually meaningful.

> then it is strictly true that it doesn't matter how we make decisions - using expected value or not - though it's not a particularly interesting point to make

True, but my point was more of a question: if it's obvious that calculating an "expected value" is irrelevant in this specific case (as the article says, "It may seem obvious that there is no point in switching envelopes as the situation is symmetric"), why is the argument presented in the article considered compelling? That is, either it's not actually compelling, or the "expected value" being meaningless in this case is not necessarily so obvious at first... but if so, why not? (Or, to put it another way, why is the argument in the article compelling enough to warrant such a long wikipedia page with such numerous proposed "resolutions"?)


Your question would be very easy to answer if it concerned a whole basket of "games like this". Now that you narrowed it down to this specific game, it becomes difficult to answer, because we can just throw our hands up in the air and say "it doesn't matter what we do". If we allowed slight variations in the game format, it suddenly _would_ matter what we do, and suddenly we would need to do expected value calculations to do well in these games.

Your question is akin to a driver plowing through red lights without an accident and concluding that the traffic light was useless in this specific instance. "I didn't cause a crash even though I ran a red light, so the traffic light was meaningless! In this specific instance I mean! It didn't matter if I stopped to the red light or ran across it, no accident would have happened either way in these very specific circumstances, so why are people even talking about traffic lights?"


> That is, the probability is 1/2. That's all that matters to the decision making. Calculating an "expected value" at all is completely useless, whether or not you do it "correctly".

I'd be happy to take you up on that offer. Let's play some dice or card games that are set up such that you'll always have 9/10 probability of winning. You'll play, regardless of the expected value, right?


This reminds me of a fascinating problem (not sure if there's a name for it):

You play the following game (say for many rounds). Someone puts different values in two envelopes (they can pick the values arbitrarily, even maliciously, each round). You want to pick the higher valued envelope. You pick an envelope, they show you the value of the other. You can hold your value or you can switch.

It also works if the game is they show you your own value and you can choose to hold or switch. Actually, now that I think about it, you can even make the game such that you pick an envelope and tell them which envelope to show you, then you can hold or switch. More interesting :)

The question is: is there a strategy where you get the higher value strictly more than 50% of the time?

The surprising answer is, yes, there is such a strategy. It's really neat, and I've tried to find other places to apply the reasoning, but have not found another interesting place.

And yes, this problem and strategy are mathematically correct, not cheating, not "out of the box" stuff.

Good luck :)


Generate a random number T from the arctan distribution. If the number in the other envelope is greater than T, switch.

For envelopes with two different numbers n, m, the above is guaranteed to give you the higher one strictly more than 50% of the time.


Yep - any distribution that is nonzero over the set of choices works, so over the reals, anything that is everywhere nonzero works.

For example, exp(-x^2) with a normalization factor is a simple choice.


Is there a complete explanation for this somewhere?

Asking for a friend, I obviously fully comprehend what you're talking about here.


Yeah, the solution is somewhat tricky. Here is the idea (this is somewhat sloppy, but gives the idea - if you're mathematically advanced all this can be made precise):

Suppose you have a way to pick a random number yourself, that has nonzero chance to pick any number. Suppose the person putting things in envelopes picks X < Y as their numbers. You too pick a random number, say T. Open an envelope. If that envelope is less than T, then switch. You will get the larger number more than half the time.

Here is why it works. You pick X or Y envelope with 50/50 odds. 3 cases:

1) Suppose you picked a T < X. If you opened X, then T < X, you don't switch, so you lose. If you opened Y, T < Y, you don't switch, you win. So if T < X you win half the time.

2) Suppose you picked a T with Y < T. Then no matter which envelope you open, its value is less than T, so you always switch, and again you win half the time. Nothing gained so far.

But, if you happened to pick X < T < Y, which happens with some nonzero chance, then analyze: if you open X, X < T, switch, you win. If you open Y, T < Y, you don't switch, you win. This case gives you more than 50/50 odds.

The technical detail is that it's mildly tricky to pick a real number out of an infinite set at random, with any number being a possible outcome. But the probability distributions above do the trick - basically any function you can draw that is everywhere positive, with the tails squeezing to zero fast enough (but never reaching it), will work.
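
And here's a minimal simulation of the above, if you want to watch case 3 at work (the values X, Y and the normal distribution for T are my choices for illustration):

    import random

    X, Y = -0.5, 0.5  # adversary's numbers, X < Y; chosen near 0 so the
                      # edge is visible (it shrinks, but never vanishes,
                      # for extreme values)
    wins, trials = 0, 10**6
    for _ in range(trials):
        T = random.gauss(0, 1)  # everywhere-positive density over the reals
        opened = X if random.random() < 0.5 else Y
        if opened < T:
            final = Y if opened == X else X  # switch
        else:
            final = opened                   # keep
        wins += final == Y
    print(wins / trials)  # 1/2 + P(X < T < Y)/2, ~0.69 for these values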


To expand on this, you can use any distribution that has nonzero density over every interval.


FWIW, I think all 3 versions of your game are completely equivalent. In all 3 versions you first get to decide which envelope you look inside, and then you get to decide which envelope you open.


Yes, I agree they are equivalent, which is why I was able to see the same solution allows all three games. The last one gives the puzzle solver more things to fiddle with, even though it does not matter, perhaps making the puzzle more difficult or interesting.


The mistake is in step 7.

You cannot calculate the expected value (of the envelope you get, when you pick one randomly and then swap) like that, as A is not a constant and actually depends on whether you took the less or the more valuable envelope initially.

If you use A1=X in case you took the less valuable envelope initially and A2=2X in the other case, the expected value is (1/2)*(2*A1) + (1/2)*(A2/2) = 1.5X

and if you don't swap, you get: (1/2)*A1 + (1/2)*A2 = 1.5X

... which makes sense.

If you change the experiment so that the value of the second envelope does not depend on the first one (if you first choose, and only then the value of the other envelope is randomly set to either double or half), A is constant and you actually should swap.
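
A quick simulation of both experiments (the base amount and trial count are arbitrary):

    import random

    trials = 10**6

    # Experiment 1: both envelopes are fixed up front as (X, 2X).
    keep = swap = 0.0
    for _ in range(trials):
        x = 100.0  # base amount; any value or distribution works
        a, b = (x, 2 * x) if random.random() < 0.5 else (2 * x, x)
        keep += a
        swap += b
    print(keep / trials, swap / trials)  # both ~150, i.e. 1.5X

    # Experiment 2: the other envelope is only filled (double or half,
    # by coin flip) AFTER you have committed to your envelope.
    keep = swap = 0.0
    for _ in range(trials):
        a = 100.0
        b = 2 * a if random.random() < 0.5 else a / 2
        keep += a
        swap += b
    print(keep / trials, swap / trials)  # ~100 vs ~125: now you should swap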


My solution is this:

In point 2, the switching argument starts talking about probabilities without defining the sample space.

And this is why the argument is nonsensical. "One envelope contains twice as much as the other" does not define a sample space, so it is too early to start talking about probabilities.

If you say "the sample space is all pairs of natural numbers where one is twice as big as the other", then what are the probabilities assigned to each pair? They cannot be pairwise equal because the space is infinite.

So even if probabilities for (X, 2*X) and (X, X/2) happen to be equal for a particular X, they definitely cannot be equal for every X, so the "infinite swap" doesn't hold anymore.
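
To make that concrete, pick one proper prior - say pairs (2^k, 2^(k+1)) with geometrically decaying weights, my choice for illustration - and compute the probability that the other envelope is bigger given the observed amount:

    from fractions import Fraction

    # Assumed prior: pair k holds (2^k, 2^(k+1)) with probability
    # (1 - r) * r^k, with r < 1 so the probabilities sum to 1.
    r = Fraction(1, 2)

    def prior(k):
        return (1 - r) * r**k

    # Observe amount 2^m with m >= 1: it is either the smaller member of
    # pair m or the larger member of pair m-1 (the 1/2 envelope-pick
    # probabilities cancel out of the ratio).
    for m in range(1, 6):
        p_other_bigger = prior(m) / (prior(m) + prior(m - 1))
        print(m, p_other_bigger)  # r/(1+r) = 1/3 for every m, never 1/2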


The paradox is resolved by realising you are doubling a smaller amount than you are halving.


This reminds me of the Monty Hall Problem:

https://en.wikipedia.org/wiki/Monty_Hall_problem

(except that problem has a more definite solution)


The difference is of course that you do get information in the Monty Hall setup, and here you get none, if I read that correctly.


In fact, the fake/paradoxical solution probably plays on the reader's familiarity with the Monty Hall problem. ("Ah yeah I remember this. Switching is better!")


Exactly what I thought too. I find human behavior in such situations fascinating. Have penned some thoughts on it - have a read if you like

https://ycmusings.com/why-wont-we-switch/


Clearly the solution here is three envelopes.

http://wikibon.org/wiki/v/Prepare_three_envelopes


And this is why we try to hammer into students' heads that you can only manipulate series after you've established convergence / that the quantity actually exists in general.


The game looks like this in tree form:

             -> Keep    -> 2A
          2A -> Switch  -> A
    0.5 ->
          A  -> Keep    -> A
             -> Switch  -> 2A

Switch and keep both have an expectation of 3A/2 because (1/2)2A + (1/2)A. In the equation in the article they use (1/2)2A + (1/2)*A/2. They count A/2 as half of what the actual outcome should be. The mistake was abusing the A to have different meanings.

They created two trees, one with 2A.

             -> Keep    -> 2A
          2A -> Switch  -> A
    0.5 ->
          A  -> Keep    -> A
             -> Switch  -> 2A
And another with 1/2A

                -> Keep    -> A
          A     -> Switch  -> 1/2A
    0.5 ->
          1/2A  -> Keep    -> 1/2A
                -> Switch  -> A

Then they treated the chance node as selecting from these two different trees at the same time with 1/2 probability. Which is nonsensical. You can't be in an entirely different game with 1/2 probability - it isn't the same game, so for analysis purposes it has 0 probability of being reached.


So why is this happening?

Some people are pointing at the error I mentioned above, but I think the problem is actually much deeper than that. Identifying this error is basically identifying the superficial reason for why they went wrong. It doesn't dig deep enough to systematically eliminate the error.

I want to point out something. Notice their logic when they call out why it is a paradox. You can switch multiple times and it would never terminate.

Mathematical formalisms like Markov decision processes and the Bellman equations actually have the ability to capture this type of reasoning. They can even deal with infinite loops, as happens here, by including multiplication by a horizon term which makes the limit of the recursion equal to zero.

So the root problem is that, actually, the entire problem wasn't even using the right theoretical grounding.

To get really explicit on how it was not using the right grounding I want to point out the key error again, but stated a bit differently.

The author neglects information sets. He creates two game trees, discriminating on the basis of knowledge of which subgame he is in. This is a huge red flag. You can do that in perfect information games, but you can't in imperfect information games! You aren't playing over states in imperfect information games. You're playing over information sets.

You end up getting more insights into how this is wrong when you think about the problem in this way. For example, there might not be an error in this particular problem, but the choice to not express the action space as a probability vector is going to bite you in more complicated imperfect information decision problems.


On reflection, it bites you on this particular problem too, because when you set the policy to P(Switch) = 1, the recursive switch case with a discount factor under the formalism mentioned above has value zero.

So for every strategy selection except the one the article chooses, the EV is what I mentioned before. But for the last it is zero.

It's the same problem: they aren't walking the graph of the game tree. But it's a bit different from my simplified version of the game that shows where their EV calculations went super wrong, because the actual game tree is of infinite length and there is /never/ a node with SWITCH that terminates in a reward - every SWITCH recurses to another decision point.

You can solve it using various formalisms by adding a discount factor. That lets the limit go to zero for the recursive cases. I guess technically it's undefined since the game never ends, but I don't consider being stuck in a loop to be rational [1]. You really have to let your action probabilities vary as you solve imperfect information problems. It happens in MDPs and MCCFR for a reason - you don't get the correct action without accounting for the way your policy choice affects the outcome distribution. Your expectation is dependent on your policy; it doesn't exist in a void.

That last statement, well, it should be very obvious in more complex situations. That it isn't obvious in this situation speaks to the problem's deceptive simplicity.

[1]: Unlike people who think humans are irrational for having cognitive biases.
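
Here's a minimal sketch of that discounted recursion (gamma, the starting value, and the iteration count are arbitrary):

    # Under the "always switch" policy no reward is ever collected, so the
    # value recursion is V = 0 + gamma * V, whose unique fixed point for
    # gamma < 1 is V = 0.
    gamma = 0.99
    v = 1_000_000.0  # start the iteration anywhere
    for _ in range(10_000):
        v = gamma * v  # recurse forever, never cashing an envelope
    print(v)  # ~0: endless switching is worth nothing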


change your mind as many times as you like, there will only ever be one actual choice


It's as simple as that. If you know you will pick A and then switch to B, it's the same as if you had just decided to pick B in the first place.


The bug is obvious when they say "one envelope contains 2*A and the other A/2".

Well no, one doesn't contain one fourth of the other.


That's not what it's saying. It's saying one envelope contains A and the other contains either 2*A or A/2.


Yes, but doesn't the formula treat it as such?


yes, but the formula at bullet point 7 says it. That's the point. The formula is wrong and doesn't correspond to reality


Basically the sleight of hand trick is using a random variable A as though it was a constant.

Yes, B = A/2 or B = 2A with probability .5 each, but not for the same value of A.


Why not?


A is the starting value, in the one case it's half as much as in the other case.


If you treat this as a Bayesian problem, your context and experiences could lead you to specify a prior which has a finite mean, in which case for some values you will have a posterior probability of > 2/3 of it being the larger value.

E.g. if the maximum volume of currency that could fit in an envelope is $10k, it would be possible to have a uniform prior between 0-10k, at which point I would stick if my envelope had $5.01k.

A more extreme prior could be one capped at the largest ever academic research grant (I bet it's not more than $100m).

It could also be that you have a prior that has infinite mean, in which case it is not surprising that you will always want to switch if you draw a specific value.

If you have risk aversion, this further increases the range over which you would like to stick, since switching could decrease your expected utility even if it increases your expected return. There is a separate paradox in decision theory that says we should be approximately risk-neutral for 'small' amounts.
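
A Monte Carlo sketch of the capped-uniform case above (the $10k cap and the (x, 2x) pair construction are the assumptions already stated; $5k is where the posterior flips):

    import random

    # Smaller amount x is uniform on (0, 5000); the pair is (x, 2x).
    counts = {True: [0, 0], False: [0, 0]}  # keyed by "observed > 5000"
    for _ in range(10**6):
        x = random.uniform(0, 5000)
        a, b = (x, 2 * x) if random.random() < 0.5 else (2 * x, x)
        bucket = counts[a > 5000]
        bucket[0] += a > b  # sticking wins when you hold the larger one
        bucket[1] += 1
    for key in (False, True):
        wins, n = counts[key]
        print("> 5000" if key else "<= 5000", wins / n)
    # <= 5000: you hold the larger only ~1/3 of the time, so switch
    # >  5000: you hold the larger 100% of the time, so stick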


> The puzzle is to find the flaw in the very compelling line of reasoning above. This includes determining exactly why and under what conditions that step is not correct, in order to be sure not to make this mistake in a more complicated situation where the misstep may not be so obvious. In short, the problem is to solve the paradox. Thus, in particular, the puzzle is not solved by the very simple task of finding another way to calculate the probabilities that does not lead to a contradiction.

There are two flaws. One is the omission of counterfactual reasoning. This produces the logical contradiction 2=1 at step 7, but you get there a bit earlier because the steps before it are where you do case based reasoning. The scenario where this happens is when you are in imperfect information contexts. You can reason about subgames in perfect information games without the subgames influencing each other. You can't reason about them in imperfect information games without the subgames influencing each other.

The lesson is this - you have to consider counterfactuals. To give a concrete example of someone doing this, think of Jeff Bezos when he talks about how he decided to found Amazon. He says he considered the situation in which he founded it and failed and the situation where he founded it and succeeded and the situation where he didn't found it. He didn't get his expectation conditioned on one state, but over an information set that contained many counterfactual outcomes.

The second flaw is much deeper and is what leads to the paradox. The failure is that the entire framing fails to recognize that EV is a function of policy. You can see this by setting the policy to always switch, as they do, and noting that the actual EV for switch is now undefined. Therefore in claiming knowledge of EV without having policy as a higher order function of EV you get a contradiction. 3/2A=0 and/or 3/2A=undefined.

The lesson in this is that policy function influences EV. This should hopefully be obvious to most people in more complicated situations. For example, slamming your head into your table has a lower EV than enjoying a drink of water. You have to account for this whenever your policy choice isn't defined.


Take 10 minutes to model it and you'll find there is no benefit in changing. As a couple others have pointed out, it isn't the Monty Hall problem.

Below I did two runs with a million cases. One envelope has 1 unit; the other has 2 units. I shuffle the envelopes, and the "player" chooses an envelope. "noChange" means this is the payoff if the player doesn't switch envelopes. "yaChange" means this is the payoff if the player switches envelopes.

    $ python3 ./twoEnvelopes_00.py 1000000
    noChange = 1500023
    yaChange = 1499977

    $ python3 ./twoEnvelopes_00.py 1000000
    noChange = 1499984
    yaChange = 1500016

When modeling Monty Hall, the payoff for changing is obvious.


Related:

The Two Envelopes Problem: the most boring paradox in the world - https://news.ycombinator.com/item?id=6388645 - Sept 2013 (12 comments)

Two envelopes problem - https://news.ycombinator.com/item?id=6387044 - Sept 2013 (88 comments)

Two Envelopes Problem - https://news.ycombinator.com/item?id=1582219 - Aug 2010 (88 comments)


Also see the Surprise Execution Paradox. It's a favourite of mine.

[edit] Now submitted: https://news.ycombinator.com/item?id=31568880


Back when I made videos for YouTube, I did a little video on this paradox. I used a pair of cash boxes instead of envelopes because I thought opening boxes looked better on screen than opening an envelope. I know I didn't explain the issue very well because a lot of commenters thought I was discussing the TV game show "Deal Or No Deal".

https://www.youtube.com/watch?v=0yjVv_iB1h0 (Interestingly, a lot of those DoND comments have gone. Did they delete them?)


My take :

let M be the maximum amount of money that an envelope can contain (at worst it can be the total amount of money in the world). You don't know the value of this upper bound M.

Most of the time, you get 2*x or x/2 with 50% chance if you choose the other envelope (which on average is a win)

But when x=M you get M/2 instead of M, and therefore you lose M/2 with 100% chance. And when it happens you have lost much more than in the other cases (because of the exponential growth...).

If you do the math, you can see that what you win for 1 2 4 8 16 ... M/2 is equal to what you lose for M.

And so on average what you win from switching is 0.
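
Here is that bookkeeping done explicitly for one concrete finite version (powers of two with a uniform prior over pairs is my choice for illustration; M = 2^K):

    from fractions import Fraction

    K = 10               # pairs (2^k, 2^(k+1)) for k = 0..K-1; top amount M = 2^K
    p = Fraction(1, K)   # uniform prior over pairs
    gain_by_amount = {}  # expected gain of "always switch", per observed amount
    for k in range(K):
        lo, hi = 2**k, 2**(k + 1)
        gain_by_amount[lo] = gain_by_amount.get(lo, 0) + p / 2 * (hi - lo)
        gain_by_amount[hi] = gain_by_amount.get(hi, 0) - p / 2 * (hi - lo)
    for amount in sorted(gain_by_amount):
        print(amount, gain_by_amount[amount])  # positive everywhere except at M
    print(sum(gain_by_amount.values()))        # 0: the single loss at M cancels it all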


One mental model I apply to this and other similar problems is tracking what 'universe' you're in. In the flawed solution, the problem is that there's a universe in which the amounts are A and 2A and a universe in which the amounts are A and A/2, but they aren't the same universe, and the two possibilities will have different values for A and so you can't easily average A/2 from one universe with 2A from the other.

(The usual term is sample space; it's just more fun to split and eliminate universes than states.)


Reminds me of the "splitting the dinner bill" problem we had fun with in middle school. It goes like this.

Three friends go out for dinner and decide they will split the bill threeways. The bill comes to $25. Each friend gives a $10 note with $30 pooled together to pay the bill. The waitress returns with five $1 bills in change. They leave a $3 tip for the waitress with $2 in change remaining.

The question is: Of the original $30 paid if we deduct $3 in tips paid, we get $27. Adding the $2 in change remaining we get $29. Where is the other $1?


I wrote up a pretty detailed analysis of this problem a while back here: https://mindbowling.wordpress.com/2020/09/14/two-envelope-pa...

At the end I constructed a super weird example where regardless of what you see in your envelope, you should switch (this seems to go against the symmetry argument, but is sort of saved by a divergence in the expectation value).


Why is no one looking at the chance-giver, and considering his motive? One would reasonably assume he'd like to minimize the payout -- so he'd offer you the chance to switch only if you had chosen the richer envelope. The description explicitly allows this possibility:

"Having chosen... you are given the chance to switch"

Nowhere does it say, you'll be offered the switch no matter which envelope you choose.

Same idea applies, to the very similar Monty Hall problem.


Just keep swapping the envelopes. Infinite money hack!


And what is the solution? I would just take the money at point A, i.e. at the very beginning; you never know when the experiment supervisor decides he has had enough of potentially giving away even more free money (in case I'm smarter than him/her). Plus, free money at moment t is better than potentially double the free money at moment t+t1.


At some earlier point, memorise enough randomly generated entropy so that when you enter this situation you can combine it with a Schelling point (the left-hand envelope, for example) to pick one at random.

Then whatever trick is being played you still have a 50-50 chance.


It is very simple.

Let's represent the poorer envelope by the value 0.5 and the more bountiful one by 1.0.

The expected value from the envelope draw is 0.75: 0.5 drawn with a 50% probability contributing 0.25 to the expected value, and 1.0 drawn with a 50% probability contributing another 0.5 to the expected value for a total of 0.75.

If you were to do a large number N of these draws, the value you will obtain will be from a distribution that is centered on 0.75 N. So far so good?

So the way to look at a draw is this: you're going to be getting the expected/average value of 0.75, together with either a 0.25 bonus if you pick the better envelope, or else a -0.25 penalty if you pick the poorer envelope.

So you see, those two situations are equal and opposite. By switching, you either turn a 0.25 bonus into a -0.25 penalty (- 0.5) or vice versa (+ 0.5) with equal probability.

If you need any more complex analysis than this, you need to be thwacked on the head.

The rhetoric about doubling versus halving the money is a distracting red herring. If you switch from the 1.0 envelope to the 0.5 envelope, you're down 0.5, and if you switch the other way, you're up 0.5. That's it.

Accounting is based on debits and credits, not halving and doubling. You would never update a ledger by crediting double some amount to one account, and debiting half the amount from another account. It's all purely additive.

You have to look at what you're potentially gaining or losing on its own not as a fraction of some guaranteed fixed portion that you're getting from either envelope.


>The puzzle is to find the flaw in the very compelling line of reasoning above. This includes determining exactly why and under what conditions that step is not correct, in order to be sure not to make this mistake in a more complicated situation where the misstep may not be so obvious. In short, the problem is to solve the paradox. Thus, in particular, the puzzle is not solved by the very simple task of finding another way to calculate the probabilities that does not lead to a contradiction.


I did solve the paradox: it emanates from falsely looking at the switch as doubling or halving the money.

If the envelopes contain, say, $50 and $100, it looks like a doubling/halving. But nothing materially changes if we add $1000 to each one to make $1050 and $1100. The probabilities don't depend on what is in the envelopes, and the win/loss is just a spread of the difference between their values.

Why would anyone, while solving some problem, worry about some unspecified more complicated situation which supposedly resembles that problem?

Give me the specifics of that more complicated problem and I will knock it out of the park using whatever reasoning fits that problem.


You are missing the point. The problem of whether you should actually switch is obvious and trivial (you should not). The challenge here is explaining why the plausible 'proof' that you should switch is wrong.


I read that in the Wikipedia page. There is some obvious argument involving repeatedly switching sides and whatnot, and it has some subtle logical flaw. Why bother with that at all.

OK, but let's look at it:

1. Denote by A the amount in the player's selected envelope.

2. The probability that A is the smaller amount is 1/2, and that it is the larger amount is also 1/2.

3. The other envelope may contain either 2A or A/2.

OK, here is where this argument got seduced by the multiplicative thinking. The correct reasoning is: let's call the smaller base amount A and the larger one 2A. One envelope contains A, the other one 2A. That's it.

4. If A is the smaller amount, then the other envelope contains 2A.

5. If A is the larger amount, then the other envelope contains A/2.

Here, the argument is splitting into cases because of the IF, and in these two cases, A has a different meaning.

6. Thus the other envelope contains 2A with probability 1/2 and A/2 with probability 1/2.

7. So the expected value of the money in the other envelope is:

This is wonky. The overall two-envelope situation has an expected value, not the individual envelope.

The correct view is that if this is the richer envelope, it contains X more money. If it is the less endowed one, then it contains X less.

    (1/2)(2A) + (1/2)(A/2) = (5/4)A

8. This is greater than A so, on average, the person reasons that they stand to gain by swapping.

That calculation is wrong though, by itself. That issue should be attacked directly.

Separate calculations have to be performed for the two cases: A is the smaller amount and A is the bigger amount. For each of these cases, the expected value from both envelopes has to be calculated.

The way the above formula is doing the expected value calculation is off because it's equivocating on A.

It ends up working with two unrelated quantities: 2A and A/2. 2A is four times bigger than A/2. A is simultaneously denoting the smaller amount and the larger one.

9. After the switch, denote that content by B and reason in exactly the same manner as above.

Yes, and if you do, you will end up with 5/4B. So that can't be right and the whole remaining argument about infinite switching being irrational compared to just opening an envelope is superfluous.


Because the variable A refers to an amount, it has to have a consistent meaning across the two cases, and so it has to be:

1. We have A; the other envelope has 2A; or else

2. We have 2A; and the other envelope has A.

In case 1, if we switch we gain A.

In case 2, if we switch, we lose A.

We have no idea which case we are in.

We cannot use the variable A to denote our initially selected envelope in both cases, because then A simultaneously represents the smaller amount and the larger amount, and we are absurdly mixing the quantities A/2 and 2A in the same calculation. No quantities in the actual problem are related by a factor of four!

People might be seduced into the A equivocation perhaps because they are initially thinking of A as being a label for the envelope, which is arbitrary and innocent enough. We have some envelope A and it has some money in it. Oh, but the other envelope then has either A/2 or 2A. At that point, A is no longer an envelope name, but a name for an amount. Then it has to pin down a specific number: it has to denote either the smaller value, or the larger one, and that value is found in different envelopes in the different cases; A cannot both be "name of initially chosen envelope" and "amount in that envelope".


I'm not sure it's that simple.

let's imagine the first envelope contains between 1 and 1000 dollars, the second double the first.

if you open your envelope and it contains a whole number of dollars between 1 and 1000, then you should swap; I'd take "250 or 750" over 500.


But the problem is explicitly about swapping before opening any envelope.


Only if each value is equally likely. If you see $1,000 but figure the envelope-filler is a lot more likely to have been willing to put $1,500 in than $3,000 then you should stick.


Notably different from the Monty Haul problem where additional information is given to the player of the game prior to making the decision to switch.

See https://en.wikipedia.org/wiki/Monty_Hall_problem


The “Monty Haul” problem is where a dungeon master in D&D awards too much treasure to the party :-)

https://dungeonsdragons.fandom.com/wiki/Monty_Haul

https://tvtropes.org/pmwiki/pmwiki.php/Main/MontyHaul


Yikes! I see what you did there.


A fascinating line that shocked me, tucked away at the end of a paragraph with no further additional context aside from its source:

> Pigeons repeatedly exposed to the problem show that they rapidly learn to always switch, unlike humans.


Personally I don't think you are given any new information in the Monty Hall problem that's worth anything. You always knew there was a 100% chance that one of the doors in that set has a goat behind it, which door that is is pretty much irrelevant.

Or to put it another way how many people would stick with their chosen door if instead of opening a door and showing a goat they were simply asked "would you like to stick with your current choice of door, or switch and take the best result from the other doors?"


> switch and take the best result from the other doors

That is, effectively, what the choice boils down to. But that being said, since the problem is stated as only getting to pick one "other" door, the "extra information" received is knowing which of the other 2 doors has the best result.


I didn’t question the math, but what made me intuitively believe the Monty Hall problem was this:

Me: I am thinking of a number between 1 and a million. If you guess correctly, you get a prize.

(you make your guess)

Me: OK, I will reveal that the number is either $YOUR_GUESS or 452,871. You may keep your original guess or switch to 452,871.

Of course, you would switch, because the chances you guessed correctly originally were 1 in a million.


Wow. I don't think it's ever been explained to me this well before.


Try thinking of it this way: The information is in which door the host does not open.

2/3rds of the time, your initial choice will be a goat. In those cases the host deliberately avoids opening the remaining door with a car - thereby telling you where it is.

If you chose the car door initially, then indeed you get no new information. But that only happens 1/3rd of the time.

I am curious if that changes your opinion.


Read their second paragraph. That is exactly how they think of it, and they do think it makes it obvious you should switch.


Ah! Apologies, I did not read that properly.


When Monty reveals a goat, your odds of winning by switching increase from 1/3 to 1/2 if Monty is picking randomly and just happened not to reveal the car (and if he had revealed the car we start from the top, or you automatically lose, or you automatically win, or...), but to 2/3 if Monty will reliably reveal a goat. I think it's hard to say that you have no extra information in that latter case.

It's true that we can reframe the choice as other options of equal likelihood (pick both doors), but it's precisely because of our increased information that the reduced option of picking just the one remaining door isn't actually worse off.


The GP knows all this, and agrees it's better to switch (look at the second paragraph). They just dislike the common formulation (given in terms of the extra information from the door being opened), and find another formulation more compelling (your original choice is one door, while the switch is actually to the best prize from the other two doors, not simply to one other door).


The Monty Hall problem is easily resolved by imagining there are a million doors instead of just three. The host opens 9,999,998 doors, revealing goats behind each. You'd obviously switch doors to the one the host left closed, and the reasoning you followed will be just as valid for any number of doors including 3.

This problem is completely different, since the number of degrees of freedom remains fixed throughout the decision process. As someone else pointed out, it is trivially resolvable by realizing that no matter how many times you switch envelopes, you're really only making one choice, and doing so with 50-50 odds of picking the more profitable envelope. The offer to switch amounts to meaningless magician's patter, whether the offer is made once or a million times.
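
If you'd rather check that reasoning numerically, a minimal simulation sketch (door counts and trial count are arbitrary) shows the switcher wins with probability (n-1)/n:

    import random

    def switch_win_rate(n_doors, trials=100_000):
        wins = 0
        for _ in range(trials):
            car = random.randrange(n_doors)
            pick = random.randrange(n_doors)
            # The host opens every other door except one, never the car,
            # so switching wins exactly when the first pick was wrong.
            wins += (pick != car)
        return wins / trials

    print(switch_win_rate(3))    # ~0.667
    print(switch_win_rate(100))  # ~0.99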


I always think of the Monty Hall problem with 100 doors, with the game show host revealing 98 'goat' doors, thus leaving two doors: the original guess and one other. You'd switch then, right? :)


Your first paragraph makes it seem like you don't accept the common solution to the Monty Hall problem, which is why people have responded explaining it to you, even though the second paragraph makes it clear that you do, and just dislike the way it is explained.

Personally, I'm not sure that it would be easier to convince someone that the second formulation is equivalent to the original problem, but next time I find a disbeliever I will give it a shot!


When I was initially exposed to Monty Hall, what confused me the most was that I didn't know the host would always open a door with a goat, rather than just picking one of the remaining doors at random. In the latter case (i.e. the case where the host can pick a car), even conditional on being in the universe where he picked a goat, switching and staying both yield a 50% chance of success.
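
A small simulation makes the difference vivid (a sketch; trial count arbitrary). The informed host always reveals a goat; the random host sometimes reveals the car, and those runs are thrown out:

    import random

    trials = 1_000_000
    informed_wins = 0
    random_wins = random_games = 0

    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)

        # Informed host: always opens a goat door, so switching
        # wins exactly when the first pick was wrong.
        informed_wins += (pick != car)

        # Random host: opens one of the two other doors blindly;
        # keep only the runs where he happened to show a goat.
        opened = random.choice([d for d in range(3) if d != pick])
        if opened != car:
            random_games += 1
            remaining = next(d for d in range(3) if d not in (pick, opened))
            random_wins += (remaining == car)

    print(informed_wins / trials)      # ~2/3
    print(random_wins / random_games)  # ~1/2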


It becomes a lot clearer if you increase the number of goats.

I have a deck of a billion cards, one entitles you to the car, the others have a picture of a goat. You pick one at random but do not get to see it.

Of the remaining 999,999,999 cards, I discard 999,999,998 goats.

Do you want your initial card, or the remaining card in my deck?

Or to frame it another way, do you want the car iff you picked it first or iff you did not pick it first?


The GP is not contesting this, they just think that the whole discarding/revealing of the goats is an unnecessarily complicated explanation for why this works.

The way they put it, the strategy becomes obvious if you instead reformulate the problem like so: there are three doors with three different prizes. You get to pick one door, but are then given a choice: stick with your original choice, or choose to get the best prize from the other two doors.

The solution (switch!) is more obvious in the second formulation. I have no idea if it's easier to convince someone else of the reasoning in the first formulation, or to convince them that the second formulation is equivalent to the original.


Once you pick an envelope, you no longer stand to only gain money.

You can lose money and that has to be reflected in the potential value of each envelope.

After the first selection, you must express the envelope value as the potential of what each envelope holds (the probabilities from the initial selection), which makes selecting again a wash.

Let A = 50; Envelope 1 is 100; Envelope 2 is 25.

First selection: 1/2(100) + 1/2(25) = 62.5

Great! Do it! 62.5 is bigger than 0, which is the expected value of not playing at all.

After that, it makes no difference to switch (the envelope value is recursive):

1/2(62.5) + 1/2(62.5) = 62.5

or

1/2(1/2(2A) + 1/2(A/2)) + 1/2(1/2(2A) + 1/2(A/2)) = 1/2(2A) + 1/2(A/2) = 5/4A

and the trick is that’s now the potential expected value you have (you now have 5/4A in your envelope), so switching is a wash. You never really “had” A as a value to compare against (so evaluating whether 5/4A is bigger than A is beside the point); we just get tripped up by the doubling and halving at the outset.

Said another way, the fact that 5/4A is bigger than A is irrelevant: no envelope contains A; they both contain the expected value of 5/4A.


> Let A = 50; Envelope 1 is 100; Envelope 2 is 25.

No. The problem specifies that E1 is twice E2, but here you have E1 = 4 x E2.

A is the amount in one of the envelopes, so if A=50 then either E1 is 50 or E2 is 50, and the other E is 25 or 100. But under no circumstances can E1=100 and E2=25 at the same time.


Impossible. Schrodinger’s envelope, then. You can’t reflect the probability to be 2A AND 1/2A if A represents the value of an envelope.

The value of the other envelope must include the probability that it is the original A (not some sleight of hand new A’ that, itself, is based on an expected value).


> You can’t reflect the probability to be 2A AND 1/2A if A represents the value of an envelope.

Yes, that's right. That's exactly what I said: "A is the amount in one of the envelopes, so if A=50 then either E1 is 50 or E2 is 50, and the other E is 25 or 100. But under no circumstances can E1=100 and E2=25 at the same time."


Excellent answer. Worth adding to the wiki page.


A related problem, but an actually interesting one because there is a beneficial strategy, is the Monty Hall problem.

https://en.wikipedia.org/wiki/Monty_Hall_problem


Also, it is a good opportunity to point out that the Monty Hall problem was mentioned in a Brooklyn 99 episode, in case someone missed it.

https://m.imdb.com/title/tt6026160/movieconnections


The Monty Hall problem has a simple resolution. At the beginning you had a 1/3 chance of picking the good door, and 2/3 chance of picking a bad door, meaning you probably picked a bad door at the start, so you should switch.


You can say 'simple', but a whole pantheon of academics famously got it wrong. If you really want to understand the problem, use 100 doors instead of three and then it becomes very clear what's going on.


I'm also an academic, if that matters for some reason, and I think it doesn't matter how many people didn't understand something. They must have been confusing themselves by seeking answers in the wrong direction. What matters is that the resolution I've given truly is simple. You have a 1/3 chance of being in a situation where changing doors is unfavorable, you have a 2/3 chance of being in a situation where changing doors is favorable. If you picked the door with the car, which happens with 1/3 probability, at the next stage switching is a bad option. If you picked a door with the goat, which happens with 2/3 probability, at the next stage switching is a good option. This is objectively simple, regardless of what a large number of people might think.

I also don't think changing the problem to an analogous one is as helpful as this direct and simple solution. Using 100 doors could mean multiple things. Maybe it means at the next stage you have 99 doors left. Maybe it means at the next stage you have 2 doors left. Why should I think either is the correct analog? You can just see the simple answer I've stated and be done with it.


Or a million scratch-off tickets.


Sidestep the evil statistician/probability punter's conundrum, and walk off with your found money. Have lunch with the Free Real Estate guy.


Comparing this to the Monty Hall problem:

Probability of choosing the correct door at time of choice:

Two Envelopes: 1/2 -> 1/2

Monty Hall: 1/3 -> 2/3 (if you switch)

...because Monty opens one of the doors, providing information.


Monty opening the door condenses the probabilities of the two unpicked doors into the one unpicked door. This occurs because Monty only opens a door with a goat and never the car. In the original three-door version you always had a 2/3 chance of winning if you switched, because you'd only lose if you had correctly picked the car on the original 1/3 pick.


If it is truly the case that the second envelope has equal probability of having half or double the amount of money as the first, then you can prove that the probability distribution of money in each envelope is not well-formed (has total mass either zero or infinity). This means the premise is already contradictory, and further false conclusions (such as "P and not-P" for P="it is better to switch") are not surprising.
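
One way to spell out that proof, writing f for the probability density of the smaller amount (a sketch of the standard argument):

    P(holding the smaller amount | you hold a) = f(a) / (f(a) + f(a/2))

    For this to equal 1/2 at every a, we need f(a) = f(a/2) for all a > 0,
    hence f(a) = f(a/2) = f(a/4) = ... along every geometric sequence.
    No such density can integrate to 1: its total mass is 0 or infinite.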


> If it is truly the case that the second envelope has equal probability of having half or double the amount of money as the first, then you can prove that the probability distribution of money in each envelope is not well-formed (has total mass either zero or infinity). This means the premise is already contradictory...

What? I can literally take 2 envelopes right now - physical envelopes - and I can literally put $10 in one envelope and $20 in the other envelope. Then I can shuffle the envelopes and hand one of them to you. Now the other envelope has equal probability of having half or double the amount of money as the first. How is this premise contradictory in your opinion? I can physically arrange objects in this fashion, so where's the contradiction?


If I open an envelope with $10 then I know that there is $20 in the other envelope.

As soon as you assign a well-formed probability distribution to the money in the envelopes you will find that opening the first envelope is informative.
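
For instance, here's a tiny sketch with one hypothetical well-formed prior (smaller amount uniform on whole dollars 1..1000 - an assumption for illustration): the observed value is frequently decisive.

    from fractions import Fraction

    # Prior: smaller amount S uniform on 1..1000; envelopes hold S and 2S.
    def p_holding_smaller(a):
        w_small = 1 if 1 <= a <= 1000 else 0                     # pair (a, 2a)
        w_big = 1 if a % 2 == 0 and 1 <= a // 2 <= 1000 else 0   # pair (a/2, a)
        return Fraction(w_small, w_small + w_big)

    print(p_holding_smaller(7))     # 1   - odd, so it must be the smaller
    print(p_holding_smaller(500))   # 1/2 - genuinely ambiguous
    print(p_holding_smaller(1600))  # 0   - over 1000, must be the doubled one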


Opening the envelope is something that you added to the problem just now. It wasn't present in the original problem, and it wasn't present in your earlier post.

Here's a quote from Wikipedia: "Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?" Note the part that says "before inspecting it". That means you can't open it.


Ah this is a different version of the problem. There are quite a few listed on the wikipedia page.

I believe the version you are talking about is just a simple equivocation.


> Ah this is a different version of the problem. There are quite a few listed on the wikipedia page.

The vast majority of the Wikipedia article concerns variants that do not involve opening the first envelope.

The first definition of the problem states "before inspecting it".

The first section, "Introduction", also describes the problem similarly: "before they open it". This section - which defines the problem - does not introduce any variants which involve opening the first envelope.

The next section, "Simple resolution", describes a resolution to the variant described in the introduction (where the first envelope is not opened).

The next section, "Other simple resolutions", also discusses the same variant (where the first envelope is not opened). At the end of this section there is a remark "We could actually open our envelope before deciding on switching or not and the above formula would still give us the correct expected return", but that is the extent to which the first-envelope-is-opened variant is discussed there.

I could go on. My point is that the discussion very clearly revolves around the variant where the first envelope is not opened, and you have to go down a pretty deep rabbit hole in order to find a variant where the first envelope is opened.

> I believe the version you are talking about is just a simple equivocation.

Yes, just a simple equivocation, with a Wikipedia page that could be printed as a book. Does anything strike you as odd about that?


Sounds like the "free will" discussion - go have fun while other people solve real problems.


Any formulation that compels the player to swap is flawed.


Just write a python script simulating this :)
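
In that spirit, a minimal sketch (the amount and trial count are arbitrary):

    import random

    A, trials = 100, 1_000_000
    keep = switch = 0

    for _ in range(trials):
        pair = [A, 2 * A]
        random.shuffle(pair)
        keep += pair[0]     # stick with the first envelope
        switch += pair[1]   # always take the other one

    print(keep / trials)    # ~150.0, i.e. 3/2 * A
    print(switch / trials)  # ~150.0 as well - switching gains nothing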


Intuitively, and ignoring the constraint of the values being (A|2A), if I just assign a random number value to each envelope, does the order in which I have assigned the random numbers change the amount of information I have? I suspect not.

However, I now have two random integers with no bits of information about either, having removed information from the system (the relationship between the envelopes' contents, one being twice the other) while leaving the states of the system intact.

The envelope I in effect already have yields no bits of information because I didn't choose it; it is just the random (no information) one I start with. The fact of the existence of the second envelope, though, contains more information than my current envelope selection, and I have a decision to make about whether to switch.

The null state of the system is me holding an envelope with no information, and addition of a second envelope is a new state of possibilities, where the information in this two envelope system is now non-zero.

The decision with the most information is to switch. This additional information is independent of the implied desire for (A|2A), whereas in the original problem the best possible decision is defined as the one with the most information.

Further, if I perform some operation with those two random integers (Za and Zb), the result has less "entropy" than the two random inputs in isolation, as by correlating them via a function I have produced some non-zero negentropy between them, say using xor, which is how I added information back into the system after removing it using the random numbers.

I get this new number N composed of my two random numbers ( N = xor(Za, Zb) ), and I need to apply it to my choice of, not which envelope to select first (A|B) - but whether I should switch. The only way to get this number with information in it is to have assigned a random number to both envelopes and used the two bits of negative information to cancel each other - and therefore the only choice with non-zero information is the one which includes both envelopes - hence the decision to switch is the only one that yields the non-zero information outcome.

Again, intuitively it seems related to how multiplying two negative integers yields a positive one: multiplying two entropy (negative information) sources yields negentropy (positive information). That could further imply that the information in the decision to switch is also based on whether the number of envelopes is even or odd, or maybe prime or composite? Sounding this out, it may also be related to graphs where the number of nodes/edges (where edges represent switching decisions and nodes envelopes) either forms a closed path or an open one, and in the case of an even number of envelopes you get the max information by switching n times, and in the case of odd ones, you switch n+1 times.

Short version is, the choice with the highest number of bits of information in the two envelope case is to switch because it is predicated on there being more envelopes than the one you have started with.


It is trivial! It is trivial!

The game defines a graph and your path through that graph is going to lead to an outcome. You have a strategy, a choice of how to move through the graph. However, you have no information about where you are in the graph. So when we think about how we move through the graph, we must always think the same way. We can never think differently, because no action changes the information we have about where we are. However, the terminal case is defined thusly: R(STAY) = {50%: 2A, 50%: A}. There is never a terminal reward for R(SWITCH); switching just continues the game. Anyone telling you to always choose switch, when switch is the one action that never has a reward, is obviously wrong.

But why? Why were they wrong?

Simple. In imperfect information games things work differently than in perfect information games. In chess, when you make a move, you are in the subgame that move creates; a different move puts you in a different subgame. The subgames are independent of each other. In poker, it isn't so! If you make a bet in poker, you don't know which subgame you are in. You have to play as if you're in multiple at the same time.

Now look at the fast con that seems to be tripping people up. They are pretending we have perfect information. 2A, A/2? Phaw. We can't have that, because that supposes we know which subgame we are in, but we can't know which subgame we are in, because we were never given that information. Even if we were - it doesn't matter. We're in an imperfect information game! The places we didn't explore are relevant to our calculation! We can't ignore them! But we did, by quietly asserting an equality which doesn't hold.

This is probably counterintuitive. It's trivial, but people argue about it. So let's talk about this more concretely.

This is what they pulled on you. They claimed that these two different game trees could be treated as the same game tree:

             -> Keep    -> 2A
          2A -> Switch  -> A
    0.5 ->
          A  -> Keep    -> A
             -> Switch  -> 2A

                -> Keep    -> A
          A     -> Switch  -> A/2
    0.5 ->
          A/2   -> Keep    -> A/2
                -> Switch  -> A

Two different game trees, and they acted like they were the same. But they aren't! They can't be! Since you can't tell which subgame you are in, in an imperfect information game you have to reason over both at once. So what is our actual situation with their reasoning? Our actual situation is a contradiction. Since the subgames depend on each other, when they do this combination they are secretly speaking nonsense. Look closely: they are claiming that 2A = A; that A = A/2. They have to be, because in imperfect information games we have to be in every subgame at once, but they made two different games and then combined them. The correct formulation has to treat every indistinguishable state the same, but they took something from a different subgame: they declared that two subgames were equal when they weren't.

It is so trivial. That's why the correct calculation looks like only an expectation on one game tree. That's also why the flawed version seems to make sense - your brain knows this sort of reasoning is appropriate with perfect information. I realize a lot of people have argued about this, but frankly they should stop.
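
In code, the information-set expectation described above is just (a sketch):

    # You may be holding the smaller amount A or the larger amount 2A,
    # each with probability 1/2; keep and switch simply swap the outcomes.
    A = 1.0
    ev_keep = 0.5 * A + 0.5 * (2 * A)    # 3/2 * A
    ev_switch = 0.5 * (2 * A) + 0.5 * A  # 3/2 * A - identical
    print(ev_keep, ev_switch)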

And if you're interested in exploring why you have to treat subgames differently under imperfect information - why you have to operate on information sets rather than state spaces - here is a great talk on the NIPS 2017 Best Paper, which happens to touch on the subject:

https://www.youtube.com/watch?v=tRiaGahlyy4

It is about solving poker, which is a much more complicated imperfect information game. But even the simple imperfect information games have this property.


> you are given two identical envelopes [...] containing money. One contains twice as much as the other

> having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?

> because the person stands to gain twice as much money if they switch, while the only risk is halving what they currently have, a case can be made for switching the envelope

I dig this, but it seems like an unavoidable nonsense pattern that congeals in pools of otherwise practically useful logic / language systems.

Like Neo in the Matrix. Remember the Architect? That guy was cool.


> Imagine you are given two identical envelopes, each containing money. One contains twice as much as the other. You may pick one envelope and keep the money it contains. Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?

Unless the imaginary “giver” shows you what was in the other imaginary envelope you’ll never know. We don’t even know if this imaginary person is telling the truth. How do you know that “One contains twice as much as the other?”

I have never been offered a choice of cash containing envelopes in my life – maybe because I have never worked in a job where bribes could be usefully employed.

The whole “problem” is about as worthwhile a subject of contemplation as, say, “Do flying pigs taste more like partridge than earth-bound pigs?”

Does anyone know of a paradox that works/exists in the real world? I mean something a bit more substantial than whether some random dude claims to be a liar and you supposedly bother to wonder whether or not he’s telling the truth.


It's a thought experiment. It's not about envelopes of cash, it's about logic and game theory. Their usefulness can hardly be overstated.

But for what it's worth, there are a ton of thought experiments inspired by or lifted from the real world. The Monty Hall problem is taken from a game show - it even has a contestant who must choose one of three doors...


Why is it a thought experiment? After all, envelopes, cash, experimenters, etc. exist. It could be a real experiment. On the other hand, what theory or hypothesis would it be testing?

What’s the problem with Monty Hall? The solution (switch to the other door) is demonstrably true. I believe it’s called a veridical paradox.



